1. 01 Mar, 2024 10 commits
    • drm/panthor: Add the scheduler logical block · de854881
      Boris Brezillon authored
      This is the piece of software interacting with the FW scheduler, and
      taking care of some scheduling aspects when the FW comes short of
      scheduling slots. Indeed, the FW only exposes a few slots, and the kernel
      has to give all submission contexts a chance to execute their jobs.
      
      The kernel-side scheduler is timeslice-based, with a round-robin queue
      per priority level.
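
      As a rough illustration of "timeslice-based, with a round-robin queue
      per priority level", here is a minimal userspace sketch. It is not the
      panthor code: the names (rr_sched, rr_enqueue, rr_pick_next), the number
      of priority levels, the queue capacity and the single-slot
      simplification are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIORITIES 4   /* assumed number of priority levels */
#define MAX_GROUPS     8   /* assumed per-priority queue capacity */

/* One ring of group IDs per priority level (hypothetical layout). */
struct rr_queue {
	int groups[MAX_GROUPS];
	size_t head;   /* index of the next group to run */
	size_t count;  /* number of queued groups */
};

struct rr_sched {
	struct rr_queue prio[NUM_PRIORITIES]; /* index 0 = lowest priority */
};

/* Append a group to the tail of its priority queue; -1 if the queue is full. */
static int rr_enqueue(struct rr_sched *s, int prio, int group)
{
	struct rr_queue *q;

	assert(prio >= 0 && prio < NUM_PRIORITIES);
	q = &s->prio[prio];
	if (q->count == MAX_GROUPS)
		return -1;
	q->groups[(q->head + q->count) % MAX_GROUPS] = group;
	q->count++;
	return 0;
}

/*
 * Called on each timeslice tick: take the group at the head of the
 * highest non-empty priority queue and rotate it to the tail, so that
 * its siblings get the following slices. Returns -1 if nothing is queued.
 */
static int rr_pick_next(struct rr_sched *s)
{
	for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
		struct rr_queue *q = &s->prio[p];
		int group;

		if (!q->count)
			continue;
		group = q->groups[q->head];
		q->head = (q->head + 1) % MAX_GROUPS;
		/* Re-queue at the tail: after the pop, the tail is head + count - 1. */
		q->groups[(q->head + q->count - 1) % MAX_GROUPS] = group;
		return group;
	}
	return -1;
}
```

      With two groups queued at the same priority, successive calls to
      rr_pick_next() alternate between them; a group queued at a higher
      priority takes over on the next tick.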
      
      Job submission is handled with a 1:1 drm_sched_entity:drm_gpu_scheduler,
      allowing us to delegate the dependency tracking to the core.
      
      All the gory details should be documented inline.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Make sure the scheduler is initialized before queueing the tick work
        in the MMU fault handler
      - Keep header inclusion alphabetically ordered
      
      v5:
      - Fix typos
      - Call panthor_kernel_bo_destroy(group->syncobjs) unconditionally
      - Don't move the group to the waiting list tail when it was already
        waiting for a different syncobj
      - Fix fatal_queues flagging in the tiler OOM path
      - Don't warn when more than one job times out on a group
      - Add a warning message when we fail to allocate a heap chunk
      - Add Steve's R-b
      
      v4:
      - Check drmm_mutex_init() return code
      - s/drm_gem_vmap_unlocked/drm_gem_vunmap_unlocked/ in
        panthor_queue_put_syncwait_obj()
      - Drop unneeded WARN_ON() in cs_slot_sync_queue_state_locked()
      - Use atomic_xchg() instead of atomic_fetch_and(0)
      - Fix typos
      - Let panthor_kernel_bo_destroy() check for IS_ERR_OR_NULL() BOs
      - Defer TILER_OOM event handling to a separate workqueue to prevent
        deadlocks when the heap chunk allocation is blocked on mem-reclaim.
        This is just a temporary solution, until we add support for
        non-blocking/failable allocations
      - Pass the scheduler workqueue to drm_sched instead of instantiating
        a separate one (no longer needed now that heap chunk allocation
        happens on a dedicated wq)
      - Set WQ_MEM_RECLAIM on the scheduler workqueue, so we can handle
        job timeouts when the system is under mem pressure, and hopefully
        free up some memory retained by these jobs
      
      v3:
      - Rework the FW event handling logic to avoid races
      - Make sure MMU faults kill the group immediately
      - Use the panthor_kernel_bo abstraction for group/queue buffers
      - Make in_progress an atomic_t, so we can check it without the reset lock
        held
      - Don't limit the number of groups per context to the FW scheduler
        capacity. Fix the limit to 128 for now.
      - Add a panthor_job_vm() helper
      - Account for panthor_vm changes
      - Add our job fence as DMA_RESV_USAGE_WRITE to all external objects
        (was previously DMA_RESV_USAGE_BOOKKEEP). I don't get why, given
        we're supposed to be fully-explicit, but other drivers do that, so
        there must be a good reason
      - Account for drm_sched changes
      - Provide a panthor_queue_put_syncwait_obj()
      - Unconditionally return groups to their idle list in
        panthor_sched_suspend()
      - Condition of sched_queue_{,delayed_}work fixed to be only when a reset
        isn't pending or in progress.
      - Several typos in comments fixed.
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-11-boris.brezillon@collabora.com
    • drm/panthor: Add the heap logical block · 9cca48fa
      Boris Brezillon authored
      Tiler heap growing requires some kernel driver involvement: when the
      tiler runs out of heap memory, it will raise an exception which is
      either directly handled by the firmware if some free heap chunks are
      available in the heap context, or passed back to the kernel otherwise.
      The heap helpers will be used by the scheduler logic to allocate more
      heap chunks to a heap context, when such a situation happens.
      
      Heap context creation is explicitly requested by userspace (using
      the TILER_HEAP_CREATE ioctl), and the returned context is attached to a
      queue through some command stream instruction.
      
      All the kernel does is keep the list of heap chunks allocated to a
      context, so they can be freed when TILER_HEAP_DESTROY is called, or
      extended when the FW requests a new chunk.
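
      The bookkeeping described above can be sketched in plain C as follows.
      This is only an illustration under stated assumptions: heap_ctx,
      heap_chunk, heap_ctx_grow() and heap_ctx_destroy() are hypothetical
      names, malloc() stands in for GPU buffer allocation, and the max_chunks
      limit is an assumption that does not come from this commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap chunk; the driver tracks GPU buffers. */
struct heap_chunk {
	struct heap_chunk *next;
	void *mem;
};

/* Hypothetical heap context: just the list of chunks handed out so far. */
struct heap_ctx {
	struct heap_chunk *chunks;
	size_t chunk_size;
	unsigned int chunk_count;
	unsigned int max_chunks;   /* assumed growth limit */
};

/* Add one chunk to the context, e.g. when the FW asks for more heap memory. */
static int heap_ctx_grow(struct heap_ctx *heap)
{
	struct heap_chunk *chunk;

	if (heap->chunk_count >= heap->max_chunks)
		return -1;

	chunk = malloc(sizeof(*chunk));
	if (!chunk)
		return -1;

	chunk->mem = malloc(heap->chunk_size);
	if (!chunk->mem) {
		free(chunk);
		return -1;
	}

	/* Remember the chunk so it can be released at destroy time. */
	chunk->next = heap->chunks;
	heap->chunks = chunk;
	heap->chunk_count++;
	return 0;
}

/* Release every chunk that was ever attached to the context. */
static void heap_ctx_destroy(struct heap_ctx *heap)
{
	while (heap->chunks) {
		struct heap_chunk *chunk = heap->chunks;

		heap->chunks = chunk->next;
		free(chunk->mem);
		free(chunk);
		heap->chunk_count--;
	}
}
```

      Keeping nothing more than the chunk list is enough for both operations:
      growing pushes a new chunk, destroying walks the list.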
      
      v6:
      - Add Maxime's and Heiko's acks
      
      v5:
      - Fix FIXME comment
      - Add Steve's R-b
      
      v4:
      - Rework locking to allow concurrent calls to panthor_heap_grow()
      - Add a helper to return a heap chunk if we couldn't pass it to the
        FW because the group was scheduled out
      
      v3:
      - Add a FIXME for the heap OOM deadlock
      - Use the panthor_kernel_bo abstraction for the heap context and heap
        chunks
      - Drop the panthor_heap_gpu_ctx struct as it is opaque to the driver
      - Ensure that the heap context is aligned to the GPU cache line size
      - Minor code tidy ups
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-10-boris.brezillon@collabora.com
    • drm/panthor: Add the FW logical block · 2718d918
      Boris Brezillon authored
      Contains everything that's FW related, including the code dealing
      with the microcontroller unit (MCU) that's running the FW, and anything
      related to allocating memory shared between the FW and the CPU.
      
      A few global FW events are processed in the IRQ handler, the rest is
      forwarded to the scheduler, since scheduling is the primary reason for
      the FW existence, and also the main source of FW <-> kernel
      interactions.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Keep header inclusion alphabetically ordered
      
      v5:
      - Fix typo in GLB_PERFCNT_SAMPLE definition
      - Fix unbalanced panthor_vm_idle/active() calls
      - Fallback to a slow reset when the fast reset fails
      - Add extra information when reporting a FW boot failure
      
      v4:
      - Add a MODULE_FIRMWARE() entry for gen 10.8
      - Fix a wrong return ERR_PTR() in panthor_fw_load_section_entry()
      - Fix typos
      - Add Steve's R-b
      
      v3:
      - Make the FW path more future-proof (Liviu)
      - Use one waitqueue for all FW events
      - Simplify propagation of FW events to the scheduler logic
      - Drop the panthor_fw_mem abstraction and use panthor_kernel_bo instead
      - Account for the panthor_vm changes
      - Replace the magic number 0x7fffffff with ~0 to better signify that
        it's the maximum permitted value.
      - More accurate rounding when computing the firmware timeout.
      - Add a 'sub iterator' helper function. This also adds a check that a
        firmware entry doesn't overflow the firmware image.
      - Drop __packed from FW structures, natural alignment is good enough.
      - Other minor code improvements.
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-9-boris.brezillon@collabora.com
    • drm/panthor: Add the MMU/VM logical block · 647810ec
      Boris Brezillon authored
      MMU and VM management are related and placed in the same source file.
      
      Page table updates are delegated to the io-pgtable-arm driver that's in
      the iommu subsystem.
      
      The VM management logic is based on drm_gpuva_mgr, and is assuming the
      VA space is mostly managed by the usermode driver, except for a reserved
      portion of this VA-space that's used for kernel objects (like the heap
      contexts/chunks).
      
      Both asynchronous and synchronous VM operations are supported, and
      internal helpers are exposed to allow other logical blocks to map their
      buffers in the GPU VA space.
      
      There's one VM_BIND queue per-VM (meaning the Vulkan driver can only
      expose one sparse-binding queue), and this bind queue is managed with
      a 1:1 drm_sched_entity:drm_gpu_scheduler, such that each VM gets its own
      independent execution queue, avoiding VM operation serialization at the
      device level (things are still serialized at the VM level).
      
      The rest is just implementation details that are hopefully well explained
      in the documentation.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Add Steve's R-b
      - Adjust the TRANSCFG value to account for SW VA space limitation on
        32-bit systems
      - Keep header inclusion alphabetically ordered
      
      v5:
      - Fix a double panthor_vm_cleanup_op_ctx() call
      - Fix a race between panthor_vm_prepare_map_op_ctx() and
        panthor_vm_bo_put()
      - Fix panthor_vm_pool_destroy_vm() kernel doc
      - Fix paddr adjustment in panthor_vm_map_pages()
      - Fix bo_offset calculation in panthor_vm_get_bo_for_va()
      
      v4:
      - Add a helper to return the VM state
      - Check drmm_mutex_init() return code
      - Remove the VM from the AS reclaim list when panthor_vm_active() is
        called
      - Count the number of active VM users instead of considering there's
        at most one user (several scheduling groups can point to the same
        VM)
      - Pre-allocate a VMA object for unmap operations (unmaps can trigger
        a sm_step_remap() call)
      - Check vm->root_page_table instead of vm->pgtbl_ops to detect if
        the io-pgtable is trying to allocate the root page table
      - Don't memset() the va_node in panthor_vm_alloc_va(), make it a
        caller requirement
      - Fix the kernel doc in a few places
      - Drop the panthor_vm::base offset constraint and modify
        panthor_vm_put() to explicitly check for a NULL value
      - Fix unbalanced vm_bo refcount in panthor_gpuva_sm_step_remap()
      - Drop stale comments about the shared_bos list
      - Patch mmu_features::va_bits on 32-bit builds to reflect the
        io_pgtable limitation and let the UMD know about it
      
      v3:
      - Add acks for the MIT/GPL2 relicensing
      - Propagate MMU faults to the scheduler
      - Move pages pinning/unpinning out of the dma_signalling path
      - Fix 32-bit support
      - Rework the user/kernel VA range calculation
      - Make the auto-VA range explicit (auto-VA range doesn't cover the full
        kernel-VA range on the MCU VM)
      - Let callers of panthor_vm_alloc_va() allocate the drm_mm_node
        (embedded in panthor_kernel_bo now)
      - Adjust things to match the latest drm_gpuvm changes (extobj tracking,
        resv prep and more)
      - Drop the per-AS lock and use slots_lock (fixes a race on vm->as.id)
      - Set as.id to -1 when reusing an address space from the LRU list
      - Drop misleading comment about page faults
      - Remove check for irq being assigned in panthor_mmu_unplug()
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-8-boris.brezillon@collabora.com
    • drm/panthor: Add the devfreq logical block · fac9b22d
      Boris Brezillon authored
      Everything related to devfreq is placed in panthor_devfreq.c, and
      helpers that can be called by other logical blocks are exposed through
      panthor_devfreq.h.
      
      This implementation is loosely based on the panfrost implementation,
      the only difference being that we don't count device users, because
      the idle/active state will be managed by the scheduler logic.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Keep header inclusion alphabetically ordered
      
      v4:
      - Add Clément's A-b for the relicensing
      
      v3:
      - Add acks for the MIT/GPL2 relicensing
      
      v2:
      - Added in v2
      
      Cc: Clément Péron <peron.clem@gmail.com> # MIT+GPL2 relicensing
      Reviewed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Acked-by: Clément Péron <peron.clem@gmail.com> # MIT+GPL2 relicensing
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-7-boris.brezillon@collabora.com
    • drm/panthor: Add GEM logical block · 8a1cc075
      Boris Brezillon authored
      Anything relating to GEM object management is placed here. Nothing
      particularly interesting here, given the implementation is based on
      drm_gem_shmem_object, which is doing most of the work.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Return a page-aligned BO size to userspace when creating a BO
      - Keep header inclusion alphabetically ordered
      
      v5:
      - Add Liviu's and Steve's R-b
      
      v4:
      - Force kernel BOs to be GPU mapped
      - Make panthor_kernel_bo_destroy() robust against ERR/NULL BO pointers
        to simplify the call sites
      
      v3:
      - Add acks for the MIT/GPL2 relicensing
      - Provide a panthor_kernel_bo abstraction for buffer objects managed by
        the kernel (will replace panthor_fw_mem and be used everywhere we were
        using panthor_gem_create_and_map() before)
      - Adjust things to match drm_gpuvm changes
      - Change return of panthor_gem_create_with_handle() to int
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-6-boris.brezillon@collabora.com
    • drm/panthor: Add the GPU logical block · 5cd894e2
      Boris Brezillon authored
      Handles everything that's not related to the FW, the MMU or the
      scheduler. This is the block dealing with the GPU property retrieval,
      the GPU block power on/off logic, and some global operations, like
      global cache flushing.
      
      v6:
      - Add Maxime's and Heiko's acks
      
      v5:
      - Fix GPU_MODEL() kernel doc
      - Fix test in panthor_gpu_block_power_off()
      - Add Steve's R-b
      
      v4:
      - Expose CORE_FEATURES through DEV_QUERY
      
      v3:
      - Add acks for the MIT/GPL2 relicensing
      - Use macros to extract GPU ID info
      - Make sure we clear the pending_reqs bits when wait_event_timeout()
        times out but the corresponding bit is cleared in GPU_INT_RAWSTAT
        (can happen if the IRQ is masked or HW takes too long to call the IRQ
        handler)
      - GPU_MODEL now takes separate arch and product majors to be more
        readable.
      - Drop GPU_IRQ_MCU_STATUS_CHANGED from interrupt mask.
      - Handle GPU_IRQ_PROTM_FAULT correctly (don't output registers that are
        not updated for protected interrupts).
      - Minor code tidy ups
      
      Cc: Alexey Sheplyakov <asheplyakov@basealt.ru> # MIT+GPL2 relicensing
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-5-boris.brezillon@collabora.com
    • drm/panthor: Add the device logical block · 5fe909ca
      Boris Brezillon authored
      The panthor driver is designed in a modular way, where each logical
      block is dealing with a specific HW-block or software feature. In order
      for those blocks to communicate with each other, we need a central
      panthor_device collecting all the blocks, and exposing some common
      features, like interrupt handling, power management, reset, ...
      
      This is what this panthor_device logical block is about.
      
      v6:
      - Add Maxime's and Heiko's acks
      - Keep header inclusion alphabetically ordered
      
      v5:
      - Suspend the MMU/GPU blocks if panthor_fw_resume() fails in
        panthor_device_resume()
      - Move the pm_runtime_use_autosuspend() call before drm_dev_register()
      - Add Liviu's R-b
      
      v4:
      - Check drmm_mutex_init() return code
      - Fix panthor_device_reset_work() out path
      - Fix the race in the unplug logic
      - Fix typos
      - Unplug blocks when something fails in panthor_device_init()
      - Add Steve's R-b
      
      v3:
      - Add acks for the MIT+GPL2 relicensing
      - Fix 32-bit support
      - Shorten the sections protected by panthor_device::pm::mmio_lock to fix
        lock ordering issues.
      - Rename panthor_device::pm::lock into panthor_device::pm::mmio_lock to
        better reflect what this lock is protecting
      - Use dev_err_probe()
      - Make sure we call drm_dev_exit() when something fails half-way in
        panthor_device_reset_work()
      - Replace CSF_GPU_LATEST_FLUSH_ID_DEFAULT with a constant '1' and a
        comment to explain. Also remove setting the dummy flush ID on suspend.
      - Remove drm_WARN_ON() in panthor_exception_name()
      - Check pirq->suspended in panthor_xxx_irq_raw_handler()
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Reviewed-by: Steven Price <steven.price@arm.com>
      Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-4-boris.brezillon@collabora.com
    • drm/panthor: Add GPU register definitions · 546b3666
      Boris Brezillon authored
      Those are the registers directly accessible through the MMIO range.
      
      FW registers are exposed in panthor_fw.h.
      
      v6:
      - Add Maxime's and Heiko's acks
      
      v4:
      - Add the CORE_FEATURES register (needed for GPU variants)
      - Add Steve's R-b
      
      v3:
      - Add macros to extract GPU ID info
      - Formatting changes
      - Remove AS_TRANSCFG_ADRMODE_LEGACY - it doesn't exist post-CSF
      - Remove CSF_GPU_LATEST_FLUSH_ID_DEFAULT
      - Add GPU_L2_FEATURES_LINE_SIZE for extracting the GPU cache line size
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
      Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
      Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-3-boris.brezillon@collabora.com
    • drm/panthor: Add uAPI · 0f25e493
      Boris Brezillon authored
      Panthor follows the lead of other recently submitted drivers with
      ioctls allowing us to support modern Vulkan features, like sparse memory
      binding:
      
      - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET),
        with the 'exclusive-VM' bit to speed-up BO reservation on job submission
      - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND
        ioctl is loosely based on the Xe model, and can handle both
        asynchronous and synchronous requests
      - GPU execution context creation/destruction, tiler heap context creation
        and job submission. Those ioctls reflect how the hardware/scheduler
        works and are thus driver specific.
      
      We also have a way to expose IO regions, such that the usermode driver
      can directly access specific, well-isolated registers, like the
      LATEST_FLUSH register used to implement cache-flush reduction.
      
      This uAPI intentionally keeps usermode queues out of the scope, which
      explains why doorbell registers and command stream ring-buffers are not
      directly exposed to userspace.
      
      v6:
      - Add Maxime's and Heiko's acks
      
      v5:
      - Fix typo
      - Add Liviu's R-b
      
      v4:
      - Add a VM_GET_STATE ioctl
      - Fix doc
      - Expose the CORE_FEATURES register so we can deal with variants in the
        UMD
      - Add Steve's R-b
      
      v3:
      - Add the concept of sync-only VM operation
      - Fix support for 32-bit userspace
      - Rework drm_panthor_vm_create to pass the user VA size instead of
        the kernel VA size (suggested by Robin Murphy)
      - Typo fixes
      - Explicitly cast enums with top bit set to avoid compiler warnings in
        -pedantic mode.
      - Drop property core_group_count as it can be easily calculated by the
        number of bits set in l2_present.
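
      For reference, recomputing the dropped property in userspace amounts to
      a population count on l2_present. The sketch below assumes l2_present is
      a 64-bit mask; popcount64() and core_group_count() are helper names made
      up for the example, not part of the uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a 64-bit mask (Kernighan's method). */
static unsigned int popcount64(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* One core group per L2 slice reported in the l2_present mask. */
static unsigned int core_group_count(uint64_t l2_present)
{
	return popcount64(l2_present);
}
```

      For example, an l2_present value of 0x3 yields two core groups.
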
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-2-boris.brezillon@collabora.com
  2. 29 Feb, 2024 3 commits
  3. 28 Feb, 2024 19 commits
  4. 26 Feb, 2024 8 commits
    • drm/mgag200: Add a workaround for low-latency · bfa4437f
      Jocelyn Falempe authored
      We found a regression in v5.10 on a real-time server, using the
      rt-kernel and the mgag200 driver. It's a really specialized
      workload, with a <10us latency expectation on an isolated core.
      After v5.10, the real-time tasks missed their <10us latency target
      whenever something was printed on the screen (fbcon or printk).
      
      The regression has been bisected to 2 commits:
      commit 0b34d58b ("drm/mgag200: Enable caching for SHMEM pages")
      commit 4862ffae ("drm/mgag200: Move vmap out of commit tail")
      
      The first one changed the system memory framebuffer from Write-Combine
      to the default caching.
      Before the second commit, the mgag200 driver used to unmap the
      framebuffer after each frame, which implicitly does a cache flush.
      Both regressions are fixed by this commit, which restores the WC mapping
      for the framebuffer in system memory, and adds a cache flush.
      This is only needed on x86_64, for low-latency workloads,
      so the new kconfig DRM_MGAG200_IOBURST_WORKAROUND depends on
      PREEMPT_RT and X86.
      
      For more context, the whole thread can be found here [1]
      Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
      Link: https://lore.kernel.org/dri-devel/20231019135655.313759-1-jfalempe@redhat.com/ # 1
      Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240208095125.377908-1-jfalempe@redhat.com
    • Merge drm/drm-next into drm-misc-next · 04751849
      Thomas Zimmermann authored
      Backmerging to get drm-misc-next up to v6.8-rc6.
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    • drm/edid/firmware: Remove built-in EDIDs · 89ac522d
      Maxime Ripard authored
      The EDID firmware loading mechanism introduced a few built-in EDIDs that
      could be forced on any connector, bypassing the EDIDs it exposes.
      
      While convenient, this limited set of EDIDs doesn't take into account
      the connector type, and we can end up with an EDID that is completely
      invalid for a given connector.
      
      For example, the edid/800x600.bin file matches the following EDID:
      
        edid-decode (hex):
      
        00 ff ff ff ff ff ff 00 31 d8 00 00 00 00 00 00
        05 16 01 03 6d 1b 14 78 ea 5e c0 a4 59 4a 98 25
        20 50 54 01 00 00 45 40 01 01 01 01 01 01 01 01
        01 01 01 01 01 01 a0 0f 20 00 31 58 1c 20 28 80
        14 00 15 d0 10 00 00 1e 00 00 00 ff 00 4c 69 6e
        75 78 20 23 30 0a 20 20 20 20 00 00 00 fd 00 3b
        3d 24 26 05 00 0a 20 20 20 20 20 20 00 00 00 fc
        00 4c 69 6e 75 78 20 53 56 47 41 0a 20 20 00 c2
      
        ----------------
      
        Block 0, Base EDID:
          EDID Structure Version & Revision: 1.3
          Vendor & Product Identification:
            Manufacturer: LNX
            Model: 0
            Made in: week 5 of 2012
          Basic Display Parameters & Features:
            Analog display
            Signal Level Standard: 0.700 : 0.000 : 0.700 V p-p
            Blank level equals black level
            Sync: Separate Composite Serration
            Maximum image size: 27 cm x 20 cm
            Gamma: 2.20
            DPMS levels: Standby Suspend Off
            RGB color display
            First detailed timing is the preferred timing
          Color Characteristics:
            Red  : 0.6416, 0.3486
            Green: 0.2919, 0.5957
            Blue : 0.1474, 0.1250
            White: 0.3125, 0.3281
          Established Timings I & II:
            DMT 0x09:   800x600    60.316541 Hz   4:3     37.879 kHz     40.000000 MHz
          Standard Timings:
            DMT 0x09:   800x600    60.316541 Hz   4:3     37.879 kHz     40.000000 MHz
          Detailed Timing Descriptors:
            DTD 1:   800x600    60.316541 Hz   4:3     37.879 kHz     40.000000 MHz (277 mm x 208 mm)
                         Hfront   40 Hsync 128 Hback   88 Hpol P
                         Vfront    1 Vsync   4 Vback   23 Vpol P
            Display Product Serial Number: 'Linux #0'
            Display Range Limits:
              Monitor ranges (GTF): 59-61 Hz V, 36-38 kHz H, max dotclock 50 MHz
            Display Product Name: 'Linux SVGA'
        Checksum: 0xc2
      
      So, an analog monitor EDID. However, if the connector is an HDMI
      connector, for example, this breaks the HDMI specification, which requires,
      among other things, a digital display, the VIC 1 mode and an HDMI Forum
      Vendor Specific Data Block in a CTA-861 extension.
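
      The "analog" verdict can be checked directly against the hex dump above.
      The following sketch is not kernel code: the helper names are made up
      for the example, and it only decodes three base-block fields (the
      manufacturer ID in bytes 8-9, the digital/analog bit in byte 20, and the
      block checksum).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* The 800x600 built-in EDID, copied from the hex dump above. */
static const uint8_t edid_800x600[EDID_BLOCK_SIZE] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x31, 0xd8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x05, 0x16, 0x01, 0x03, 0x6d, 0x1b, 0x14, 0x78, 0xea, 0x5e, 0xc0, 0xa4, 0x59, 0x4a, 0x98, 0x25,
	0x20, 0x50, 0x54, 0x01, 0x00, 0x00, 0x45, 0x40, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
	0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0xa0, 0x0f, 0x20, 0x00, 0x31, 0x58, 0x1c, 0x20, 0x28, 0x80,
	0x14, 0x00, 0x15, 0xd0, 0x10, 0x00, 0x00, 0x1e, 0x00, 0x00, 0x00, 0xff, 0x00, 0x4c, 0x69, 0x6e,
	0x75, 0x78, 0x20, 0x23, 0x30, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x00, 0x00, 0x00, 0xfd, 0x00, 0x3b,
	0x3d, 0x24, 0x26, 0x05, 0x00, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x00, 0x00, 0x00, 0xfc,
	0x00, 0x4c, 0x69, 0x6e, 0x75, 0x78, 0x20, 0x53, 0x56, 0x47, 0x41, 0x0a, 0x20, 0x20, 0x00, 0xc2,
};

/* A block is valid when all 128 bytes sum to 0 modulo 256. */
static bool edid_checksum_ok(const uint8_t *edid)
{
	uint8_t sum = 0;

	for (int i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += edid[i];
	return sum == 0;
}

/* Bit 7 of byte 20 (video input definition) is set for digital inputs. */
static bool edid_is_digital(const uint8_t *edid)
{
	return (edid[20] & 0x80) != 0;
}

/* Bytes 8-9 pack three 5-bit letters ('A' == 1), most significant byte first. */
static void edid_manufacturer(const uint8_t *edid, char out[4])
{
	uint16_t id = (uint16_t)((edid[8] << 8) | edid[9]);

	out[0] = (char)('A' + ((id >> 10) & 0x1f) - 1);
	out[1] = (char)('A' + ((id >> 5) & 0x1f) - 1);
	out[2] = (char)('A' + (id & 0x1f) - 1);
	out[3] = '\0';
}
```

      Run against this block, edid_manufacturer() yields "LNX",
      edid_checksum_ok() returns true and edid_is_digital() returns false,
      which matches the edid-decode output.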
      
      We thus end up with a completely invalid EDID, which might confuse
      HDMI-related code that could parse it.
      
      After some discussions on IRC, we identified mainly two ways to fix
      this:
      
        - We can either create more EDIDs for each connector type to provide
          a built-in EDID that matches the resolution passed in the name, and
          still be a sensible EDID for that connector type;
      
        - Or we can just prevent the EDID from being exposed to userspace if
          it's built-in.
      
      Or possibly both.
      
      However, the conclusion was that maybe we just don't need the built-in
      EDIDs at all and we should just get rid of them. So here we are.
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
      Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Maxime Ripard <mripard@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240221092636.691701-1-mripard@kernel.org
      89ac522d
    • Daniel Vetter's avatar
      Merge v6.8-rc6 into drm-next · f112b68f
      Daniel Vetter authored
      Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches,
      there's a few same-area-changed conflicts (xe and amdgpu mostly) that
      are getting a bit too annoying.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      f112b68f
    • Daniel Vetter's avatar
      Merge tag 'drm-habanalabs-next-2024-02-26' of... · aa775edb
      Daniel Vetter authored
      Merge tag 'drm-habanalabs-next-2024-02-26' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
      
      This tag contains habanalabs driver and accel changes for v6.9.
      
      The notable changes are:
      
      - New features and improvements:
        - Configure interrupt affinity according to NUMA nodes for the MSI-X interrupts that are
          assigned to the userspace application which acquires the device.
        - Move the HBM MMU page tables to reside inside the HBM to minimize latency when doing
          page-walks.
        - Improve the device reset mechanism when consecutive heartbeat failures occur (firmware
          fails to ack on heartbeat message).
        - Check also extended errors in the PCIe addr_dec interrupt information.
        - Rate limit the error messages that can be printed to dmesg log by userspace actions.
      
      - Firmware related fixes:
        - Handle requests from firmware to reserve device memory.
      
      - Bug fixes and code cleanups:
        - Constify the struct device_type usage in accel (accel_sysfs_device_minor).
        - Fix the PCI health check by reading uncached register.
        - Fix reporting of drain events.
        - Fix debugfs files permissions.
        - Fix calculation of DRAM BAR base address.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      From: Oded Gabbay <ogabbay@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/ZdxJprop0EniVQtf@ogabbay-vm-u22.habana-labs.com
      aa775edb
    • Daniel Vetter's avatar
      Merge tag 'drm-xe-next-2024-02-25' of ssh://gitlab.freedesktop.org/drm/xe/kernel into drm-next · 19b232b9
      Daniel Vetter authored
      drm/xe feature pull for v6.9:
      
      UAPI Changes:
      
      - New query for the GuC firmware submission version. (José Roberto de Souza)
      - Remove unused persistent exec_queues (Thomas Hellström)
      - Add vram frequency sysfs attributes (Sujaritha Sundaresan, Rodrigo Vivi)
      - Add the flag XE_VM_BIND_FLAG_DUMPABLE to notify devcoredump that mapping
        should be dumped (Maarten Lankhorst)
      
      Cross-drivers Changes:
      
      - Make sure intel_wakeref_t is treated as opaque type on i915-display
        and fix its type on xe
      
      Driver Changes:
      
      - Drop pre-production workarounds (Matt Roper)
      - Drop kunit tests for unsupported platforms: PVC and pre-production DG2 (Lucas De Marchi)
      - Start plumbing SR-IOV support with memory based interrupts
        for VF (Michal Wajdeczko)
      - Allow to map BO in GGTT with PAT index corresponding to
        XE_CACHE_UC to work with memory based interrupts (Michal Wajdeczko)
      - Improve logging with GT-oriented drm_printers (Michal Wajdeczko)
      - Add GuC Doorbells Manager as prep work for SR-IOV during
        VF provisioning (Michal Wajdeczko)
      - Refactor fake device handling in kunit integration (Michal Wajdeczko)
      - Implement additional workarounds for xe2 and MTL (Tejas Upadhyay,
        Lucas De Marchi, Shekhar Chauhan, Karthik Poosa)
      - Program a few registers according to performance guide spec for Xe2 (Shekhar Chauhan)
      - Add error handling for non-blocking communication with GuC (Daniele Ceraolo Spurio)
      - Fix remaining 32b build issues and enable it back (Lucas De Marchi)
      - Fix build with CONFIG_DEBUG_FS=n (Jani Nikula)
      - Fix warnings from GuC ABI headers (Matthew Brost)
      - Introduce Relay Communication for SR-IOV for VF <-> GuC <-> PF (Michal Wajdeczko)
      - Add mocs reset kunit (Ruthuvikas Ravikumar)
      - Fix spellings (Colin Ian King)
      - Disable mid-thread preemption when not properly supported by hardware (Nirmoy Das)
      - Release mmap mappings on rpm suspend (Badal Nilawar)
      - Fix BUG_ON on xe_exec by moving fence reservation to the validate stage (Matthew Auld)
      - Fix xe_exec by reserving extra fence slot for CPU bind (Matthew Brost)
      - Fix xe_exec with full long running exec queue, now returning
        -EWOULDBLOCK to userspace (Matthew Brost)
      - Fix CT irq handler when CT is disabled (Matthew Brost)
      - Fix VM_BIND_OP_UNMAP_ALL without any bound vmas (Thomas Hellström)
      - Fix missing __iomem annotations (Thomas Hellström)
      - Fix exec queue priority handling with GuC (Brian Welty)
      - Fix setting SLPC flag to GuC when it's not supported (Vinay Belgaumkar)
      - Fix C6 disabling without SLPC (Matt Roper)
      - Drop -Wstringop-overflow to fix build with GCC11 (Paul E. McKenney)
      - Circumvent bogus -Wstringop-overflow in one case (Arnd Bergmann)
      - Refactor exec_queue user extensions handling and fix USM attributes
        being applied too late (Brian Welty)
      - Use circ_buf head/tail convention (Matthew Brost)
      - Fail build if circ_buf-related defines are modified with incompatible values
        (Matthew Brost)
      - Fix several error paths (Dan Carpenter)
      - Fix CCS copy for small VRAM copy chunks (Thomas Hellström)
      - Rework driver initialization order and paths to account for driver running
        in VF mode (Michal Wajdeczko)
      - Initialize GuC earlier during probe to handle driver in VF mode (Michał Winiarski)
      - Fix migration use of MI_STORE_DATA_IMM to write PTEs (Matt Roper)
      - Fix bounds checking in __xe_bo_placement_for_flags (Brian Welty)
      - Drop display dependency on CONFIG_EXPERT (Jani Nikula)
      - Do not hand-roll kstrdup when creating snapshot (Michal Wajdeczko)
      - Stop creating one kunit module per kunit suite (Lucas De Marchi)
      - Reduce scope and constify variables (Thomas Hellström, Jani Nikula, Michal Wajdeczko)
      - Improve and document xe_guc_ct_send_recv() (Michal Wajdeczko)
      - Add proxy communication between CSME and GSC uC (Daniele Ceraolo Spurio)
      - Fix size calculation when writing pgtable (Fei Yang)
      - Make sure cfb is page size aligned in stolen memory (Vinod Govindapillai)
      - Stop printing guc log to dmesg when waiting for GuC fails (Rodrigo Vivi)
      - Use XE_CACHE_WB instead of XE_CACHE_NONE for cpu coherency on migration
        (Himal Prasad Ghimiray)
      - Fix error path in xe_vm_create (Moti Haimovski)
      - Fix warnings in doc generation (Thomas Hellström, Badal Nilawar)
      - Improve devcoredump content for mesa debugging (José Roberto de Souza)
      - Fix crash in trace_dma_fence_init() (José Roberto de Souza)
      - Improve CT state change handling (Matthew Brost)
      - Toggle USM support for Xe2 (Lucas De Marchi)
      - Reduce code duplication to emit PIPE_CONTROL (José Roberto de Souza)
      - Canonicalize addresses where needed for Xe2 and add to devcoredump
        (José Roberto de Souza)
      - Only allow 1 ufence per exec / bind IOCTL (Matthew Brost)
      - Move all display code to display/ (Jani Nikula)
      - Fix sparse warnings by correctly using annotations (Thomas Hellström)
      - Warn on job timeouts instead of using asserts (Matt Roper)
      - Prefix macros to avoid clashes with sparc (Matthew Brost)
      - Fix -Walloc-size by subclassing instead of allocating size smaller than struct (Thomas Hellström)
      - Add status check during gsc header readout (Suraj Kandpal)
      - Fix infinite loop in vm_bind_ioctl_ops_unwind() (Matthew Brost)
      - Fix fence refcounting (Matthew Brost)
      - Fix picking incorrect userptr VMA (Matthew Brost)
      - Fix USM on integrated by mapping both mem.kernel_bb_pool and usm.bb_pool (Matthew Brost)
      - Fix double initialization of display power domains (Xiaoming Wang)
      - Check expected uC versions by major.minor.patch instead of just major.minor (John Harrison)
      - Bump minimum GuC version to 70.19.2 for all platforms under force-probe
        (John Harrison)
      - Add GuC firmware loading for Lunar Lake (John Harrison)
      - Use kzalloc() instead of hand-rolled alloc + memset (Nirmoy Das)
      - Fix max page size of VMA during a REMAP (Matthew Brost)
      - Don't ignore error when pinning pages in kthread (Matthew Auld)
      - Refactor xe hwmon (Karthik Poosa)
      - Add debug logs for D3cold (Riana Tauro)
      - Remove broken TEST_VM_ASYNC_OPS_ERROR (Matthew Brost)
      - Always allow to override firmware blob with module param and improve
        log when no firmware is found (Lucas De Marchi)
      - Fix shift-out-of-bounds due to xe_vm_prepare_vma() accepting zero fences (Thomas Hellström)
      - Fix shift-out-of-bounds by distinguishing xe_pt/xe_pt_dir subclass (Thomas Hellström)
      - Fail driver bind if platform supports MSIX, but fails to allocate all of them (Dani Liberman)
      - Fix intel_fbdev thinking memory is backed by shmem (Matthew Auld)
      - Prefer drm_dbg() over dev_dbg() (Jani Nikula)
      - Avoid function cast warnings with clang-16 (Arnd Bergmann)
      - Enhance xe_bo_move trace (Priyanka Dandamudi)
      - Fix xe_vma_set_pte_size() not setting the right gpuva.flags for 4K size (Matthew Brost)
      - Add XE_VMA_PTE_64K VMA flag (Matthew Brost)
      - Return 2MB page size for compact 64k PTEs (Matthew Brost)
      - Remove usage of the deprecated ida_simple_xx() API (Christophe JAILLET)
      - Fix modpost warning on xe_mocs live kunit module (Ashutosh Dixit)
      - Drop extra newline from sysfs files (Ashutosh Dixit)
      - Implement VM snapshot support for BO's and userptr (Maarten Lankhorst)
      - Add debug logs when skipping rebinds (Matthew Brost)
      - Fix code generation when mixing build directories (Dafna Hirschfeld)
      - Prefer struct_size over open coded arithmetic (Erick Archer)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      From: Lucas De Marchi <lucas.demarchi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/dbdkrwmcoqqlwftuc3olbauazc3pbamj26wa34puztowsnauoh@i3zms7ut4yuw
      19b232b9
    • Maxime Ripard's avatar
      drm/sun4i: hdmi: Consolidate atomic_check and mode_valid · 358e76fd
      Maxime Ripard authored
      atomic_check and mode_valid do not check for the same things, which can
      lead to surprising results if userspace commits a mode that didn't go
      through mode_valid. Let's merge the two implementations into a function
      called by both.
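
      As an illustration of the pattern only, here is a minimal standalone
      sketch in which both entry points defer to one shared check. Every
      name and the single clock limit are hypothetical; this is not the
      sun4i code, and the real hooks take DRM connector, mode and state
      arguments:

```c
#include <errno.h>

/* Hypothetical status codes, loosely modelled on a mode-status enum. */
enum sketch_mode_status {
	SKETCH_MODE_OK,
	SKETCH_MODE_CLOCK_HIGH,
};

/* Hypothetical encoder state; the single limit is an assumption. */
struct sketch_hdmi {
	unsigned long long max_clock_hz;
};

/* The one place where the constraint is evaluated. */
static enum sketch_mode_status
sketch_clock_valid(const struct sketch_hdmi *hdmi, unsigned long long clock_hz)
{
	if (clock_hz > hdmi->max_clock_hz)
		return SKETCH_MODE_CLOCK_HIGH;

	return SKETCH_MODE_OK;
}

/* Mode filtering path: report the detailed status. */
static enum sketch_mode_status
sketch_mode_valid(const struct sketch_hdmi *hdmi, unsigned long long clock_hz)
{
	return sketch_clock_valid(hdmi, clock_hz);
}

/* Commit path: same check, translated to an errno-style result. */
static int
sketch_atomic_check(const struct sketch_hdmi *hdmi, unsigned long long clock_hz)
{
	if (sketch_clock_valid(hdmi, clock_hz) != SKETCH_MODE_OK)
		return -EINVAL;

	return 0;
}
```

      Because both paths call the same helper, a mode rejected by one
      cannot be accepted by the other.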
      Acked-by: Sui Jingfeng <sui.jingfeng@linux.dev>
      Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
      Signed-off-by: Maxime Ripard <mripard@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240222-kms-hdmi-connector-state-v7-35-8f4af575fce2@kernel.org
      358e76fd
    • Maxime Ripard's avatar
      drm/sun4i: hdmi: Switch to container_of_const · c6686f27
      Maxime Ripard authored
      container_of_const() preserves the constness of the pointer and is
      thus more flexible than inline functions.
      
      Let's switch all our instances of container_of() to container_of_const().
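
      To illustrate why a macro can do what an inline function with a fixed
      signature cannot, here is a simplified userspace approximation built
      on C11 _Generic and the GNU __typeof__ extension. The sketch_* names
      and structs are hypothetical, and the macro is an assumption-laden
      stand-in for the kernel helper, not a copy of it:

```c
#include <stddef.h>

/* Plain container_of(): always yields a non-const pointer. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Const-preserving variant: a pointer to const member yields a pointer
 * to const container, anything else yields a mutable pointer.
 */
#define sketch_container_of_const(ptr, type, member)			\
	_Generic((ptr),							\
		const __typeof__(*(ptr)) *:				\
			((const type *)sketch_container_of(ptr, type, member)), \
		default:						\
			((type *)sketch_container_of(ptr, type, member)))

/* Hypothetical driver structures, for illustration only. */
struct sketch_connector {
	int id;
};

struct sketch_hdmi_enc {
	int regs;
	struct sketch_connector connector;
};

/* One accessor that works for both const and non-const callers. */
#define sketch_connector_to_hdmi(c) \
	sketch_container_of_const(c, struct sketch_hdmi_enc, connector)
```

      An inline function would have to pick either a const or a non-const
      signature, whereas the macro result follows the qualifier of its
      argument.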
      Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev>
      Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
      Signed-off-by: Maxime Ripard <mripard@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240222-kms-hdmi-connector-state-v7-34-8f4af575fce2@kernel.org
      c6686f27