1. 24 Jul, 2024 6 commits
    • drm/amdgpu: reset vm state machine after gpu reset(vram lost) · 47c0388b
      ZhenGuo Yin authored
      [Why]
      The page tables of a compute VM in VRAM are lost after a GPU reset.
      They are not restored, since a compute VM has no shadows.
      
      [How]
      Use the upper 32 bits of vm->generation to record a vram_lost_counter.
      Reset the VM state machine when vm->generation is not equal to
      the new generation token.
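      A minimal user-space sketch of the token scheme, assuming (as the text
      says) that the VRAM-lost counter occupies the upper 32 bits and a per-VM
      generation the lower 32 bits. The names vm_sketch, make_generation() and
      vm_validate() are hypothetical and are not the amdgpu API; the real logic
      lives in the amdgpu VM code and may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for per-VM state (not struct amdgpu_vm). */
struct vm_sketch {
	uint64_t generation;	/* token recorded at the last validation */
	uint32_t local_gen;	/* hypothetical per-VM counter (lower 32 bits) */
	int state_resets;	/* counts state-machine resets, for illustration */
};

/* Token layout assumed here: VRAM-lost counter << 32 | per-VM generation. */
static uint64_t make_generation(uint32_t vram_lost_counter, uint32_t local_gen)
{
	return ((uint64_t)vram_lost_counter << 32) | local_gen;
}

/*
 * Validation step: if the token changed (e.g. a GPU reset lost VRAM and
 * bumped the upper half), store the new token and reset the VM state
 * machine so the page tables are rebuilt. Returns true if a reset happened.
 */
static bool vm_validate(struct vm_sketch *vm, uint32_t vram_lost_counter)
{
	uint64_t new_gen = make_generation(vram_lost_counter, vm->local_gen);

	if (vm->generation == new_gen)
		return false;

	vm->generation = new_gen;
	vm->state_resets++;	/* placeholder for the real state-machine reset */
	return true;
}
```

      Comparing a single 64-bit token lets one equality test catch both a
      VRAM loss and a per-VM change.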
      
      v2: Check vm->generation instead of calling drm_sched_entity_error
      in amdgpu_vm_validate.
      v3: Use new generation token instead of vram_lost_counter for check.
      Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd/display: Add null check for set_output_gamma in dcn30_set_output_transfer_func · 08ae395e
      Srinivasan Shanmugam authored
      This commit adds a NULL check for the set_output_gamma function pointer
      in dcn30_set_output_transfer_func(). Previously, set_output_gamma was
      checked for NULL at line 386, but it was then called without any NULL
      check at line 401. This could lead to a NULL pointer dereference if
      set_output_gamma is indeed NULL.

      To fix this, add a NULL check for set_output_gamma immediately before
      the call at line 401. If set_output_gamma is NULL, log an error message
      and skip the call.

      This prevents a potential NULL pointer dereference.
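      The guard described above can be sketched in isolation. This is a hedged
      illustration, not the upstream patch: the types mpc_sketch and
      mpc_funcs_sketch and the helpers program_output_gamma() and
      fake_set_output_gamma() are hypothetical, and fprintf() stands in for
      the driver's logging macro.

```c
#include <stdio.h>

struct mpc_sketch;

/* Hypothetical, reduced ops table (not the real struct mpc_funcs). */
struct mpc_funcs_sketch {
	void (*set_output_gamma)(struct mpc_sketch *mpc, int mpcc_id,
				 const void *params);
};

struct mpc_sketch {
	const struct mpc_funcs_sketch *funcs;
	int gamma_calls;	/* counts calls to the hook, for illustration */
};

/*
 * Call the optional hook only if it is implemented; otherwise log an error
 * and skip it. Returns 0 if the hook was called, -1 if it was NULL.
 */
static int program_output_gamma(struct mpc_sketch *mpc, int mpcc_id,
				const void *params)
{
	if (mpc->funcs->set_output_gamma) {
		mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
		return 0;
	}
	fprintf(stderr, "%s: set_output_gamma function pointer is NULL\n",
		__func__);
	return -1;
}

/* Example hook implementation used to exercise the guarded path. */
static void fake_set_output_gamma(struct mpc_sketch *mpc, int mpcc_id,
				  const void *params)
{
	(void)mpcc_id;
	(void)params;
	mpc->gamma_calls++;
}
```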
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:401 dcn30_set_output_transfer_func()
      error: we previously assumed 'mpc->funcs->set_output_gamma' could be null (see line 386)
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c
          373 bool dcn30_set_output_transfer_func(struct dc *dc,
          374                                 struct pipe_ctx *pipe_ctx,
          375                                 const struct dc_stream_state *stream)
          376 {
          377         int mpcc_id = pipe_ctx->plane_res.hubp->inst;
          378         struct mpc *mpc = pipe_ctx->stream_res.opp->ctx->dc->res_pool->mpc;
          379         const struct pwl_params *params = NULL;
          380         bool ret = false;
          381
          382         /* program OGAM or 3DLUT only for the top pipe*/
          383         if (pipe_ctx->top_pipe == NULL) {
          384                 /*program rmu shaper and 3dlut in MPC*/
          385                 ret = dcn30_set_mpc_shaper_3dlut(pipe_ctx, stream);
          386                 if (ret == false && mpc->funcs->set_output_gamma) {
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If this is NULL
      
          387                         if (stream->out_transfer_func.type == TF_TYPE_HWPWL)
          388                                 params = &stream->out_transfer_func.pwl;
          389                         else if (pipe_ctx->stream->out_transfer_func.type ==
          390                                         TF_TYPE_DISTRIBUTED_POINTS &&
          391                                         cm3_helper_translate_curve_to_hw_format(
          392                                         &stream->out_transfer_func,
          393                                         &mpc->blender_params, false))
          394                                 params = &mpc->blender_params;
          395                          /* there are no ROM LUTs in OUTGAM */
          396                         if (stream->out_transfer_func.type == TF_TYPE_PREDEFINED)
          397                                 BREAK_TO_DEBUGGER();
          398                 }
          399         }
          400
      --> 401         mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Then it will crash
      
          402         return ret;
          403 }
      
      Fixes: d99f1387 ("drm/amd/display: Add DCN3 HWSEQ")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: Tom Chung <chiahsuan.chung@amd.com>
      Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Cc: Roman Li <roman.li@amd.com>
      Cc: Hersen Wu <hersenxs.wu@amd.com>
      Cc: Alex Hung <alex.hung@amd.com>
      Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
      Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
      Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: add missed harvest check for VCN IP v4/v5 · 0b071245
      Tim Huang authored
      To prevent the probe failure below, add a harvest check for models with
      VCN IP v4.0.6, where VCN1 may be harvested.
      
      v2:
      Apply the same check to VCN IP v4.0 and v5.0.
      
      [   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
      [   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
      c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
      00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
      [   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
      [   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX: 0000000000000000
      [   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI: ffff99a82f680000
      [   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09: 0000000000000000
      [   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12: 0000000000000000
      [   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15: 0000000000000001
      [   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000) knlGS:0000000000000000
      [   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4: 0000000000750ef0
      [   54.076927] PKRU: 55555554
      [   54.077132] Call Trace:
      [   54.077319]  <TASK>
      [   54.077484]  ? show_regs+0x69/0x80
      [   54.077747]  ? __die+0x28/0x70
      [   54.077979]  ? page_fault_oops+0x180/0x4b0
      [   54.078286]  ? do_user_addr_fault+0x2d2/0x680
      [   54.078610]  ? exc_page_fault+0x84/0x190
      [   54.078910]  ? asm_exc_page_fault+0x2b/0x30
      [   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
      [   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
      [   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
      [   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0 [amdgpu]
      [   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
      [   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
      [   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
      [   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
      [   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
      [   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
      [   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
      [   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
      [   54.087215]  local_pci_probe+0x48/0xa0
      [   54.087509]  pci_device_probe+0xc9/0x250
      [   54.087812]  really_probe+0x1a4/0x3f0
      [   54.088101]  __driver_probe_device+0x7d/0x170
      [   54.088443]  driver_probe_device+0x24/0xa0
      [   54.088765]  __driver_attach+0xdd/0x1d0
      [   54.089068]  ? __pfx___driver_attach+0x10/0x10
      [   54.089417]  bus_for_each_dev+0x8e/0xe0
      [   54.089718]  driver_attach+0x22/0x30
      [   54.090000]  bus_add_driver+0x120/0x220
      [   54.090303]  driver_register+0x62/0x120
      [   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
      [   54.091255]  __pci_register_driver+0x62/0x70
      [   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
      [   54.092190]  do_one_initcall+0x5f/0x330
      [   54.092495]  do_init_module+0x68/0x240
      [   54.092794]  load_module+0x201c/0x2110
      [   54.093093]  init_module_from_file+0x97/0xd0
      [   54.093428]  ? init_module_from_file+0x97/0xd0
      [   54.093777]  idempotent_init_module+0x11c/0x2a0
      [   54.094134]  __x64_sys_finit_module+0x64/0xc0
      [   54.094476]  do_syscall_64+0x58/0x120
      [   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
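      A harvest check of this kind amounts to skipping every instance whose bit
      is set in a harvest mask before touching its registers. A minimal sketch,
      assuming a per-instance bitmask; vcn_sketch, vcn_start_all() and
      MAX_VCN_INSTANCES are hypothetical names, not the amdgpu code.

```c
#include <stdint.h>

#define MAX_VCN_INSTANCES 4	/* hypothetical upper bound for this sketch */

/* Hypothetical, reduced device state (not struct amdgpu_device). */
struct vcn_sketch {
	int num_inst;			/* number of VCN instances on the part */
	uint32_t harvest_config;	/* bit i set => instance i is harvested */
	int started[MAX_VCN_INSTANCES];
};

/*
 * Start every VCN instance that is present, skipping harvested ones so no
 * registers of a fused-off instance are accessed.
 * Returns the number of instances started.
 */
static int vcn_start_all(struct vcn_sketch *vcn)
{
	int started = 0;

	for (int i = 0; i < vcn->num_inst && i < MAX_VCN_INSTANCES; i++) {
		if (vcn->harvest_config & (1u << i))
			continue;	/* harvested: skip this instance */
		vcn->started[i] = 1;
		started++;
	}
	return started;
}
```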
      Signed-off-by: Tim Huang <tim.huang@amd.com>
      Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: skip kfd init if GFX is not ready. · 3b37e272
      Yifan Zhang authored
      Avoid a KFD init crash in that case.
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
      Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Validate user queue update · 305cd109
      Philip Yang authored
      When updating a queue, ensure the new ring buffer is mapped on the GPU
      with the correct size.

      Decrement the queue_refcount of the old ring_bo and increment the
      queue_refcount of the new ring_bo.
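      The validate-then-swap order can be sketched as follows, assuming a
      buffer object that records its GPU-mapped size and a per-BO queue
      refcount. The names ring_bo_sketch, queue_sketch and update_queue_ring()
      are hypothetical; this is an illustration, not the KFD implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GPU-mapped buffer object. */
struct ring_bo_sketch {
	uint64_t mapped_size;	/* bytes mapped on the GPU; 0 if unmapped */
	int queue_refcount;	/* number of queues using this BO as a ring */
};

/* Hypothetical, reduced queue state. */
struct queue_sketch {
	struct ring_bo_sketch *ring_bo;
	uint64_t ring_size;
};

/*
 * Switch a queue to a new ring buffer. The new BO must be mapped and large
 * enough for the requested ring size; only then is the old BO's refcount
 * dropped and the new BO's refcount taken.
 * Returns 0 on success, -1 if validation fails (queue left unchanged).
 */
static int update_queue_ring(struct queue_sketch *q,
			     struct ring_bo_sketch *new_bo, uint64_t new_size)
{
	if (!new_bo || new_bo->mapped_size == 0 ||
	    new_bo->mapped_size < new_size)
		return -1;

	if (q->ring_bo)
		q->ring_bo->queue_refcount--;
	new_bo->queue_refcount++;

	q->ring_bo = new_bo;
	q->ring_size = new_size;
	return 0;
}
```

      Validating before touching any refcount means a rejected update leaves
      both buffers' counts exactly as they were.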
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Validate user queue svm memory residency · b049504e
      Philip Yang authored
      The queue CWSR area may be registered with the GPU as SVM memory. When
      creating a queue, ensure the SVM range is mapped to the GPU with the
      KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.

      Add queue_refcount to struct svm_range to track queue CWSR area usage.

      Because the return value of the MMU notifier unmap callback is ignored,
      if the application unmaps the CWSR area while the queue is active, print
      a pr_warn message to the dmesg log. To be safe, also evict the user queue.
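      The two checks can be sketched in user space, assuming a range object
      with a flags word and a queue refcount. The names svm_range_sketch,
      cwsr_range_acquire(), cwsr_range_on_unmap() and the flag value
      SKETCH_SVM_FLAG_GPU_ALWAYS_MAPPED are hypothetical; the real flag is
      KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED from the KFD uapi header.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value for illustration only; see the KFD uapi header for the real flag. */
#define SKETCH_SVM_FLAG_GPU_ALWAYS_MAPPED 0x40u

/* Hypothetical, reduced SVM range (not struct svm_range). */
struct svm_range_sketch {
	uint32_t flags;
	int queue_refcount;	/* queues whose CWSR area lives in this range */
};

/* Queue creation: accept the CWSR range only if it is always GPU-mapped. */
static bool cwsr_range_acquire(struct svm_range_sketch *r)
{
	if (!(r->flags & SKETCH_SVM_FLAG_GPU_ALWAYS_MAPPED))
		return false;
	r->queue_refcount++;
	return true;
}

/*
 * Unmap notification: the callback cannot fail the unmap, so if a queue
 * still uses the range, warn and report that user queues must be evicted.
 * Returns true if an eviction is required.
 */
static bool cwsr_range_on_unmap(const struct svm_range_sketch *r)
{
	if (r->queue_refcount > 0) {
		fprintf(stderr, "warning: CWSR area unmapped while queue is active\n");
		return true;
	}
	return false;
}
```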
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 23 Jul, 2024 34 commits