1. 29 May, 2024 5 commits
  2. 23 May, 2024 5 commits
  3. 22 May, 2024 2 commits
  4. 21 May, 2024 1 commit
  5. 20 May, 2024 2 commits
    • drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs · eb853413
      Lang Yu authored
      Small APUs (i.e., consumer and embedded products) usually have a small
      carveout of device memory that can't satisfy the memory allocation
      requirements of most compute workloads.
      
      We can't even run the basic MNIST example
      (https://github.com/pytorch/examples/tree/main/mnist) with a default
      512 MB carveout. Error log:
      
      "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
      84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
      is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
      and 22.17 MiB is reserved by PyTorch but unallocated"
      
      Although BIOS settings can be changed to enlarge the carveout, doing so
      is inflexible and may draw complaints. Moreover, with a fixed carveout
      the memory can't be shared effectively between host and device.
      
      The solution is the MI300A approach, i.e., let VRAM allocations go to GTT.
      The device and host can then share memory flexibly and effectively.
      
      v2: Report local_mem_size_private as 0. (Felix)
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
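The redirection described in this commit message can be pictured with a minimal C sketch. It is not the amdgpu/amdkfd code: `enum mem_domain`, `struct device_info`, `pick_domain` and `reported_local_mem_private` are hypothetical names invented for illustration, and the way a "small APU" is detected is assumed to be a simple flag. The sketch only models two points stated above: a VRAM request on a small APU is served from GTT, and the private local memory size is reported as 0 (the v2 change).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the driver's real types. */
enum mem_domain { DOMAIN_VRAM, DOMAIN_GTT };

struct device_info {
    bool is_small_apu;                 /* assumption: set for consumer/embedded APUs */
    unsigned long long carveout_bytes; /* size of the BIOS carveout */
};

/* Redirect VRAM requests to GTT on small APUs; leave everything else alone. */
static enum mem_domain pick_domain(const struct device_info *dev,
                                   enum mem_domain requested)
{
    assert(dev != NULL);
    if (requested == DOMAIN_VRAM && dev->is_small_apu)
        return DOMAIN_GTT;
    return requested;
}

/* Models the v2 note: private local memory is reported as 0 on small APUs. */
static unsigned long long
reported_local_mem_private(const struct device_info *dev)
{
    assert(dev != NULL);
    return dev->is_small_apu ? 0ULL : dev->carveout_bytes;
}
```

With a 512 MB carveout and `is_small_apu` set, `pick_domain(&dev, DOMAIN_VRAM)` yields `DOMAIN_GTT`, so in this model an allocation is no longer bounded by the carveout size.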
    • drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms · 2a705f3e
      Lang Yu authored
      Observed on a gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
      When two attachments use the same VM, the root PD would be locked twice.
      
      [   57.910418] Call Trace:
      [   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
      [   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
      [   57.793923]  ? idr_get_next_ul+0xbe/0x100
      [   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
      [   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
      [   57.794141]  ? process_scheduled_works+0x29c/0x580
      [   57.794147]  process_scheduled_works+0x303/0x580
      [   57.794157]  ? __pfx_worker_thread+0x10/0x10
      [   57.794160]  worker_thread+0x1a2/0x370
      [   57.794165]  ? __pfx_worker_thread+0x10/0x10
      [   57.794167]  kthread+0x11b/0x150
      [   57.794172]  ? __pfx_kthread+0x10/0x10
      [   57.794177]  ret_from_fork+0x3d/0x60
      [   57.794181]  ? __pfx_kthread+0x10/0x10
      [   57.794184]  ret_from_fork_asm+0x1b/0x30
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
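The double-lock problem in this commit message can be illustrated with a toy model in C. This is a sketch under assumptions, not the driver's locking code: `struct toy_bo`, `toy_lock` and `toy_reserve_all` are hypothetical, the model is single-threaded, and every object is assumed to start unlocked, so "already locked" always means "locked earlier in this same pass". It shows why a list containing the same root PD twice fails unless duplicates are tolerated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object with a trivial, non-recursive lock. */
struct toy_bo { bool locked; };

/* Returns 0 on success, -1 if the object is already locked. */
static int toy_lock(struct toy_bo *bo)
{
    assert(bo != NULL);
    if (bo->locked)
        return -1;
    bo->locked = true;
    return 0;
}

/*
 * Lock every object in the list. When ignore_duplicates is set, an object
 * that appears more than once is locked only once. Returns the number of
 * distinct objects locked, or -1 after unlocking everything on failure.
 */
static int toy_reserve_all(struct toy_bo **list, size_t n,
                           bool ignore_duplicates)
{
    int locked = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_lock(list[i]) == 0) {
            locked++;
            continue;
        }
        if (ignore_duplicates)
            continue; /* already held by this pass: skip it */
        for (size_t j = 0; j < i; j++)
            list[j]->locked = false; /* unwind what this pass locked */
        return -1;
    }
    return locked;
}
```

For a list such as `{ &root, &other, &root }`, the strict mode fails on the second occurrence of `root`, while the duplicate-tolerant mode locks two distinct objects and succeeds.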
  6. 19 May, 2024 2 commits
  7. 16 May, 2024 2 commits
  8. 13 May, 2024 11 commits
  9. 10 May, 2024 3 commits
  10. 09 May, 2024 2 commits
  11. 08 May, 2024 2 commits
  12. 07 May, 2024 3 commits