• Jason Gunthorpe's avatar
    mm/hmm: remove the customizable pfn format from hmm_range_fault · 2733ea14
    Jason Gunthorpe authored
    Presumably the intent here was that hmm_range_fault() could put the data
    into some HW specific format and thus avoid some work. However, nothing
    actually does that, and it isn't clear how anything actually could do that
    as hmm_range_fault() provides CPU addresses which must be DMA mapped.
    
    Perhaps there is some special HW that does not need DMA mapping, but we
    don't have any examples of this, and the theoretical performance win of
    avoiding an extra scan over the pfns array doesn't seem worth the
    complexity. Plus pfns needs to be scanned anyhow to sort out any
    DEVICE_PRIVATE pages.
    
    This version replaces the uint64_t with an usigned long containing a pfn
    and fixed flags. On input flags is filled with the HMM_PFN_REQ_* values,
    on successful output it is filled with HMM_PFN_* values, describing the
    state of the pages.
    
    amdgpu is simple to convert, it doesn't use snapshot and doesn't use
    per-page flags.
    
    nouveau uses only 16 hmm_pte entries at most (ie fits in a few cache
    lines), and it sweeps over its pfns array a couple of times anyhow. It
    also has a nasty call chain before it reaches the dma map and hardware
    suggesting performance isn't important:
    
       nouveau_svm_fault():
         args.i.m.method = NVIF_VMM_V0_PFNMAP
         nouveau_range_fault()
          nvif_object_ioctl()
           client->driver->ioctl()
    	  struct nvif_driver nvif_driver_nvkm:
    	    .ioctl = nvkm_client_ioctl
    	   nvkm_ioctl()
    	    nvkm_ioctl_path()
    	      nvkm_ioctl_v0[type].func(..)
    	      nvkm_ioctl_mthd()
    	       nvkm_object_mthd()
    		  struct nvkm_object_func nvkm_uvmm:
    		    .mthd = nvkm_uvmm_mthd
    		   nvkm_uvmm_mthd()
    		    nvkm_uvmm_mthd_pfnmap()
    		     nvkm_vmm_pfn_map()
    		      nvkm_vmm_ptes_get_map()
    		       func == gp100_vmm_pgt_pfn
    			struct nvkm_vmm_desc_func gp100_vmm_desc_spt:
    			  .pfn = gp100_vmm_pgt_pfn
    			 nvkm_vmm_iter()
    			  REF_PTES == func == gp100_vmm_pgt_pfn()
    			    dma_map_page()
    
    Link: https://lore.kernel.org/r/5-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.comAcked-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
    Tested-by: default avatarRalph Campbell <rcampbell@nvidia.com>
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
    2733ea14
nouveau_dmem.c 16.7 KB