Commits · 0d06132fc84bd91379300768c75a19474e06d0c5 · Kirill Smelkov / linux

26 Jul, 2022 13 commits

PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() · 0d06132f

Logan Gunthorpe authored Jul 08, 2022

This interface is superseded by support in dma_map_sg() which now supports
heterogeneous scatterlists. There are no longer any users, so remove it.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0d06132f

RDMA/rw: drop pci_p2pdma_[un]map_sg() · 1e97af7f

Logan Gunthorpe authored Jul 08, 2022

dma_map_sg() now supports the use of P2PDMA pages so pci_p2pdma_map_sg()
is no longer necessary and may be dropped. This means the
rdma_rw_[un]map_sg() helpers are no longer necessary. Remove it all.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

1e97af7f

RDMA/core: introduce ib_dma_pci_p2p_dma_supported() · 495758bb

Logan Gunthorpe authored Jul 08, 2022

Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
if a given ib_device can be used in P2PDMA transfers. This ensures
the ib_device is not using virt_dma and also that the underlying
dma_device supports P2PDMA.

Use the new helper in nvme-rdma to replace the existing check for
ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
switching away from pci_p2pdma_[un]map_sg().
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

495758bb

nvme-pci: convert to using dma_map_sgtable() · 91fb2b60

Logan Gunthorpe authored Jul 08, 2022

The dma_map operations now support P2PDMA pages directly. So remove
the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
to dma_map_sgtable().

dma_map_sgtable() returns more complete error codes than dma_map_sg()
and allows differentiating EREMOTEIO errors in case an unsupported
P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET
so the request isn't retried.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

91fb2b60

nvme-pci: check DMA ops when indicating support for PCI P2PDMA · 2f859441

Logan Gunthorpe authored Jul 08, 2022

Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2f859441

iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg · 30280eee

Logan Gunthorpe authored Jul 08, 2022

Call pci_p2pdma_map_segment() when a PCI P2PDMA page is seen so the bus
address is set in the dma address and the segment is marked with
sg_dma_mark_bus_address(). iommu_map_sg() will then skip these segments.
Then, in __finalise_sg(), copy the dma address from the input segment
to the output segment. __invalidate_sg() must also learn to skip these
segments.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through
     the root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
     normally with an IOMMU IOVA.
  3) It is not possible for the two devices to communicate and thus
     the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

30280eee

iommu: Explicitly skip bus address marked segments in __iommu_map_sg() · c9632183

Logan Gunthorpe authored Jul 08, 2022

In order to support PCI P2PDMA mappings with dma-iommu, explicitly skip
any segments marked with sg_dma_mark_bus_address() in __iommu_map_sg().

These segments should not be mapped into the IOVA and will be handled
separately in as subsequent patch for dma-iommu.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c9632183

dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support · 159bf192

Logan Gunthorpe authored Jul 08, 2022

Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

159bf192

dma-direct: support PCI P2PDMA pages in dma-direct map_sg · f02ad36d

Logan Gunthorpe authored Jul 08, 2022

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through the
     root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
     normally, as though it were a CPU physical address
  3) It is not possible for the two devices to communicate and thus
     the mapping operation should fail (and it will return -EREMOTEIO).

SGL segments that contain PCI bus addresses are marked with
sg_dma_mark_pci_p2pdma() and are ignored when unmapped.

P2PDMA mappings are also failed if swiotlb needs to be used on the
mapping.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

f02ad36d

dma-mapping: allow EREMOTEIO return code for P2PDMA transfers · 7c2645a2

Logan Gunthorpe authored Jul 08, 2022

Add EREMOTEIO error return to dma_map_sgtable() which will be used
by .map_sg() implementations that detect P2PDMA pages that the
underlying DMA device cannot access.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

7c2645a2

PCI/P2PDMA: Introduce helpers for dma_map_sg implementations · 5e180ff3

Logan Gunthorpe authored Jul 08, 2022

Add pci_p2pdma_map_segment() as a helper for dma_map_sg()
implementations. It takes an scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoid needing to lookup into the xarray for
every page.

The prototype for the helper is added to dma-map-ops.h as it is only
useful to dma map implementations and don't need to pollute the public
pci-p2pdma header.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

5e180ff3

PCI/P2PDMA: Attempt to set map_type if it has not been set · 719c9865

Logan Gunthorpe authored Jul 08, 2022

Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.

This change will calculate the mapping type if it hasn't pre-calculated
so it is no longer invalid to call pci_p2pdma_map_sg() before the mapping
type is calculated, so drop the WARN_ON when that is the case.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

719c9865

lib/scatterlist: add flag for indicating P2PDMA segments in an SGL · 42399301

Logan Gunthorpe authored Jul 08, 2022

Introduce a dma_flags field in struct scatterlist. These flags will be
used by dma_[un]map_sg_p2pdma() to determine when a given SGL segments
dma_address points to a PCI bus address. dma_unmap_sg_p2pdma() will need
to perform different cleanup when a segment is marked as a bus address.

The dma_flags field will fit in the existing padding on 64BIT systems
(assuming CONFIG_NEED_SG_DMA_LENGTH is also set).

The new bit will only be used when CONFIG_PCI_P2PDMA is set; this means
PCI P2PDMA will require CONFIG_64BIT. This should be acceptable as the
majority of P2PDMA use cases are restricted to newer root complexes and
roughly require the extra address space for memory BARs used in the
transactions.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

42399301

22 Jul, 2022 3 commits

swiotlb: clean up some coding style and minor issues · 72311809

Tianyu Lan authored Jul 21, 2022

- Fix the used field of struct io_tlb_area wasn't initialized
- Set area number to be 0 if input area number parameter is 0
- Use array_size() to calculate io_tlb_area array size
- Make parameters of swiotlb_do_find_slots() more reasonable

Fixes: 26ffb91fa5e0 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

72311809

dma-mapping: update comment after dmabounce removal · a45e52bf

Lukas Bulwahn authored Jul 20, 2022

Commit e3217540 ("ARM/dma-mapping: remove dmabounce") removes the
config DMABOUNCE. A comment to the function __dma_page_cpu_to_dev() refers
to this removed config DMABOUNCE.

Remove the obsolete explanation, but keep the recommendation not to use
__dma_page_cpu_to_dev() and use dma_sync_* functions instead.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a45e52bf

scsi: sd: Add a comment about limiting max_sectors to shost optimal limit · c9337ad4

John Garry authored Jul 19, 2022

Add a comment about limiting the default the SCSI disk request_queue
max_sectors initial value to that of the SCSI host optimal sectors limit.
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c9337ad4

19 Jul, 2022 6 commits

ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors · 0568e612

John Garry authored Jul 14, 2022

ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according to shost limits, which includes host DMA mapping limits.

Cap the ata_device max_sectors according to shost->max_sectors to respect
this shost limit.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0568e612

scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit · 4cbfca5f

John Garry authored Jul 14, 2022

Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and free an IOVA for each mapping, which may
affect performance.

For performance reasons set the request queue max_sectors from
dma_opt_mapping_size(), which knows this mapping limit.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4cbfca5f

scsi: sd: allow max_sectors be capped at DMA optimal size limit · 608128d3

John Garry authored Jul 14, 2022

Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and free an IOVA for each mapping, which may
affect performance.

New member Scsi_Host.opt_sectors is added, which is the optimal host
max_sectors, and use this value to cap the request queue max_sectors when
set.

It could be considered to have request queues io_opt value initially
set at Scsi_Host.opt_sectors in __scsi_init_queue(), but that is not
really the purpose of io_opt.

Finally, even though Scsi_Host.opt_sectors value should never be greater
than the request queue max_hw_sectors value, continue to limit to this
value for safety.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

608128d3

scsi: core: cap shost max_sectors according to DMA limits only once · bb7d1283

John Garry authored Jul 14, 2022

The shost->max_sectors is repeatedly capped according to the host DMA
mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so
set only once when adding the host.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bb7d1283

dma-iommu: add iommu_dma_opt_mapping_size() · 6d9870b7

John Garry authored Jul 14, 2022

Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

6d9870b7

dma-mapping: add dma_opt_mapping_size() · a229cc14

John Garry authored Jul 14, 2022

Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mapping which don't exceed it.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a229cc14

18 Jul, 2022 5 commits

swiotlb: move struct io_tlb_slot to swiotlb.c · 942a8186

Christoph Hellwig authored Jul 12, 2022

No need to expose this structure definition in the header.
Signed-off-by: Christoph Hellwig <hch@lst.de>

942a8186

swiotlb: ensure a segment doesn't cross the area boundary · 57e6840c

Chao Gao authored Jul 15, 2022

Free slots tracking assumes that slots in a segment can be allocated to
fulfill a request. This implies that slots in a segment should belong to
the same area. Although the possibility of a violation is low, it is better
to explicitly enforce segments won't span multiple areas by adjusting the
number of slabs when configuring areas.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

57e6840c

swiotlb: consolidate rounding up default_nslabs · 44335487

Chao Gao authored Jul 15, 2022

default_nslabs are rounded up in two cases with exactly same comments.
Add a simple wrapper to reduce duplicate code/comments. It is preparatory
to adding more logics into the round-up.

No functional change intended.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

44335487

swiotlb: remove unused fields in io_tlb_mem · 91561d4e

Chao Gao authored Jul 15, 2022

Commit 20347fca ("swiotlb: split up the global swiotlb lock") splits
io_tlb_mem into multiple areas. Each area has its own lock and index. The
global ones are not used so remove them.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

91561d4e

swiotlb: fix use after free on error handling path · 4a977394

Dan Carpenter authored Jul 15, 2022

Don't dereference "mem" after it has been freed.  Flip the
two kfree()s around to address this bug.

Fixes: 26ffb91fa5e0 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4a977394

13 Jul, 2022 1 commit

swiotlb: split up the global swiotlb lock · 20347fca

Tianyu Lan authored Jul 08, 2022

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significat lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing the
allocation is freed into the area where the memory was originally
allocated from.

Area number can be set via swiotlb kernel parameter and is default
to be possible cpu number. If possible cpu number is not power of
2, area number will be round up to the next power of 2.

This idea from Andi Kleen patch(https://github.com/intel/tdx/commit/
4529b5784c141782c72ec9bd9a92df2b68cb7d45).
Based-on-idea-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

20347fca

12 Jul, 2022 1 commit

swiotlb: fail map correctly with failed io_tlb_default_mem · c51ba246

Robin Murphy authored Jul 12, 2022

In the failure case of trying to use a buffer which we'd previously
failed to allocate, the "!mem" condition is no longer sufficient since
io_tlb_default_mem became static and assigned by default. Update the
condition to work as intended per the rest of that conversion.

Fixes: 463e862a ("swiotlb: Convert io_default_tlb_mem to static allocation")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c51ba246

07 Jul, 2022 10 commits

ARM/dma-mapping: merge IOMMU ops · 4136ce90

Robin Murphy authored Apr 21, 2022

The dma_sync_* operations are now the only difference between the
coherent and non-coherent IOMMU ops. Some minor tweaks to make those
safe for coherent devices with minimal overhead, and we can condense
down to a single set of DMA ops.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Marc Zyngier <maz@kernel.org>

4136ce90

ARM/dma-mapping: consolidate IOMMU ops callbacks · d563bccf

Robin Murphy authored Apr 21, 2022

Merge the coherent and non-coherent callbacks down to a single
implementation each, relying on the generic dev->dma_coherent
flag at the points where the difference matters.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Marc Zyngier <maz@kernel.org>

d563bccf

ARM/dma-mapping: drop .dma_supported for IOMMU ops · 42998ef0

Robin Murphy authored Apr 21, 2022

When an IOMMU is present, we trust that it should be capable
of remapping any physical memory, and since the device masks
represent the input (virtual) addresses to the IOMMU it makes
no sense to validate them against physical PFNs anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Marc Zyngier <maz@kernel.org>

42998ef0

ARM/dma-mapping: use dma-direct unconditionally · ae626eb9

Christoph Hellwig authored Apr 19, 2022

Use dma-direct unconditionally on arm.  It has already been used for
some time for LPAE and nommu configurations.

This mostly changes the streaming mapping implementation and the (simple)
coherent allocator for device that are DMA coherent.  The existing
complex allocator for uncached mappings for non-coherent devices is still
used as is using the arch_dma_alloc/arch_dma_free hooks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andre Przywara <andre.przywara@arm.com> [highbank]
Tested-by: Marc Zyngier <maz@kernel.org>

ae626eb9

ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default · af6f23b8

Christoph Hellwig authored Apr 19, 2022

Only the footbridge platforms provide their own DMA address translation
helpers, so switch to the generic version for all other platforms, and
consolidate the footbridge implementation to remove two levels of
indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Marc Zyngier <maz@kernel.org>

af6f23b8

ARM/dma-mapping: use dma_to_phys/phys_to_dma in the dma-mapping code · f9774cfd

Christoph Hellwig authored Apr 19, 2022

Use the helpers as expected by the dma-direct code in the old arm
dma-mapping code to ease a gradual switch to the common DMA code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Marc Zyngier <maz@kernel.org>

f9774cfd

ARM/dma-mapping: remove the unused virt_to_dma helper · d6e2e925

Christoph Hellwig authored Apr 19, 2022

virt_to_dma was only used by the now removed dmabounce code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Marc Zyngier <maz@kernel.org>

d6e2e925

ARM/dma-mapping: mark various dma-mapping routines static in dma-mapping.c · 5ed390e5

Christoph Hellwig authored Apr 19, 2022

With the dmabounce removal these aren't used outside of dma-mapping.c,
so mark them static.  Move the dma_map_ops declarations down a bit
to avoid lots of forward declarations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Marc Zyngier <maz@kernel.org>

5ed390e5

ARM/dma-mapping: remove dmabounce · e3217540

Christoph Hellwig authored Apr 19, 2022

Remove the now unused dmabounce code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>

e3217540

ARM: sa1100/assabet: move dmabounce hack to ohci driver · 9ba26f5c

Arnd Bergmann authored Feb 03, 2022

The sa1111 platform is one of the two remaining users of the old Arm
specific "dmabounce" code, which is an earlier implementation of the
generic swiotlb.

Linus Walleij submitted a patch that removes dmabounce support from
the ixp4xx, and I had a look at the other user, which is the sa1111
companion chip.

Looking at how dmabounce is used, I could narrow it down to one driver
one three machines:

 - dmabounce is only initialized on assabet/neponset, jornada720 and
   badge4, which are the platforms that have an sa1111 and support
   DMA on it.

 - All three of these suffer from "erratum #7" that requires only
   doing DMA to half the memory sections based on one of the address
   lines, in addition, the neponset also can't DMA to the RAM that
   is connected to sa1111 itself.

 - the pxa lubbock machine also has sa1111, but does not support DMA
   on it and does not set dmabounce.

 - only the OHCI and audio devices on sa1111 support DMA, but as
   there is no audio driver for this hardware, only OHCI remains.

In the OHCI code, I noticed that two other platforms already have
a local bounce buffer support in the form of the "local_mem"
allocator. Specifically, TMIO and SM501 use this on a few other ARM
boards with 16KB or 128KB of local SRAM that can be accessed from the
OHCI and from the CPU.

While this is not the same problem as on sa1111, I could not find a
reason why we can't re-use the existing implementation but replace the
physical SRAM address mapping with a locally allocated DMA buffer.

There are two main downsides:

 - rather than using a dynamically sized pool, this buffer needs
   to be allocated at probe time using a fixed size. Without
   having any idea of what it should be, I picked a size of
   64KB, which is between what the other two OHCI front-ends use
   in their SRAM. If anyone has a better idea what that size
   is reasonable, this can be trivially changed.

 - Previously, only USB transfers to unaddressable memory needed
   to go through the bounce buffer, now all of them do, which may
   impact runtime performance for USB endpoints that do a lot of
   transfers.

On the upside, the local_mem support uses write-combining buffers,
which should be a bit faster for transfers to the device compared to
normal uncached coherent memory as used in dmabounce.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: linux-usb@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>

9ba26f5c

22 Jun, 2022 1 commit

swiotlb: panic if nslabs is too small · 0bf28fc4

Dongli Zhang authored Jun 11, 2022

Panic on purpose if nslabs is too small, in order to sync with the remap
retry logic.

In addition, print the number of bytes for tlb alloc failure.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0bf28fc4