[PATCH] x86_64: When allocation of merged SG lists fails in the IOMMU don't merge

[ AK: I redid Kevin's fix to be simpler, but the idea and original analysis of the problem is from Kevin] This avoid allocation failures on some SATA systems like Nvidia CK8 when the IOMMU gets fragmented. Modern SATA devices have quite large queues (128 entries) and the FS with ext2/3 is good enough now that it often passes whole 128 page sg lists down to the driver. These require 512K of continuous free space in the IOMMU aperture to map when merged. When the IOMMU is fragmented this could lead to spurious IO errors due to failing mappings. Short term fix is to just try to map the SG list again unmerged page by page - this way fragmentation doesn't matter anymore. The code for that was already there, but it just wasn't enabled for the merge case. According to Kevin at least the Nvidia device doesn't seem to benefit from merging much anyways, so the only slowdown is from trying to do an unnecessary merge attempt. Kevin plans to implement better fragmentation avoidance in the future, but that wouldn't be 2.6.16 material. TBD: should add some statistic counters to count how often that really happens. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86_64: When allocation of merged SG lists fails in the IOMMU don't merge
[ AK: I redid Kevin's fix to be simpler, but the idea and original analysis of the problem is from Kevin] This avoid allocation failures on some SATA systems like Nvidia CK8 when the IOMMU gets fragmented. Modern SATA devices have quite large queues (128 entries) and the FS with ext2/3 is good enough now that it often passes whole 128 page sg lists down to the driver. These require 512K of continuous free space in the IOMMU aperture to map when merged. When the IOMMU is fragmented this could lead to spurious IO errors due to failing mappings. Short term fix is to just try to map the SG list again unmerged page by page - this way fragmentation doesn't matter anymore. The code for that was already there, but it just wasn't enabled for the merge case. According to Kevin at least the Nvidia device doesn't seem to benefit from merging much anyways, so the only slowdown is from trying to do an unnecessary merge attempt. Kevin plans to implement better fragmentation avoidance in the future, but that wouldn't be 2.6.16 material. TBD: should add some statistic counters to count how often that really happens. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a1002a48 · Kevin VanMaren · Linus Torvalds · 1de6bf33 · a1002a48
Commit a1002a48 authored Feb 03, 2006 by Kevin VanMaren Committed by Linus Torvalds Feb 04, 2006
Hide whitespace changes
Inline Side-by-side

Showing with 6 additions and 3 deletions

arch/x86_64/kernel/pci-gart.c arch/x86_64/kernel/pci-gart.c +6 -3

No files found.
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -457,9 +457,12 @@ int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir)
 error:
 	flush_gart(NULL);
 	gart_unmap_sg(dev, sg, nents, dir);
-	/* When it was forced try again unforced */
-	if (force_iommu) 
-		return dma_map_sg_nonforce(dev, sg, nents, dir);
+	/* When it was forced or merged try again in a dumb way */
+	if (force_iommu || iommu_merge) {
+		out = dma_map_sg_nonforce(dev, sg, nents, dir);
+		if (out > 0)
+			return out;
+	}
 	if (panic_on_overflow)
 		panic("dma_map_sg: overflow on %lu pages\n", pages);
 	iommu_full(dev, pages << PAGE_SHIFT, dir);