Commit 54af9a3b authored by Russ Cox

runtime: reintroduce ``dead'' space during GC scan

Reintroduce an optimization discarded during the initial conversion
from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
place in the bitmap where there are no more pointers, mark that position
for the GC so that it can avoid scanning past that place.

During heapBitsSetType we can also avoid initializing heap bitmap
beyond that location, which gives a bit of a win compared to Go 1.4.
This particular optimization (not initializing the heap bitmap) may not last:
we might change typedmemmove to use the heap bitmap, in which
case it would all need to be initialized. The early stop in the GC scan
will stay no matter what.
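The early-stop idea can be sketched as a toy model: once the per-word type information for an object records a "dead" marker, the scanner knows no pointers follow and can stop. This is only an illustration, not the runtime's actual 2-bit heap bitmap encoding; the constant values and the scanWords helper are hypothetical (though typeDead/typeScalar names appear in the runtime's tests).

```go
package main

import "fmt"

// Hypothetical per-word type codes; the real runtime encoding differs.
const (
	typeDead    = 0 // no pointers at or beyond this word: scanner may stop
	typeScalar  = 1
	typePointer = 2
)

// scanWords walks a per-word type map, collecting pointer-word indices.
// It stops at the first typeDead entry instead of scanning the whole object.
func scanWords(words []byte) (ptrs []int, scanned int) {
	for i, w := range words {
		if w == typeDead {
			return ptrs, i // early stop: nothing past here holds pointers
		}
		scanned = i + 1
		if w == typePointer {
			ptrs = append(ptrs, i)
		}
	}
	return ptrs, scanned
}

func main() {
	// Object layout: pointer, scalar, pointer, then a dead tail.
	words := []byte{typePointer, typeScalar, typePointer, typeDead, typeDead, typeDead}
	ptrs, scanned := scanWords(words)
	fmt.Println(ptrs, scanned) // [0 2] 3 — only the pointer prefix was scanned
}
```

The same prefix length (typeptrdata in the compiler) is what lets heapBitsSetType skip initializing bitmap entries past the last pointer.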

Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
name                    old mean              new mean              delta
SetTypeNode64           80.7ns × (1.00,1.01)  57.4ns × (1.00,1.01)  -28.83% (p=0.000)
SetTypeNode64Dead       80.5ns × (1.00,1.01)  13.1ns × (0.99,1.02)  -83.77% (p=0.000)
SetTypeNode64Slice      2.16µs × (1.00,1.01)  1.54µs × (1.00,1.01)  -28.75% (p=0.000)
SetTypeNode64DeadSlice  2.16µs × (1.00,1.01)  1.52µs × (1.00,1.00)  -29.74% (p=0.000)

Compared to previous CL:
name                    old mean              new mean              delta
SetTypeNode64           56.7ns × (1.00,1.00)  57.4ns × (1.00,1.01)   +1.19% (p=0.000)
SetTypeNode64Dead       57.2ns × (1.00,1.00)  13.1ns × (0.99,1.02)  -77.15% (p=0.000)
SetTypeNode64Slice      1.56µs × (1.00,1.01)  1.54µs × (1.00,1.01)   -0.89% (p=0.000)
SetTypeNode64DeadSlice  1.55µs × (1.00,1.01)  1.52µs × (1.00,1.00)   -2.23% (p=0.000)

This is the last CL in the sequence converting from the 4-bit heap
bitmap to the 2-bit heap bitmap, with all the same optimizations reenabled.
Compared to before that process began (compared to CL 9701 patch set 1):

name                    old mean              new mean              delta
BinaryTree17             5.87s × (0.94,1.09)   5.91s × (0.96,1.06)    ~    (p=0.578)
Fannkuch11               4.32s × (1.00,1.00)   4.32s × (1.00,1.00)    ~    (p=0.474)
FmtFprintfEmpty         89.1ns × (0.95,1.16)  89.0ns × (0.93,1.10)    ~    (p=0.942)
FmtFprintfString         283ns × (0.98,1.02)   298ns × (0.98,1.06)  +5.33% (p=0.000)
FmtFprintfInt            284ns × (0.98,1.04)   286ns × (0.98,1.03)    ~    (p=0.208)
FmtFprintfIntInt         486ns × (0.98,1.03)   498ns × (0.97,1.06)  +2.48% (p=0.000)
FmtFprintfPrefixedInt    400ns × (0.99,1.02)   408ns × (0.98,1.02)  +2.23% (p=0.000)
FmtFprintfFloat          566ns × (0.99,1.01)   587ns × (0.98,1.01)  +3.69% (p=0.000)
FmtManyArgs             1.91µs × (0.99,1.02)  1.94µs × (0.99,1.02)  +1.81% (p=0.000)
GobDecode               15.5ms × (0.98,1.05)  15.8ms × (0.98,1.03)  +1.94% (p=0.002)
GobEncode               11.9ms × (0.97,1.03)  12.0ms × (0.96,1.09)    ~    (p=0.263)
Gzip                     648ms × (0.99,1.01)   648ms × (0.99,1.01)    ~    (p=0.992)
Gunzip                   143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.585)
HTTPClientServer        89.2µs × (0.99,1.02)  90.3µs × (0.98,1.01)  +1.24% (p=0.000)
JSONEncode              32.3ms × (0.97,1.06)  31.6ms × (0.99,1.01)  -2.29% (p=0.000)
JSONDecode               106ms × (0.99,1.01)   107ms × (1.00,1.01)  +0.62% (p=0.000)
Mandelbrot200           6.02ms × (1.00,1.00)  6.03ms × (1.00,1.01)    ~    (p=0.250)
GoParse                 6.57ms × (0.97,1.06)  6.53ms × (0.99,1.03)    ~    (p=0.243)
RegexpMatchEasy0_32      162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.80% (p=0.000)
RegexpMatchEasy0_1K      561ns × (0.99,1.02)   541ns × (0.99,1.01)  -3.67% (p=0.000)
RegexpMatchEasy1_32      145ns × (0.95,1.04)   138ns × (1.00,1.00)  -5.04% (p=0.000)
RegexpMatchEasy1_1K      864ns × (0.99,1.04)   887ns × (0.99,1.01)  +2.57% (p=0.000)
RegexpMatchMedium_32     255ns × (0.99,1.04)   253ns × (0.99,1.01)  -1.05% (p=0.012)
RegexpMatchMedium_1K    73.9µs × (0.98,1.04)  72.8µs × (1.00,1.00)  -1.51% (p=0.005)
RegexpMatchHard_32      3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.88% (p=0.002)
RegexpMatchHard_1K       120µs × (0.98,1.04)   117µs × (1.00,1.01)  -2.02% (p=0.001)
Revcomp                  936ms × (0.95,1.08)   922ms × (0.97,1.08)    ~    (p=0.234)
Template                 130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.99% (p=0.000)
TimeParse                638ns × (0.98,1.05)   628ns × (0.99,1.01)  -1.54% (p=0.004)
TimeFormat               674ns × (0.99,1.01)   668ns × (0.99,1.01)  -0.80% (p=0.001)

The slowdown of the first few benchmarks seems to be due to the new
atomic operations for certain small size allocations. But the larger
benchmarks mostly improve, probably due to the decreased memory
pressure from having half as much heap bitmap.

CL 9706, which removes the no-longer-used wbshadow mode,
gets back what is lost in the early microbenchmarks.

Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
Reviewed-on: https://go-review.googlesource.com/9705
Reviewed-by: Rick Hudson <rlh@golang.org>
parent feb8a3b6
@@ -687,7 +687,7 @@ func haspointers(t *Type) bool {
 // typeptrdata returns the length in bytes of the prefix of t
 // containing pointer data. Anything after this offset is scalar data.
-func typeptrdata(t *Type) uint64 {
+func typeptrdata(t *Type) int64 {
 	if !haspointers(t) {
 		return 0
 	}
@@ -699,24 +699,24 @@ func typeptrdata(t *Type) uint64 {
 		TFUNC,
 		TCHAN,
 		TMAP:
-		return uint64(Widthptr)
+		return int64(Widthptr)
 	case TSTRING:
 		// struct { byte *str; intgo len; }
-		return uint64(Widthptr)
+		return int64(Widthptr)
 	case TINTER:
 		// struct { Itab *tab; void *data; } or
 		// struct { Type *type; void *data; }
-		return 2 * uint64(Widthptr)
+		return 2 * int64(Widthptr)
 	case TARRAY:
 		if Isslice(t) {
 			// struct { byte *array; uintgo len; uintgo cap; }
-			return uint64(Widthptr)
+			return int64(Widthptr)
 		}
 		// haspointers already eliminated t.Bound == 0.
-		return uint64(t.Bound-1)*uint64(t.Type.Width) + typeptrdata(t.Type)
+		return (t.Bound-1)*t.Type.Width + typeptrdata(t.Type)
 	case TSTRUCT:
 		// Find the last field that has pointers.
@@ -726,7 +726,7 @@ func typeptrdata(t *Type) uint64 {
 				lastPtrField = t1
 			}
 		}
-		return uint64(lastPtrField.Width) + typeptrdata(lastPtrField.Type)
+		return lastPtrField.Width + typeptrdata(lastPtrField.Type)
 	default:
 		Fatal("typeptrdata: unexpected type, %v", t)
@@ -794,7 +794,7 @@ func dcommontype(s *Sym, ot int, t *Type) int {
 	//	zero          unsafe.Pointer
 	// }
 	ot = duintptr(s, ot, uint64(t.Width))
-	ot = duintptr(s, ot, typeptrdata(t))
+	ot = duintptr(s, ot, uint64(typeptrdata(t)))
 	ot = duint32(s, ot, typehash(t))
 	ot = duint8(s, ot, 0) // unused
@@ -1428,17 +1428,12 @@ func usegcprog(t *Type) bool {
 	}
 	// Calculate size of the unrolled GC mask.
-	nptr := (t.Width + int64(Widthptr) - 1) / int64(Widthptr)
-	size := (nptr + 7) / 8
+	nptr := typeptrdata(t) / int64(Widthptr)
 	// Decide whether to use unrolled GC mask or GC program.
 	// We could use a more elaborate condition, but this seems to work well in practice.
-	// For small objects GC program can't give significant reduction.
-	// While large objects usually contain arrays; and even if it don't
-	// the program uses 2-bits per word while mask uses 4-bits per word,
-	// so the program is still smaller.
-	return size > int64(2*Widthptr)
+	// For small objects, the GC program can't give significant reduction.
+	return nptr > int64(2*Widthptr*8)
 }
 // Generates GC bitmask (1 bit per word).
@@ -1450,11 +1445,11 @@ func gengcmask(t *Type, gcmask []byte) {
 		return
 	}
-	vec := bvalloc(2 * int32(Widthptr) * 8)
+	vec := bvalloc(int32(2 * Widthptr * 8))
 	xoffset := int64(0)
 	onebitwalktype1(t, &xoffset, vec)
-	nptr := (t.Width + int64(Widthptr) - 1) / int64(Widthptr)
+	nptr := typeptrdata(t) / int64(Widthptr)
 	for i := int64(0); i < nptr; i++ {
 		if bvget(vec, int32(i)) == 1 {
 			gcmask[i/8] |= 1 << (uint(i) % 8)
...
@@ -1109,8 +1109,7 @@ func proggenaddsym(g *ProgGen, s *LSym) {
 	// Skip alignment hole from the previous symbol.
 	proggenskip(g, g.pos, s.Value-g.pos)
-	g.pos += s.Value - g.pos
+	g.pos = s.Value
 	// The test for names beginning with . here is meant
 	// to keep .dynamic and .dynsym from turning up as
@@ -1142,16 +1141,16 @@ func proggenaddsym(g *ProgGen, s *LSym) {
 			proggendata(g, 0)
 		proggenarrayend(g)
 		}
 		g.pos = s.Value + s.Size
 	} else if decodetype_usegcprog(s.Gotype) != 0 {
 		// gc program, copy directly
+		// TODO(rsc): Maybe someday the gc program will only describe
+		// the first decodetype_ptrdata(s.Gotype) bytes instead of the full size.
 		proggendataflush(g)
 		gcprog := decodetype_gcprog(s.Gotype)
 		size := decodetype_size(s.Gotype)
 		if (size%int64(Thearch.Ptrsize) != 0) || (g.pos%int64(Thearch.Ptrsize) != 0) {
-			Diag("proggenaddsym: unaligned gcprog symbol %s: size=%d pos=%d", s.Name, s.Size, g.pos)
+			Diag("proggenaddsym: unaligned gcprog symbol %s: size=%d pos=%d", s.Name, size, g.pos)
 		}
 		for i := int64(0); i < int64(len(gcprog.P)-1); i++ {
 			proggenemit(g, uint8(gcprog.P[i]))
@@ -1160,16 +1159,15 @@ func proggenaddsym(g *ProgGen, s *LSym) {
 	} else {
 		// gc mask, it's small so emit as data
 		mask := decodetype_gcmask(s.Gotype)
-		size := decodetype_size(s.Gotype)
-		if (size%int64(Thearch.Ptrsize) != 0) || (g.pos%int64(Thearch.Ptrsize) != 0) {
-			Diag("proggenaddsym: unaligned gcmask symbol %s: size=%d pos=%d", s.Name, s.Size, g.pos)
+		ptrdata := decodetype_ptrdata(s.Gotype)
+		if (ptrdata%int64(Thearch.Ptrsize) != 0) || (g.pos%int64(Thearch.Ptrsize) != 0) {
+			Diag("proggenaddsym: unaligned gcmask symbol %s: size=%d pos=%d", s.Name, ptrdata, g.pos)
 		}
-		for i := int64(0); i < size; i += int64(Thearch.Ptrsize) {
+		for i := int64(0); i < ptrdata; i += int64(Thearch.Ptrsize) {
 			word := uint(i / int64(Thearch.Ptrsize))
 			proggendata(g, (mask[word/8]>>(word%8))&1)
 		}
-		g.pos = s.Value + size
+		g.pos = s.Value + ptrdata
 	}
 }
...
@@ -67,6 +67,11 @@ func decodetype_size(s *LSym) int64 {
 	return int64(decode_inuxi(s.P, Thearch.Ptrsize)) // 0x8 / 0x10
 }
 
+// Type.commonType.ptrdata
+func decodetype_ptrdata(s *LSym) int64 {
+	return int64(decode_inuxi(s.P[Thearch.Ptrsize:], Thearch.Ptrsize)) // 0x8 / 0x10
+}
+
 // Type.commonType.gc
 func decodetype_gcprog(s *LSym) *LSym {
 	if s.Type == obj.SDYNIMPORT {
...
@@ -34,23 +34,23 @@ func TestGCInfo(t *testing.T) {
 	verifyGCInfo(t, "data eface", &dataEface, infoEface)
 	verifyGCInfo(t, "data iface", &dataIface, infoIface)
-	verifyGCInfo(t, "stack ScalarPtr", new(ScalarPtr), nonStackInfo(infoScalarPtr))
-	verifyGCInfo(t, "stack PtrScalar", new(PtrScalar), nonStackInfo(infoPtrScalar))
-	verifyGCInfo(t, "stack BigStruct", new(BigStruct), nonStackInfo(infoBigStruct()))
-	verifyGCInfo(t, "stack string", new(string), nonStackInfo(infoString))
-	verifyGCInfo(t, "stack slice", new([]string), nonStackInfo(infoSlice))
-	verifyGCInfo(t, "stack eface", new(interface{}), nonStackInfo(infoEface))
-	verifyGCInfo(t, "stack iface", new(Iface), nonStackInfo(infoIface))
+	verifyGCInfo(t, "stack ScalarPtr", new(ScalarPtr), infoScalarPtr)
+	verifyGCInfo(t, "stack PtrScalar", new(PtrScalar), infoPtrScalar)
+	verifyGCInfo(t, "stack BigStruct", new(BigStruct), infoBigStruct())
+	verifyGCInfo(t, "stack string", new(string), infoString)
+	verifyGCInfo(t, "stack slice", new([]string), infoSlice)
+	verifyGCInfo(t, "stack eface", new(interface{}), infoEface)
+	verifyGCInfo(t, "stack iface", new(Iface), infoIface)
 	for i := 0; i < 10; i++ {
-		verifyGCInfo(t, "heap PtrSlice", escape(&make([]*byte, 10)[0]), infoPtr10)
-		verifyGCInfo(t, "heap ScalarPtr", escape(new(ScalarPtr)), infoScalarPtr)
-		verifyGCInfo(t, "heap ScalarPtrSlice", escape(&make([]ScalarPtr, 4)[0]), infoScalarPtr4)
-		verifyGCInfo(t, "heap PtrScalar", escape(new(PtrScalar)), infoPtrScalar)
-		verifyGCInfo(t, "heap BigStruct", escape(new(BigStruct)), infoBigStruct())
-		verifyGCInfo(t, "heap string", escape(new(string)), infoString)
-		verifyGCInfo(t, "heap eface", escape(new(interface{})), infoEface)
-		verifyGCInfo(t, "heap iface", escape(new(Iface)), infoIface)
+		verifyGCInfo(t, "heap PtrSlice", escape(&make([]*byte, 10)[0]), trimDead(infoPtr10))
+		verifyGCInfo(t, "heap ScalarPtr", escape(new(ScalarPtr)), trimDead(infoScalarPtr))
+		verifyGCInfo(t, "heap ScalarPtrSlice", escape(&make([]ScalarPtr, 4)[0]), trimDead(infoScalarPtr4))
+		verifyGCInfo(t, "heap PtrScalar", escape(new(PtrScalar)), trimDead(infoPtrScalar))
+		verifyGCInfo(t, "heap BigStruct", escape(new(BigStruct)), trimDead(infoBigStruct()))
+		verifyGCInfo(t, "heap string", escape(new(string)), trimDead(infoString))
+		verifyGCInfo(t, "heap eface", escape(new(interface{})), trimDead(infoEface))
+		verifyGCInfo(t, "heap iface", escape(new(Iface)), trimDead(infoIface))
 	}
 }
@@ -67,16 +67,11 @@ func verifyGCInfo(t *testing.T, name string, p interface{}, mask0 []byte) {
 	}
 }
 
-func nonStackInfo(mask []byte) []byte {
-	// typeDead is replaced with typeScalar everywhere except stacks.
-	mask1 := make([]byte, len(mask))
-	for i, v := range mask {
-		if v == typeDead {
-			v = typeScalar
-		}
-		mask1[i] = v
-	}
-	return mask1
+func trimDead(mask []byte) []byte {
+	for len(mask) > 2 && mask[len(mask)-1] == typeScalar {
+		mask = mask[:len(mask)-1]
+	}
+	return mask
 }
 
 var gcinfoSink interface{}
...