• Kirill Smelkov's avatar
    go/zodb/internal/weak: Try to fix GC crashes via reworking Ref to keep only one word instead of two · dbf5a97b
    Kirill Smelkov authored
    We are observing garbage-collector crashes due to weak package under
    load(*) with GC crashing as e.g.
    
        runtime: full=0xc0001f10000005 next=205 jobs=204 nDataRoots=1 nBSSRoots=1 nSpanRoots=16 nStackRoots=184
        panic: non-empty mark queue after concurrent mark
        fatal error: panic on system stack
    
        runtime stack:
        runtime.throw({0x5c60fe?, 0x601d70?})
                /home/kirr/src/tools/go/go1.21/src/runtime/panic.go:1077 +0x5c fp=0xc000051e88 sp=0xc000051e58 pc=0x436efc
        panic({0x585100?, 0x601d70?})
                /home/kirr/src/tools/go/go1.21/src/runtime/panic.go:840 +0x6ea fp=0xc000051f38 sp=0xc000051e88 pc=0x436e0a
        runtime.gcMark(0x118946?)
                /home/kirr/src/tools/go/go1.21/src/runtime/mgc.go:1464 +0x40c fp=0xc000051fb0 sp=0xc000051f38 pc=0x41bd6c
        runtime.gcMarkTermination.func1()
                /home/kirr/src/tools/go/go1.21/src/runtime/mgc.go:962 +0x17 fp=0xc000051fc8 sp=0xc000051fb0 pc=0x41b277
        runtime.systemstack()
                /home/kirr/src/tools/go/go1.21/src/runtime/asm_amd64.s:509 +0x4a fp=0xc000051fd8 sp=0xc000051fc8 pc=0x468eea
    
    One problem in current implementation is that weak.Ref keeps two words
    for the copy of original interface object and recreates that interface
    object on Ref.Get from those two words _nonatomically_. Which is
    explicitly documented by Go memory model as something that can lead to
    corrupted memory and crashes. From https://go.dev/ref/mem#restrictions:
    
        This means that races on multiword data structures can lead to
        inconsistent values not corresponding to a single write. When the values
        depend on the consistency of internal (pointer, length) or (pointer,
        type) pairs, as can be the case for interface values, maps, slices, and
        strings in most Go implementations, such races can in turn lead to
        arbitrary memory corruption.
    
    We can avoid doing multiword writes during object resurrection by using
    concrete type T* instead of interface{}.
    
    Unfortunately as wendelin.core@9b44fc23
    shows it does not help with the above issue and the GC continues to
    crash with the same "panic: non-empty mark queue after concurrent mark"
    message. This is because weak.Ref implementation needs tight integration
    with concurrent GC that Go does and in practice we are unable to do that
    from outside of Go runtime internals. See e.g.
    https://github.com/golang/go/commit/dfc86e922cd0 and
    https://github.com/golang/go/commit/79fd633632cd to get an idea what
    kind of integration that needs to be.
    
    Anyway before disabling weak references support I wanted to commit this
    change to show that one-word approach was tried and it does not work.
    This leaves a trace in the history.
    
    On the good side we are going to use standard package weak in the future
    hopefully (https://github.com/golang/go/issues/67552). That needs to
    wait for at least Go 1.24 though.
    
    (*) see https://github.com/golang/go/issues/41303 for details
    dbf5a97b
weak_test.go 2.13 KB