• Kirill Smelkov's avatar
    X neo/connection: Fix race between link.shutdown() and conn.lightClose() · 8eac771c
    Kirill Smelkov authored
    The scenario could be: nl.shutdown sets nl.connTab=nil and
    then iterates connTab snapshot taken under nl.connMu lock. If so
    this activity (which calls e.g. Conn.shutdown) could be running with
    conn.lightClose -> conn.release() in parallel.
    
    Bug triggers under neotest in zhash.go +prefetch128:
    
    	crc32:bf9deea9   ; oid=0..8499  nread=34134938  t=9.010696705s (1.060081ms / object)  x=zhash.go +prefetch128
    	==================
    	WARNING: DATA RACE
    	Write at 0x00c424b55ed0 by goroutine 147:
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).reinit()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:258 +0x102
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).release()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:240 +0x38
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).lightClose()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1633 +0xa4
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).Ask1()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1719 +0xde
    	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client)._Load()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:466 +0x5d3
    	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client).Load()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:392 +0xa8
    	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).loadRCE()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:297 +0xfb
    
    	Previous write at 0x00c424b55ed0 by goroutine 75:
    	  sync/atomic.AddInt32()
    	      /home/kirr/src/tools/go/go/src/runtime/race_amd64.s:269 +0xb
    	  sync.(*Mutex).Unlock()
    	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
    	  sync.(*Once).Do()
    	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
    
    	Goroutine 147 (running) created at:
    	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).Prefetch()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:198 +0xbb
    	  main.zhash.func2()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:134 +0x86
    	  main.zhash.func4()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:157 +0x8f
    	  main.zhash()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:193 +0xc7b
    	  main.main()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:107 +0x48d
    
    	Goroutine 75 (finished) created at:
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:391 +0x116
    	  sync.(*Once).Do()
    	      /home/kirr/src/tools/go/go/src/sync/once.go:44 +0xe1
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:383 +0x99
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).serveRecv()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:748 +0x651
    	==================
    
    and without -race zhash.go instrumentation it was showing itself as
    "panic: unlock of unlocked mutex" in
    
    	  sync.(*Mutex).Unlock()
    	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
    	  sync.(*Once).Do()
    	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
    	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
    	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
    	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
    8eac771c
connection.go 45.9 KB