1. 09 Oct, 2017 6 commits
    • Test's avatar
      X neo1 localhost timings Oct09 · af41382e
      Test authored
      Same as Oct05 5μs C-latency with added:
      
      - pystones
      - t(sha1) py & go
      - zhash.go working in prefetch mode
      - 16 clients loading 1 server at the same time
      af41382e
    • Kirill Smelkov's avatar
      X deco localhost timings Oct09 · 1e9f23cb
      Kirill Smelkov authored
      Same as Oct05 5μs C-latency with added:
      
      - pystones
      - t(sha1) py & go
      - zhash.go working in prefetch mode
      - 16 clients loading 1 server at the same time
      1e9f23cb
    • Kirill Smelkov's avatar
      . · 0d1206b9
      Kirill Smelkov authored
      0d1206b9
    • Kirill Smelkov's avatar
      X neo/connection: Fix race between link.shutdown() and conn.lightClose() · 8eac771c
      Kirill Smelkov authored
      The scenario could be: nl.shutdown sets nl.connTab=nil and
      then iterates connTab snapshot taken under nl.connMu lock. If so
      this activity (which calls e.g. Conn.shutdown) could be running with
      conn.lightClose -> conn.release() in parallel.
      
      Bug triggers under neotest in zhash.go +prefetch128:
      
      	crc32:bf9deea9   ; oid=0..8499  nread=34134938  t=9.010696705s (1.060081ms / object)  x=zhash.go +prefetch128
      	==================
      	WARNING: DATA RACE
      	Write at 0x00c424b55ed0 by goroutine 147:
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).reinit()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:258 +0x102
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).release()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:240 +0x38
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).lightClose()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1633 +0xa4
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).Ask1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1719 +0xde
      	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client)._Load()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:466 +0x5d3
      	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client).Load()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:392 +0xa8
      	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).loadRCE()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:297 +0xfb
      
      	Previous write at 0x00c424b55ed0 by goroutine 75:
      	  sync/atomic.AddInt32()
      	      /home/kirr/src/tools/go/go/src/runtime/race_amd64.s:269 +0xb
      	  sync.(*Mutex).Unlock()
      	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
      
      	Goroutine 147 (running) created at:
      	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).Prefetch()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:198 +0xbb
      	  main.zhash.func2()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:134 +0x86
      	  main.zhash.func4()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:157 +0x8f
      	  main.zhash()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:193 +0xc7b
      	  main.main()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:107 +0x48d
      
      	Goroutine 75 (finished) created at:
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:391 +0x116
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:44 +0xe1
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:383 +0x99
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).serveRecv()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:748 +0x651
      	==================
      
      and without -race zhash.go instrumentation it was showing itself as
      "panic: unlock of unlocked mutex" in
      
      	  sync.(*Mutex).Unlock()
      	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
      8eac771c
    • Kirill Smelkov's avatar
      X zodb: Disable Buf tests under race · e4a100de
      Kirill Smelkov authored
      It was failing in a strange way, e.g.
      
      panic: Buf: refcnt < 0
      fatal error: panic holding locks
      
      goroutine 19 [running]:
      runtime.throw(0x5f9d65, 0x13)
              /home/kirr/src/tools/go/go/src/runtime/panic.go:605 +0x95 fp=0xc42003cc20 sp=0xc42003cc00 pc=0x452935
      panic(0x5c21c0, 0x611f60)
              /home/kirr/src/tools/go/go/src/runtime/panic.go:451 +0x585 fp=0xc42003ccc8 sp=0xc42003cc20 pc=0x4524c5
      lab.nexedi.com/kirr/neo/go/zodb.(*Buf).Release(0xc4200900e0)
              /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer.go:112 +0x161 fp=0xc42003ccf8 sp=0xc42003ccc8 pc=0x59ebc1
      lab.nexedi.com/kirr/neo/go/zodb.TestBufAllocFree(0xc4200a80f0)
              /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer_test.go:111 +0x6ae fp=0xc42003cf78 sp=0xc42003ccf8 pc=0x5a137e
      testing.tRunner(0xc4200a80f0, 0x600930)
              /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d fp=0xc42003cfd0 sp=0xc42003cf78 pc=0x51688d
      runtime.goexit()
              /home/kirr/src/tools/go/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42003cfd8 sp=0xc42003cfd0 pc=0x4835f1
      created by testing.(*T).Run
              /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x569
      
      goroutine 1 [chan receive]:
      testing.(*T).Run(0xc4200a8000, 0x5f9180, 0x10, 0x600930, 0xc420051c50)
              /home/kirr/src/tools/go/go/src/testing/testing.go:790 +0x59b
      testing.runTests.func1(0xc4200a8000)
              /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa8
      testing.tRunner(0xc4200a8000, 0xc420051d90)
              /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d
      testing.runTests(0xc420090040, 0x6ed4a0, 0x7, 0x7, 0x4159b0)
              /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x522
      testing.(*M).Run(0xc420051f20, 0x1d7ab28)
              /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x207
      main.main()
              lab.nexedi.com/kirr/neo/go/zodb/_test/_testmain.go:56 +0x1d4
      exit status 2
      FAIL    lab.nexedi.com/kirr/neo/go/zodb 0.006s
      
      becuase upon release object was not put back into pool, then we were
      getting another object from pool, complainig but still doing
      buf.Release.
      e4a100de
    • Kirill Smelkov's avatar
      X tracing: Silence race-detector about probe.Detach · bab79665
      Kirill Smelkov authored
      Race-detector does not know Probe.Detach works under world stopped and
      that this way it cannot break consistency of probes list attached to a
      trace event - on event signalling either a probe will be run or not run
      at all.
      
      And we do not mind that e.g. while Detach was in progress a probe was
      was read from traceevent list and deceded to be run and the probe
      function was actually called just after Detach finished.
      
      For this reason tell race-detector to not take into account all memory
      read/write that are performed while the world is stopped.
      
      If we do not it complains e.g. this way:
      
          ==================
          WARNING: DATA RACE
          Read at 0x00c42000d760 by goroutine 7:
            lab.nexedi.com/kirr/neo/go/zodb/storage._traceCacheGCFinish_run()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:265 +0x81
            lab.nexedi.com/kirr/neo/go/zodb/storage.traceCacheGCFinish()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/ztrace.go:22 +0x63
            lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gc()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:497 +0x62c
            lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gcmain()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:478 +0x4c
      
          Previous write at 0x00c42000d760 by goroutine 6:
            lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*Probe).Detach()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:319 +0x103
            lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*ProbeGroup).Done()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:344 +0xa5
            lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:576 +0x7f94
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      
          Goroutine 7 (running) created at:
            lab.nexedi.com/kirr/neo/go/zodb/storage.NewCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:129 +0x227
            lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:165 +0x7b1
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      
          Goroutine 6 (finished) created at:
            testing.(*T).Run()
                /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x568
            testing.runTests.func1()
                /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa7
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
            testing.runTests()
                /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x521
            testing.(*M).Run()
                /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x206
            main.main()
                lab.nexedi.com/kirr/neo/go/zodb/storage/_test/_testmain.go:44 +0x1d3
          ==================
      bab79665
  2. 06 Oct, 2017 5 commits
  3. 05 Oct, 2017 23 commits
    • Kirill Smelkov's avatar
      X time z6001 - z600 + prefetch · b0cb5454
      Kirill Smelkov authored
      Max < 3x (nprefetch=8) in go-go case.
      b0cb5454
    • Kirill Smelkov's avatar
      X time z6001 (localhost) +prefetch · 6be37f4e
      Kirill Smelkov authored
      improves max 2x becase probably of go scheduler.
      6be37f4e
    • Kirill Smelkov's avatar
      . · 41d2b5b4
      Kirill Smelkov authored
      41d2b5b4
    • Test's avatar
      X neo1-neo2 time with C-states >C1 (and RX delay) disabled on both machines · 4f8313c1
      Test authored
      Compared to no C-states disabled (but RX delay = 0 as before) it improves:
      
      ping56            36μs  ->    33μs
      ping1472        ~150μs  ->  ~107μs
      
      TCP1             ~65μs  ->   ~45μs
      TCP1472     ~440-515μs  ->  ~420μs    (still very bad)
      TCP4096         ~370μs  ->  ~280μs    (still bad)
      
      ZEO:            ~900μs  ->  ~670μs
      NEO/pylite:    ~1050μs  ->  ~600μs    (Cpy)
      NEO/pylite:     ~800μs  ->  ~505μs    (Cgo)
      NEO/pysql:     ~1700μs  ->  ~900μs    (Cpy)
      NEO/pysql:     ~1300μs  ->  ~790μs    (Cgo)
      NEO/go:         ~730μs  ->  ~430μs    (Cpy)
      NEO/go:         ~370μs  ->  ~210μs    (Cgo)
      NEO/go-nosha1:  ~280μs  ->  ~190μs
      
      Raw TCP timings are still bad.
      4f8313c1
    • Test's avatar
      X neo1 localhost time with C-states >C1 disabled · 8e7527d0
      Test authored
      It improves:
      
      ZEO:             ~500μs ->  ~470μs
      NEO/pylite:      ~640μs ->  ~460μs      (Cpy)
      NEO/pylite:      ~415μs ->  ~350μs      (Cgo)
      NEO/pysql:      ~1300μs ->  ~790μs      (Cpy)
      NEO/pysql:       ~850μs ->  ~650μs      (Cgo)
      NEO/go:          ~350μs ->  ~275μs      (Cpy)
      NEO/go:           ~78μs ->   ~78μs      (Cgo)
      NEO/go-nosha1:    ~45μs ->   ~45μs
      8e7527d0
    • Kirill Smelkov's avatar
      . · 119d2908
      Kirill Smelkov authored
      119d2908
    • Test's avatar
      X time neo1-neo2 · a4e95d4e
      Test authored
      Same as on oct4 - we just add C-states profiling. In particular TCP1472 and TCP4096 are awful.
      C-states are not yet disabled.
      a4e95d4e
    • Test's avatar
      X time on neo1 (localhost) · de5d9c06
      Test authored
      Same as oct04 - we just add C-states profiling. C-states are not yet disabled.
      de5d9c06
    • Kirill Smelkov's avatar
      . · d2b13cbb
      Kirill Smelkov authored
      d2b13cbb
    • Kirill Smelkov's avatar
      X z6001-z600 time with C-states >C1 and RX delay disabled on both machines · cf6e57b0
      Kirill Smelkov authored
      Compared to just C-states disabled it improves:
      
      ping56           ~76μs  ->   ~40μs
      ping1472        ~150μs  ->  ~120μs
      
      TCP1             ~90μs  ->   ~50μs
      TCP1472         ~180μs  ->  ~145μs
      TCP4096      ~220-230μs ->  ~175μs
      
      ZEO:            ~660μs  ->  ~660μs
      NEO/pylite:     ~630μs  ->  ~575μs    (Cpy)
      NEO/pylite:     ~505μs  ->  ~460μs    (Cgo)
      NEO/pysql:      ~930μs  ->  ~880μs    (Cpy)
      NEO/pysql:      ~810μs  ->  ~750μs    (Cgo)
      NEO/go:         ~430μs  ->  ~380μs    (Cpy)
      NEO/go:         ~215μs  ->  ~170μs    (Cgo)
      NEO/go-nosha1:  ~195μs  ->  ~150μs
      cf6e57b0
    • Kirill Smelkov's avatar
      X z6001-z600 time with C-states >C1 disabled on both machines · a85ae5aa
      Kirill Smelkov authored
      It improves:
      
      ping56          ~
      ping1472        ~
      
      TCP1             ~92μs  ->  ~89μs       (c -> c)
      TCP1            ~120μs  ->  ~91μs       (c -> go)
      TCP1            ~120μs  ->  ~90μs       (c <- c)
      TCP1            ~120μs  ->  ~90μs       (go <- c)
      
      TCP1472         ~220-250μs  -> ~180μs
      TCP4096         ~270-300μs  -> ~220-230μs
      
      ZEO:             ~750μs ->  ~660μs
      NEO/pylite:      ~850μs ->  ~630μs      (Cpy)
      NEO/pylite:      ~640μs ->  ~505μs      (Cgo)
      NEO/pysql:      ~1500μs ->  ~930μs      (Cpy)
      NEO/pysql:      ~1350μs ->  ~810μs      (Cgo)
      NEO/go:          ~600μs ->  ~430μs      (Cpy)
      NEO/go:          ~320μs ->  ~215μs      (Cgo)
      NEO/go-nosha1:   ~260μs ->  ~195μs
      a85ae5aa
    • Kirill Smelkov's avatar
      . · 9c2b63c9
      Kirill Smelkov authored
      9c2b63c9
    • Kirill Smelkov's avatar
      X z6001 localhost time with C-states >C1 disabled · 8d708026
      Kirill Smelkov authored
      It improves a lot:
      
      ZEO:             ~660μs ->  ~490μs
      NEO/pylite:      ~750μs ->  ~470μs      (Cpy)
      NEO/pylite:      ~500μs ->  ~370μs      (Cgo)
      NEO/pysql:      ~1450μs ->  ~780μs      (Cpy)
      NEO/pysql:      ~1000μs ->  ~670μs      (Cgo)
      NEO/go:          ~450μs ->  ~280μs      (Cpy)
      NEO/go:           ~90μs ->   ~80μs      (Cgo)
      NEO/go-nosha1:    ~60μs ->   ~55μs
      8d708026
    • Kirill Smelkov's avatar
      X time z6001 - z600 · 42abaf7a
      Kirill Smelkov authored
      Same as on oct04 but now we add C-states profile. C-states are not yet disabled.
      42abaf7a
    • Kirill Smelkov's avatar
      . · 2764bf35
      Kirill Smelkov authored
      2764bf35
    • Kirill Smelkov's avatar
      X time on z6001 (localhost) · 4b5db5b0
      Kirill Smelkov authored
      ~ the same as on oct04 only we add C-states profiling. C-states are not yet disabled.
      4b5db5b0
    • Kirill Smelkov's avatar
      X deco localhost time with C-states >C1 disabled · b1f98a52
      Kirill Smelkov authored
      py timings improve a bit - ~15-20μs per python part (i.e. ~30-40μs if
      both client and server is in python).
      
      go-go timings stays ~ the same.
      b1f98a52
    • Kirill Smelkov's avatar
      X time on deco (localhost) · 2ad17ebc
      Kirill Smelkov authored
      Timings are ~same as on sep17 but now a lot more info about system is
      added + C-stats profile. C-states are not yet disabled.
      2ad17ebc
    • Kirill Smelkov's avatar
      . · 472c0da7
      Kirill Smelkov authored
      472c0da7
    • Kirill Smelkov's avatar
      . · cfde4271
      Kirill Smelkov authored
      cfde4271
    • Kirill Smelkov's avatar
      X first cut on C-state profiling · a4a18b55
      Kirill Smelkov authored
      a4a18b55
    • Kirill Smelkov's avatar
      . · 630a742b
      Kirill Smelkov authored
      630a742b
    • Kirill Smelkov's avatar
      . · 58e7d04b
      Kirill Smelkov authored
      58e7d04b
  4. 04 Oct, 2017 6 commits