- 10 Oct, 2017 8 commits
-
-
Test authored
See previous commit for details about why latency is bad for TCP payload > MSS without TSO. Compared to latest neo1-neo2 timings from Oct05 (C-states disabled, no rx-delay) it improves:

    TCP1            ~45μs   -> ~45μs
    TCP1472         ~430μs                  # TCP lat. anomaly
    TCP1400                 -> ~120μs       # finally
    TCP1500                 -> ~130μs       # fixed
    TCP4096         ~285μs  -> ~170μs       # !

    ZEO             ~670μs  -> ~580-1045μs (?)
    NEO/pylite      ~605μs  -> ~600-700μs  (?)  (Cpy)
    NEO/pylite      ~505μs  -> ~525-580μs  (?)  (Cgo)
    NEO/pysql       ~900μs  -> ~820-930μs  (?)  (Cpy)
    NEO/pysql       ~780μs  -> ~740-800μs  (?)  (Cgo)
    NEO/go          ~430μs  -> ~360μs  (Cpy)    # <-- NOTE
    NEO/go          ~210μs  -> ~160μs  (Cgo)    # <-- NOTE
    NEO/go-nosha1   ~190μs  -> ~140μs           # <-- NOTE

Not sure about the noise in pure-py runs, but given that raw TCP latency clearly improves, this should be a good change to make.
-
Kirill Smelkov authored
On neo1 - neo2 without TSO, latency becomes very poor in lat_tcp when the payload size becomes greater than the TCP MSS (lat_tcp -m 1448 ~ 130μs; lat_tcp -m 1449 ~ 500μs and more).
-
Kirill Smelkov authored
e.g. neotest cpustat ls
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Same as before with added information. In particular z6001-z600 shows there is no step-wise ~400μs TCP RR increase while going 1400B -> 1500B, as is currently the case on RTL.
-
Kirill Smelkov authored
-
Test authored
Same as before but with more information and adjusted ping and TCP RR. In particular TCP RR shows:

    TCPRR1400 ~ 120μs
    TCPRR1500 ~ 430μs

The step-wise increase happens when the TCP packet no longer fits into 1 ethernet frame. It does not happen on z600* with their current settings.
-
Kirill Smelkov authored
-
- 09 Oct, 2017 6 commits
-
-
Test authored
Same as Oct05 5μs C-latency with added:

- pystones
- t(sha1) py & go
- zhash.go working in prefetch mode
- 16 clients loading 1 server at the same time
-
Kirill Smelkov authored
Same as Oct05 5μs C-latency with added:

- pystones
- t(sha1) py & go
- zhash.go working in prefetch mode
- 16 clients loading 1 server at the same time
-
Kirill Smelkov authored
-
Kirill Smelkov authored
The scenario could be: nl.shutdown sets nl.connTab=nil and then iterates a connTab snapshot taken under nl.connMu lock. If so, this activity (which calls e.g. Conn.shutdown) could be running in parallel with conn.lightClose -> conn.release(). The bug triggers under neotest:

    zhash.go +prefetch128: crc32:bf9deea9 ; oid=0..8499 nread=34134938 t=9.010696705s (1.060081ms / object) x=zhash.go +prefetch128
    ==================
    WARNING: DATA RACE
    Write at 0x00c424b55ed0 by goroutine 147:
      lab.nexedi.com/kirr/neo/go/neo.(*Conn).reinit()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:258 +0x102
      lab.nexedi.com/kirr/neo/go/neo.(*Conn).release()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:240 +0x38
      lab.nexedi.com/kirr/neo/go/neo.(*Conn).lightClose()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1633 +0xa4
      lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).Ask1()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1719 +0xde
      lab.nexedi.com/kirr/neo/go/neo/client.(*Client)._Load()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:466 +0x5d3
      lab.nexedi.com/kirr/neo/go/neo/client.(*Client).Load()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:392 +0xa8
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).loadRCE()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:297 +0xfb

    Previous write at 0x00c424b55ed0 by goroutine 75:
      sync/atomic.AddInt32()
          /home/kirr/src/tools/go/go/src/runtime/race_amd64.s:269 +0xb
      sync.(*Mutex).Unlock()
          /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
      sync.(*Once).Do()
          /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
      lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
      lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
      lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f

    Goroutine 147 (running) created at:
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).Prefetch()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:198 +0xbb
      main.zhash.func2()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:134 +0x86
      main.zhash.func4()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:157 +0x8f
      main.zhash()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:193 +0xc7b
      main.main()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:107 +0x48d

    Goroutine 75 (finished) created at:
      lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:391 +0x116
      sync.(*Once).Do()
          /home/kirr/src/tools/go/go/src/sync/once.go:44 +0xe1
      lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:383 +0x99
      lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).serveRecv()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:748 +0x651
    ==================

Without -race zhash.go instrumentation, it was showing itself as "panic: unlock of unlocked mutex" in:

    sync.(*Mutex).Unlock()
        /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
    sync.(*Once).Do()
        /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
    lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
    lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
    lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
-
Kirill Smelkov authored
It was failing in a strange way, e.g.:

    panic: Buf: refcnt < 0
    fatal error: panic holding locks

    goroutine 19 [running]:
    runtime.throw(0x5f9d65, 0x13)
        /home/kirr/src/tools/go/go/src/runtime/panic.go:605 +0x95 fp=0xc42003cc20 sp=0xc42003cc00 pc=0x452935
    panic(0x5c21c0, 0x611f60)
        /home/kirr/src/tools/go/go/src/runtime/panic.go:451 +0x585 fp=0xc42003ccc8 sp=0xc42003cc20 pc=0x4524c5
    lab.nexedi.com/kirr/neo/go/zodb.(*Buf).Release(0xc4200900e0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer.go:112 +0x161 fp=0xc42003ccf8 sp=0xc42003ccc8 pc=0x59ebc1
    lab.nexedi.com/kirr/neo/go/zodb.TestBufAllocFree(0xc4200a80f0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer_test.go:111 +0x6ae fp=0xc42003cf78 sp=0xc42003ccf8 pc=0x5a137e
    testing.tRunner(0xc4200a80f0, 0x600930)
        /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d fp=0xc42003cfd0 sp=0xc42003cf78 pc=0x51688d
    runtime.goexit()
        /home/kirr/src/tools/go/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42003cfd8 sp=0xc42003cfd0 pc=0x4835f1
    created by testing.(*T).Run
        /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x569

    goroutine 1 [chan receive]:
    testing.(*T).Run(0xc4200a8000, 0x5f9180, 0x10, 0x600930, 0xc420051c50)
        /home/kirr/src/tools/go/go/src/testing/testing.go:790 +0x59b
    testing.runTests.func1(0xc4200a8000)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa8
    testing.tRunner(0xc4200a8000, 0xc420051d90)
        /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d
    testing.runTests(0xc420090040, 0x6ed4a0, 0x7, 0x7, 0x4159b0)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x522
    testing.(*M).Run(0xc420051f20, 0x1d7ab28)
        /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x207
    main.main()
        lab.nexedi.com/kirr/neo/go/zodb/_test/_testmain.go:56 +0x1d4
    exit status 2
    FAIL    lab.nexedi.com/kirr/neo/go/zodb 0.006s

because upon release the object was not put back into the pool; we were then getting another object from the pool, complaining, but still doing buf.Release.
-
Kirill Smelkov authored
The race detector does not know that Probe.Detach works under world-stopped, and that this way it cannot break the consistency of the probes list attached to a trace event: on event signalling, a probe will either be run or not run at all. And we do not mind that, e.g. while Detach was in progress, a probe was read from the traceevent list and decided to be run, and the probe function was actually called just after Detach finished.

For this reason, tell the race detector to not take into account memory reads/writes performed while the world is stopped. If we do not, it complains e.g. this way:

    ==================
    WARNING: DATA RACE
    Read at 0x00c42000d760 by goroutine 7:
      lab.nexedi.com/kirr/neo/go/zodb/storage._traceCacheGCFinish_run()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:265 +0x81
      lab.nexedi.com/kirr/neo/go/zodb/storage.traceCacheGCFinish()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/ztrace.go:22 +0x63
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gc()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:497 +0x62c
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gcmain()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:478 +0x4c

    Previous write at 0x00c42000d760 by goroutine 6:
      lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*Probe).Detach()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:319 +0x103
      lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*ProbeGroup).Done()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:344 +0xa5
      lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:576 +0x7f94
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c

    Goroutine 7 (running) created at:
      lab.nexedi.com/kirr/neo/go/zodb/storage.NewCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:129 +0x227
      lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:165 +0x7b1
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c

    Goroutine 6 (finished) created at:
      testing.(*T).Run()
          /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x568
      testing.runTests.func1()
          /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa7
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      testing.runTests()
          /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x521
      testing.(*M).Run()
          /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x206
      main.main()
          lab.nexedi.com/kirr/neo/go/zodb/storage/_test/_testmain.go:44 +0x1d3
    ==================
-
- 06 Oct, 2017 5 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 05 Oct, 2017 21 commits
-
-
Kirill Smelkov authored
Max < 3x (nprefetch=8) in go-go case.
-
Kirill Smelkov authored
Improves max 2x, probably because of the Go scheduler.
-
Kirill Smelkov authored
-
Test authored
Compared to the run without C-states disabled (but with RX delay = 0, as before) it improves:

    ping56          36μs        -> 33μs
    ping1472        ~150μs      -> ~107μs
    TCP1            ~65μs       -> ~45μs
    TCP1472         ~440-515μs  -> ~420μs   (still very bad)
    TCP4096         ~370μs      -> ~280μs   (still bad)
    ZEO:            ~900μs      -> ~670μs
    NEO/pylite:     ~1050μs     -> ~600μs   (Cpy)
    NEO/pylite:     ~800μs      -> ~505μs   (Cgo)
    NEO/pysql:      ~1700μs     -> ~900μs   (Cpy)
    NEO/pysql:      ~1300μs     -> ~790μs   (Cgo)
    NEO/go:         ~730μs      -> ~430μs   (Cpy)
    NEO/go:         ~370μs      -> ~210μs   (Cgo)
    NEO/go-nosha1:  ~280μs      -> ~190μs

Raw TCP timings are still bad.
-
Test authored
It improves:

    ZEO:            ~500μs      -> ~470μs
    NEO/pylite:     ~640μs      -> ~460μs   (Cpy)
    NEO/pylite:     ~415μs      -> ~350μs   (Cgo)
    NEO/pysql:      ~1300μs     -> ~790μs   (Cpy)
    NEO/pysql:      ~850μs      -> ~650μs   (Cgo)
    NEO/go:         ~350μs      -> ~275μs   (Cpy)
    NEO/go:         ~78μs       -> ~78μs    (Cgo)
    NEO/go-nosha1:  ~45μs       -> ~45μs
-
Kirill Smelkov authored
-
Test authored
Same as on Oct04 - we just add C-states profiling. In particular TCP1472 and TCP4096 are awful. C-states are not yet disabled.
-
Test authored
Same as oct04 - we just add C-states profiling. C-states are not yet disabled.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Compared to just C-states disabled it improves:

    ping56          ~76μs       -> ~40μs
    ping1472        ~150μs      -> ~120μs
    TCP1            ~90μs       -> ~50μs
    TCP1472         ~180μs      -> ~145μs
    TCP4096         ~220-230μs  -> ~175μs
    ZEO:            ~660μs      -> ~660μs
    NEO/pylite:     ~630μs      -> ~575μs   (Cpy)
    NEO/pylite:     ~505μs      -> ~460μs   (Cgo)
    NEO/pysql:      ~930μs      -> ~880μs   (Cpy)
    NEO/pysql:      ~810μs      -> ~750μs   (Cgo)
    NEO/go:         ~430μs      -> ~380μs   (Cpy)
    NEO/go:         ~215μs      -> ~170μs   (Cgo)
    NEO/go-nosha1:  ~195μs      -> ~150μs
-
Kirill Smelkov authored
It improves:

    ping56          ~
    ping1472        ~
    TCP1            ~92μs       -> ~89μs    (c -> c)
    TCP1            ~120μs      -> ~91μs    (c -> go)
    TCP1            ~120μs      -> ~90μs    (c <- c)
    TCP1            ~120μs      -> ~90μs    (go <- c)
    TCP1472         ~220-250μs  -> ~180μs
    TCP4096         ~270-300μs  -> ~220-230μs
    ZEO:            ~750μs      -> ~660μs
    NEO/pylite:     ~850μs      -> ~630μs   (Cpy)
    NEO/pylite:     ~640μs      -> ~505μs   (Cgo)
    NEO/pysql:      ~1500μs     -> ~930μs   (Cpy)
    NEO/pysql:      ~1350μs     -> ~810μs   (Cgo)
    NEO/go:         ~600μs      -> ~430μs   (Cpy)
    NEO/go:         ~320μs      -> ~215μs   (Cgo)
    NEO/go-nosha1:  ~260μs      -> ~195μs
-
Kirill Smelkov authored
-
Kirill Smelkov authored
It improves a lot:

    ZEO:            ~660μs      -> ~490μs
    NEO/pylite:     ~750μs      -> ~470μs   (Cpy)
    NEO/pylite:     ~500μs      -> ~370μs   (Cgo)
    NEO/pysql:      ~1450μs     -> ~780μs   (Cpy)
    NEO/pysql:      ~1000μs     -> ~670μs   (Cgo)
    NEO/go:         ~450μs      -> ~280μs   (Cpy)
    NEO/go:         ~90μs       -> ~80μs    (Cgo)
    NEO/go-nosha1:  ~60μs       -> ~55μs
-
Kirill Smelkov authored
Same as on oct04 but now we add C-states profile. C-states are not yet disabled.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
~ the same as on oct04 only we add C-states profiling. C-states are not yet disabled.
-
Kirill Smelkov authored
py timings improve a bit - ~15-20μs per Python part (i.e. ~30-40μs if both client and server are in Python). go-go timings stay ~ the same.
-
Kirill Smelkov authored
Timings are ~ the same as on Sep17, but now a lot more info about the system is added + a C-states profile. C-states are not yet disabled.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-