    go/neo/t/neotest: CPU information & benchmarks · a60c472c
    Kirill Smelkov authored
    Add a bench-cpu command to neotest that performs basic CPU benchmarks:
    pystone and CRC32/SHA1 for now. While each benchmark runs, a C-states
    profile is additionally collected(*). Example output:
    
    	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
    	node:   deco
    	cluster:
    	Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
    	Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
    	Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
    	Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
    	Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
    	Benchmarkcrc32/py/4K 300000     3.415 µs/op     # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
    	Benchmarkcrc32/py/4K 300000     3.402 µs/op     # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
    	Benchmarkcrc32/py/4K 300000     3.396 µs/op     # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
    	Benchmarkcrc32/py/4K 300000     3.435 µs/op     # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
    	Benchmarkcrc32/py/4K 300000     3.434 µs/op     # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
    	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
    	Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
    	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
    	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
    	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
    	Benchmarksha1/py/4K 300000      4.553 µs/op     # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
    	Benchmarksha1/py/4K 300000      4.459 µs/op     # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
    	Benchmarksha1/py/4K 300000      4.492 µs/op     # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
    	Benchmarksha1/py/4K 300000      4.550 µs/op     # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
    	Benchmarksha1/py/4K 300000      4.518 µs/op     # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
    	Benchmarksha1/go/4K 300000      4.312 µs/op     # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
    	Benchmarksha1/go/4K 300000      4.383 µs/op     # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
    	Benchmarksha1/go/4K 300000      4.387 µs/op     # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
    	Benchmarksha1/go/4K 300000      4.328 µs/op     # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
    	Benchmarksha1/go/4K 300000      4.337 µs/op     # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191
    
    Such raw output can be summarized with benchstat, using either its
    Go[1] or Python[2] implementation:
    
    	$ benchstat x.txt
    	name         pystone/s
    	pystone        288k ± 2%
    
    	name         time/op
    	crc32/py/4K  3.42µs ± 1%
    	crc32/go/4K   218ns ± 1%
    	sha1/py/4K   4.51µs ± 1%
    	sha1/go/4K   4.35µs ± 1%
    
    See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
    discussion on SHA1 vs CRC32.
    
    While on the CPU topic, teach info/info-local to show related information
    about the node's CPU: available processors, frequency and idle governors.
    Example of the lines added:
    
    	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
    	...
    	cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
    	cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
    	cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
    	WARNING: cpu: frequency not fixed - benchmark timings won't be stable
    	WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
    	WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)
    
    See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
    understand why there are warnings in the above example.
    
    Some draft history related to this patch:
    
    	lab.nexedi.com/kirr/neo/commit/cf1f7c24	X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
    	lab.nexedi.com/kirr/neo/commit/1e438610	fixup! X neotest: Also show target-latency for C-states
    	lab.nexedi.com/kirr/neo/commit/4af48245	X neotest: Also show target-latency for C-states
    	lab.nexedi.com/kirr/neo/commit/2910cf56	X neotest: Prefer first part of FQDN for hostname
    	lab.nexedi.com/kirr/neo/commit/c86ba1b0	X bench-cpu += crc32, adler32
    	lab.nexedi.com/kirr/neo/commit/4ac3a550	X neotest: Don't use bc
    	lab.nexedi.com/kirr/neo/commit/3918a997	X neotest: Don't assume we are invoked from the directory where neotest is
    	lab.nexedi.com/kirr/neo/commit/9a266d11	X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
    	lab.nexedi.com/kirr/neo/commit/b6a830d8	X switch cpu benchmarks to go format
    	lab.nexedi.com/kirr/neo/commit/4436b983	X neotest: Provide cpustat command so it is possible to cpustat something external
    	lab.nexedi.com/kirr/neo/commit/b062b349	X microbenchmark CPU first
    	lab.nexedi.com/kirr/neo/commit/a4a18b55	X first cut on C-state profiling
    	lab.nexedi.com/kirr/neo/commit/ea1e0835	X found that cpuidle can be affecting latency a lot!
    
    (*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
        http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for
        why this is important.
    
        Since being able to profile C-states can be generally useful, we
        expose such profiling via the externally visible `neotest cpustat` utility.
    
    [1] https://godoc.org/golang.org/x/perf/cmd/benchstat
    [2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py