go/neo/t/neotest: CPU information & benchmarks

Add to neotest a bench-cpu command that performs basic CPU benchmarks: pystone and CRC32/SHA1 for now. While each benchmark runs, a C-states profile is additionally collected(*). Example output:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
    node:   deco
    cluster:
    Benchmarkpystone 1 283297 pystone/s # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
    Benchmarkpystone 1 289788 pystone/s # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
    Benchmarkpystone 1 286329 pystone/s # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
    Benchmarkpystone 1 292087 pystone/s # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
    Benchmarkpystone 1 290119 pystone/s # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
    Benchmarkcrc32/py/4K 300000 3.415 µs/op # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
    Benchmarkcrc32/py/4K 300000 3.402 µs/op # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
    Benchmarkcrc32/py/4K 300000 3.396 µs/op # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
    Benchmarkcrc32/py/4K 300000 3.435 µs/op # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
    Benchmarkcrc32/py/4K 300000 3.434 µs/op # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
    Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
    Benchmarkcrc32/go/4K 10000000 0.216 µs/op # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
    Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
    Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
    Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
    Benchmarksha1/py/4K 300000 4.553 µs/op # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
    Benchmarksha1/py/4K 300000 4.459 µs/op # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
    Benchmarksha1/py/4K 300000 4.492 µs/op # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
    Benchmarksha1/py/4K 300000 4.550 µs/op # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
    Benchmarksha1/py/4K 300000 4.518 µs/op # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
    Benchmarksha1/go/4K 300000 4.312 µs/op # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
    Benchmarksha1/go/4K 300000 4.383 µs/op # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
    Benchmarksha1/go/4K 300000 4.387 µs/op # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
    Benchmarksha1/go/4K 300000 4.328 µs/op # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
    Benchmarksha1/go/4K 300000 4.337 µs/op # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191

Such raw output can be summarized with the help of benchstat, using either the Go[1] or the Python[2] implementation:

    $ benchstat x.txt
    name     pystone/s
    pystone   288k ± 2%

    name         time/op
    crc32/py/4K  3.42µs ± 1%
    crc32/go/4K   218ns ± 1%
    sha1/py/4K   4.51µs ± 1%
    sha1/go/4K   4.35µs ± 1%

See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some discussion on SHA1 vs CRC32.

While on the CPU topic, teach info/info-local to show related information about the node's CPU: available processors, frequency and idle governors. Example of the added lines:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
    ...
    cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
    cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
    cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
    WARNING: cpu: frequency not fixed - benchmark timings won't be stable
    WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
    WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)

See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to understand why there are warnings in the example above.
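The two warnings come from simple checks in neotest's system_info: the frequency counts as fixed only when every CPU's scaling_min_freq equals its scaling_max_freq, and the largest exit latency among enabled (not disabled) C-states must not exceed 10µs. Below is a minimal Go sketch of those checks, fed with the numbers of the i7-3770S example. The types `cpuFreq` and `idleState` and the helper names are hypothetical, and unlike neotest (which compares the GHz strings produced by fkghz) the sketch compares raw kHz values.

```go
package main

import "fmt"

// cpuFreq holds one CPU's cpufreq limits, as read from
// /sys/devices/system/cpu/cpuN/cpufreq/scaling_{min,max}_freq (in kHz).
type cpuFreq struct {
	MinKHz, MaxKHz int
}

// idleState holds one cpuidle state, as read from
// /sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,latency,disable}.
type idleState struct {
	Name     string
	Latency  int  // exit latency, µs
	Disabled bool // true if "disable" reads 1
}

// freqFixed reports whether every CPU has min frequency == max frequency.
func freqFixed(cpus []cpuFreq) bool {
	for _, c := range cpus {
		if c.MinKHz != c.MaxKHz {
			return false
		}
	}
	return true
}

// maxExitLatency returns the largest exit latency among enabled C-states;
// disabled states are skipped, as neotest does.
func maxExitLatency(states []idleState) int {
	latmax := 0
	for _, s := range states {
		if !s.Disabled && s.Latency > latmax {
			latmax = s.Latency
		}
	}
	return latmax
}

// stabilityWarnings applies both checks with neotest's 10µs threshold.
func stabilityWarnings(cpus []cpuFreq, states []idleState) []string {
	var warnv []string
	if !freqFixed(cpus) {
		warnv = append(warnv, "WARNING: cpu: frequency not fixed - benchmark timings won't be stable")
	}
	if latmax := maxExitLatency(states); latmax > 10 {
		warnv = append(warnv, fmt.Sprintf("WARNING: cpu: C-state exit-latency is max %dμs - benchmark timings won't be stable", latmax))
	}
	return warnv
}

func main() {
	// Numbers from the i7-3770S example: 1.60GHz - 3.90GHz, all C-states enabled.
	cpus := []cpuFreq{{1600000, 3900000}}
	states := []idleState{{"POLL", 0, false}, {"C1", 1, false}, {"C1E", 10, false}, {"C3", 59, false}, {"C6", 80, false}}
	for _, w := range stabilityWarnings(cpus, states) {
		fmt.Println(w)
	}
}
```

Disabling C6 (writing 1 to its `disable` file) would drop the reported maximum to C3's 59µs, and pinning min and max frequency to the same value would silence the first warning.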
Some draft history related to this patch:

    lab.nexedi.com/kirr/neo/commit/cf1f7c24 X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
    lab.nexedi.com/kirr/neo/commit/1e438610 fixup! X neotest: Also show target-latency for C-states
    lab.nexedi.com/kirr/neo/commit/4af48245 X neotest: Also show target-latency for C-states
    lab.nexedi.com/kirr/neo/commit/2910cf56 X neotest: Prefer first part of FQDN for hostname
    lab.nexedi.com/kirr/neo/commit/c86ba1b0 X bench-cpu += crc32, adler32
    lab.nexedi.com/kirr/neo/commit/4ac3a550 X neotest: Don't use bc
    lab.nexedi.com/kirr/neo/commit/3918a997 X neotest: Don't assume we are invoked from the directory where neotest is
    lab.nexedi.com/kirr/neo/commit/9a266d11 X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
    lab.nexedi.com/kirr/neo/commit/b6a830d8 X switch cpu benchmarks to go format
    lab.nexedi.com/kirr/neo/commit/4436b983 X neotest: Provide cpustat command so it is possible to cpustat something external
    lab.nexedi.com/kirr/neo/commit/b062b349 X microbenchmark CPU first
    lab.nexedi.com/kirr/neo/commit/a4a18b55 X first cut on C-state profiling
    lab.nexedi.com/kirr/neo/commit/ea1e0835 X found that cpuidle can be affecting latency a lot!

(*) See http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for why this is important. Since being able to profile C-states can be generally useful, we expose such profiling via the externally visible `neotest cpustat` utility.

[1] https://godoc.org/golang.org/x/perf/cmd/benchstat
[2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
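The C-state profile that `neotest cpustat` appends is obtained by reading each cpuidle state's `usage` counter (how many times the state was entered), summed over all CPUs, before and after running the command, and printing the difference per state. The Go sketch below follows the same idea, assuming the kernel exposes `/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,usage}` as neotest itself expects. `stateNames`, `usage` and `formatDelta` are hypothetical names for this sketch, and unlike neotest it always prints the profile on its own line instead of appending it to one-line command output.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
	"strings"
)

const sysCPU = "/sys/devices/system/cpu"

// stateNames lists cpuidle state directories (state0, state1, ...) and their
// names. Like neotest, it assumes all CPUs share the states of cpu0.
func stateNames() (dirs, names []string) {
	for i := 0; ; i++ {
		dir := fmt.Sprintf("state%d", i)
		b, err := os.ReadFile(filepath.Join(sysCPU, "cpu0", "cpuidle", dir, "name"))
		if err != nil {
			return dirs, names // no more states (or no cpuidle at all)
		}
		dirs = append(dirs, dir)
		names = append(names, strings.TrimSpace(string(b)))
	}
}

// usage returns, for every state in dirs, its "usage" counter summed
// across all CPUs.
func usage(dirs []string) []int64 {
	cpus, _ := filepath.Glob(filepath.Join(sysCPU, "cpu[0-9]*"))
	sum := make([]int64, len(dirs))
	for i, dir := range dirs {
		for _, cpu := range cpus {
			b, err := os.ReadFile(filepath.Join(cpu, "cpuidle", dir, "usage"))
			if err != nil {
				continue // e.g. offline CPU
			}
			if n, err := strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64); err == nil {
				sum[i] += n
			}
		}
	}
	return sum
}

// formatDelta renders per-state entry counts in neotest's "# NAME·count" style.
func formatDelta(names []string, start, end []int64) string {
	var sb strings.Builder
	sb.WriteString("#")
	for i, name := range names {
		fmt.Fprintf(&sb, " %s·%d", name, end[i]-start[i])
	}
	return sb.String()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: cpustat <command> [args...]")
		return
	}
	dirs, names := stateNames()
	start := usage(dirs)

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	err := cmd.Run()

	fmt.Println(formatDelta(names, start, usage(dirs)))
	if err != nil {
		os.Exit(1)
	}
}
```

Saved as, say, `cpustat.go`, it could be run as `go run cpustat.go sleep 1` to see which C-states the machine visited while mostly idle.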

Commit a60c472c authored Jul 09, 2018 by Kirill Smelkov
Showing with 457 additions and 0 deletions

go/neo/t/.gitignore +2 -0
go/neo/t/neotest    +218 -0
go/neo/t/tcpu.go    +120 -0
go/neo/t/tcpu.py    +117 -0

--- a/go/neo/t/.gitignore
+++ b/go/neo/t/.gitignore
+/tcpu
+/tcpu_go
--- a/go/neo/t/neotest
+++ b/go/neo/t/neotest
@@ -72,6 +72,8 @@ EOF

 	. env.sh

+	pip install pygolang	# for tcpu.py
+
 	mkdir -p src/lab.nexedi.com/kirr
 	pushd src/lab.nexedi.com/kirr
 	test -d neo || git clone -o kirr https://lab.nexedi.com/kirr/neo.git neo
@@ -204,6 +206,23 @@ proginfo() {
 	which $prog >/dev/null 2>&1 && $prog "$@" || printf "%-16s: ø\n" "$prog"
 }

+# fkghz file	- extract value from file (in kHz) and render it as GHz
+fkghz() {
+	python -c "print '%.2fGHz' % (`cat $1` / 1E6)"
+}
+
+# xhostname	- show short system host name
+xhostname() {
+	# prefer first part of FQDN for misconfigured systems like
+	# fqdn=z6001.ivan.nexedi.com, hostname=z6001-COMP-2784
+	fqdn=`hostname --fqdn 2>/dev/null || :`
+	if test -n "$fqdn"; then
+		echo "$fqdn" |sed -e 's/\./ /' |awk '{print $1}'
+	else
+		hostname
+	fi
+}
+
 # show information about local system (os, hardware, versions, ...)
 system_info() {
 	echo -ne "date:\t"; date --rfc-2822
@@ -215,6 +234,93 @@ system_info() {
 	echo ")"
 	echo -ne "uname:\t"; uname -a

+	# cpu
+	echo -ne "cpu:\t"; grep "^model name" /proc/cpuinfo |head -1 |sed -e 's/model name\s*: //'
+	syscpu=/sys/devices/system/cpu
+	sysidle=$syscpu/cpuidle
+
+	cpuvabbrev() {	# cpuvabbrev cpu0 cpu1 cpu2 ... cpuN	-> cpu/[0-N]
+		test $# -le 1 && echo "$@" && return
+
+		min=""
+		max=""
+		while [ $# -ne 0 ]; do
+			v=$1
+			shift
+			n=${v#cpu}
+
+			test -z "$min" && min=$n && max=$n && continue
+			if (( $n != $max + 1 )); then
+				die "cpuvabbrev: assert: nonconsecutive $max $n"
+			fi
+			max=$n
+		done
+		echo "cpu/[$min-$max]"
+	}
+
+	freqcpuv=()	# [] of cpu
+	freqstr=""	# text about cpufreq for cpus in ^^^
+	freqdump() {
+		test "${#freqcpuv[@]}" = 0 && return
+		echo "`cpuvabbrev ${freqcpuv[*]}`/freq: $freqstr"
+		freqcpuv=()
+		freqstr=""
+	}
+
+	idlecpuv=()	# ----//---- for cpuidle
+	idlestr=""
+	idledump() {
+		test "${#idlecpuv[@]}" = 0 && return
+		echo "`cpuvabbrev ${idlecpuv[*]}`/idle: $idlestr"
+		idlecpuv=()
+		idlestr=""
+	}
+
+	freqstable=y
+	while read cpu; do
+		f="$cpu/cpufreq"
+		fmin=`fkghz $f/scaling_min_freq`
+		fmax=`fkghz $f/scaling_max_freq`
+		fs="`cat $f/scaling_driver`/`cat $f/scaling_governor` [$fmin - $fmax]"
+		if [ "$fs" != "$freqstr" ]; then
+			freqdump
+			freqstr="$fs"
+		fi
+		freqcpuv+=(`basename $cpu`)
+		test "$fmin" != "$fmax" && freqstable=n
+	done \
+	< <(ls -vd $syscpu/cpu[0-9]*)
+	freqdump
+
+	latmax=0
+	while read cpu; do
+		is="`cat $sysidle/current_driver`/`cat $sysidle/current_governor_ro`:"
+		while read state; do
+			is+=" "
+			lat=`cat $state/latency`
+			res=`cat $state/residency 2>/dev/null` || res="?"	# added in linux 3.15
+			test "`cat $state/disable`" = "1" && is+="!" || latmax=$(($lat>$latmax?$lat:$latmax))
+			is+="`cat $state/name`·${lat}/${res}"
+		done \
+		< <(ls -vd $cpu/cpuidle/state[0-9]*)
+
+		is+=" # elat/tres µs"
+
+		if [ "$is" != "$idlestr" ]; then
+			idledump
+			idlestr="$is"
+		fi
+		idlecpuv+=(`basename $cpu`)
+	done \
+	< <(ls -vd $syscpu/cpu[0-9]*)
+	idledump
+
+	test "$freqstable" = y || echo "WARNING: cpu: frequency not fixed - benchmark timings won't be stable"
+	test "$latmax" -le 10  || {
+		echo "WARNING: cpu: C-state exit-latency is max ${latmax}μs - benchmark timings won't be stable"
+		echo "WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)"
+	}
+
 	printf "%-20s" "sw/python:";	proginfo python --version 2>&1	# https://bugs.python.org/issue18338
 	printf "%-20s" "sw/go:";	proginfo go version
 	printf "%-20s" "sw/sqlite:";	proginfo python -c \
@@ -228,6 +334,97 @@ system_info() {
 }


+# ---- benchmarking ----
+
+# cpustat ...	- run ... and print CPU C-states statistics
+cpustat() {
+	# XXX +cpufreq transition statistics (CPU_FREQ_STAT) ?
+
+	syscpu=/sys/devices/system/cpu
+	cpuv=( `ls -vd $syscpu/cpu[0-9]*` )
+	# XXX we assume cpuidle states are the same for all cpus and get list of them from cpu0
+	statev=( `ls -vd ${cpuv[0]}/cpuidle/state[0-9]* |xargs -n 1 basename` )
+
+	# get current [state]usage. usage for a state is summed across all cpus
+	statev_usage() {
+		usagev=()
+		for s in ${statev[*]}; do
+			#echo >&2 $s
+			susage=0
+			for u in `cat $syscpu/cpu[0-9]*/cpuidle/$s/usage`; do
+				#echo -e >&2 "\t$u"
+				((susage+=$u))
+			done
+			usagev+=($susage)
+		done
+		echo ${usagev[*]}
+	}
+
+	ustartv=( `statev_usage` )
+	#echo >&2 "--------"
+	#sleep 1
+	ret=0
+	out="$("$@" 2>&1)" || ret=$?
+	uendv=( `statev_usage` )
+
+	stat="#"
+	for ((i=0;i<${#statev[*]};i++)); do
+		s=${statev[$i]}
+		sname=`cat ${cpuv[0]}/cpuidle/$s/name`
+		du=$((${uendv[$i]} - ${ustartv[$i]}))
+		#stat+=" $sname(+$du)"
+		stat+=" $sname·$du"
+		#stat+=" $du·$sname"
+	done
+
+	if [ `echo "$out" | wc -l` -gt 1 ]; then
+		# multiline out - add another line
+		echo "$out"
+		echo "$stat"
+	else
+		# 1-line out	- add stats at line tail
+		echo -n "$out"
+		echo -e "\t$stat"
+	fi
+
+	return $ret
+}
+
+Nrun=5			# repeat benchmarks N times
+
+#profile=
+profile=cpustat
+
+# nrun ...	- run ... $Nrun times serially
+nrun() {
+	for i in `seq $Nrun`; do
+		$profile "$@"
+	done
+}
+
+# bench_cpu	- microbenchmark CPU
+bench_cpu() {
+	echo -ne "node:\t"; xhostname
+	echo     "cluster:"
+	nrun sh -c "python -m test.pystone |tail -1 |sed -e \
+		\"s|^This machine benchmarks at \([0-9.]\+\) pystones/second$|Benchmarkpystone 1 \1 pystone/s|\""
+
+	sizev="4096"		# 1024 $((2*1024*1024))
+	benchv="crc32 sha1"	# adler32
+	for bench in $benchv; do
+		for size in $sizev; do
+			nrun tcpu.py $bench $size
+			nrun tcpu_go $bench $size
+		done
+	done
+}
+
+
+
+# command: benchmark local cpu
+cmd_bench-cpu() {
+	bench_cpu
+}

 # command: print information about local node
 cmd_info-local() {
@@ -241,6 +438,11 @@ cmd_info() {
 	on $url ./neotest info-local
 }

+# utility: cpustat on running arbitrary command
+cmd_cpustat() {
+	cpustat "$@"
+}
+
 # ---- main driver ----

 usage() {
@@ -260,11 +462,18 @@ The commands are:
 	test-py		run NEO/py unit tests	(part of test-local)


+	bench-cpu	benchmark local cpu
+
+
 	deploy		deploy NEO & needed software for tests to remote host
 	deploy-local	deploy NEO & needed software for tests locally

 	info		print information about a node
 	info-local	print information about local deployment
+
+Additional utility commands:
+
+	cpustat		run a command and print CPU-related statistics
 EOF
 }

@@ -278,9 +487,13 @@ test-local)	f=(build                );;
 test-go)	f=(build                );;
 test-py)	f=(                     );;

+bench-cpu)	f=(build                );;
+
 info)		f=(                     );;
 info-local)	f=(      net            );;

+cpustat)	f=(                     );;
+
 -h)
 	usage
 	exit 0
@@ -295,9 +508,14 @@ esac
 for flag in ${f[*]}; do
 	case "$flag" in
 	build)
+		# make sure tcpu* is on PATH (because we could be invoked from another dir)
+		X=$(cd `dirname $0` && pwd)
+		export PATH=$X:$PATH
+
 		# rebuild go bits
 		# neo/py, wendelin.core, ... - must be pip install'ed - `neotest deploy` cares about that
 		go install -v lab.nexedi.com/kirr/neo/go/...
+		go build -o $X/tcpu_go $X/tcpu.go
 		;;

 	net)

--- a/go/neo/t/tcpu.go
+++ b/go/neo/t/tcpu.go
+// Copyright (C) 2017  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// +build ignore
+
+// tcpu - cpu-related benchmarks
+package main
+
+import (
+	"crypto/sha1"
+	"flag"
+	"fmt"
+	"hash"
+	"hash/adler32"
+	"hash/crc32"
+	"log"
+	"os"
+	"strconv"
+	"testing"
+	"time"
+)
+
+func dieusage() {
+	fmt.Fprintf(os.Stderr, "Usage: tcpu <benchmark> <block-size>\n")
+	os.Exit(1)
+}
+
+const unitv = "BKMGT" // (2^10)^i is represented by the corresponding char suffix
+
+// fmtsize formats size in human readable form
+func fmtsize(size int) string {
+	const order = 1<<10
+	norder := 0
+	for size != 0 && (size % order) == 0 && (norder + 1 < len(unitv)) {
+		size /= order
+		norder += 1
+	}
+
+	return fmt.Sprintf("%d%c", size, unitv[norder])
+}
+
+func prettyarg(arg string) string {
+	size, err := strconv.Atoi(arg)
+	if err != nil {
+		return arg
+	}
+	return fmtsize(size)
+}
+
+// benchit runs the benchmark for benchf
+func benchit(benchname string, bencharg string, benchf func(*testing.B, string)) {
+	// FIXME testing.Benchmark does not allow to detect whether benchmark failed.
+	// (use log.Fatal, not {t,b}.Fatal as workaround)
+	r := testing.Benchmark(func (b *testing.B) {
+		benchf(b, bencharg)
+	})
+
+	fmt.Printf("Benchmark%s/go/%s %d\t%.3f µs/op\n", benchname, prettyarg(bencharg), r.N, float64(r.T) / float64(r.N) / float64(time.Microsecond))
+
+}
+
+
+func benchHash(b *testing.B, h hash.Hash, arg string) {
+	blksize, err := strconv.Atoi(arg)
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	data := make([]byte, blksize)
+	b.ResetTimer()
+
+	for i := 0; i < b.N; i++ {
+		h.Write(data)
+	}
+}
+
+func BenchmarkAdler32(b *testing.B, arg string) { benchHash(b, adler32.New(), arg) }
+func BenchmarkCrc32(b *testing.B, arg string)   { benchHash(b, crc32.NewIEEE(), arg) }
+func BenchmarkSha1(b *testing.B, arg string)    { benchHash(b, sha1.New(), arg) }
+
+
+var benchv = map[string]func(*testing.B, string) {
+	"adler32":	BenchmarkAdler32,
+	"crc32":	BenchmarkCrc32,
+	"sha1":		BenchmarkSha1,
+}
+
+
+func main() {
+	flag.Parse()	// so that test.* flags could be processed
+	argv := flag.Args()
+	if len(argv) != 2 {
+		dieusage()
+	}
+	benchname := argv[0]
+	bencharg  := argv[1]
+
+	benchf, ok := benchv[benchname]
+	if !ok {
+		log.Fatalf("Unknown benchmark %q", benchname)
+	}
+
+	benchit(benchname, bencharg, benchf)
+}
--- a/go/neo/t/tcpu.py
+++ b/go/neo/t/tcpu.py
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright (C) 2017-2018  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""tcpu - cpu-related benchmarks"""
+
+from __future__ import print_function
+
+import sys
+import hashlib
+from zlib import crc32, adler32
+
+from golang import testing
+
+# adler32 in hashlib interface
+class Adler32Hasher:
+    name = "adler32"
+
+    def __init__(self):
+        self.h = adler32('')
+
+    def update(self, data):
+        self.h = adler32(data, self.h)
+
+    def hexdigest(self):
+        return '%08x' % (self.h & 0xffffffff)
+
+# crc32 in hashlib interface
+class CRC32Hasher:
+    name = "crc32"
+
+    def __init__(self):
+        self.h = crc32('')
+
+    def update(self, data):
+        self.h = crc32(data, self.h)
+
+    def hexdigest(self):
+        return '%08x' % (self.h & 0xffffffff)
+
+# fmtsize formats size in human readable form
+_unitv = "BKMGT" # (2^10)^i is represented by the corresponding char suffix
+def fmtsize(size):
+    order = 1<<10
+    norder = 0
+    while size and (size % order) == 0 and (norder + 1 < len(_unitv)):
+        size //= order
+        norder += 1
+
+    return "%d%s" % (size, _unitv[norder])
+
+def prettyarg(arg):
+    try:
+        arg = int(arg)
+    except ValueError:
+        return arg     # return as it is - e.g. "null-4K"
+    else:
+        return fmtsize(arg)
+
+
+# benchit benchmarks benchf(bencharg)
+def benchit(benchf, bencharg):
+    def _(b):
+        benchf(b, bencharg)
+    r = testing.benchmark(_)
+
+    benchname = benchf.__name__
+    if benchname.startswith('bench_'):
+        benchname = benchname[len('bench_'):]
+
+    print('Benchmark%s/py/%s %d\t%.3f µs/op' %
+                (benchname, prettyarg(bencharg), r.N, r.T * 1E6 / r.N))
+
+
+def _bench_hasher(b, h, blksize):
+    blksize = int(blksize)
+    data = '\0'*blksize
+
+    b.reset_timer()
+
+    n = b.N
+    i = 0
+    while i < n:
+        h.update(data)
+        i += 1
+
+
+def bench_adler32(b, blksize):  _bench_hasher(b, Adler32Hasher(), blksize)
+def bench_crc32(b, blksize):    _bench_hasher(b, CRC32Hasher(), blksize)
+def bench_sha1(b, blksize):     _bench_hasher(b, hashlib.sha1(), blksize)
+
+
+def main():
+    bench    = sys.argv[1]
+    bencharg = sys.argv[2]
+
+    benchf = globals()['bench_%s' % bench]
+    benchit(benchf, bencharg)
+
+if __name__ == '__main__':
+    main()