-
Kirill Smelkov authored
Add to neotest bench-net command that performs latency measurments at ping and TCP levels. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9 node: cluster: deco-rio *** link latency: # deco ⇄ rio (ping 16B) PING rio (192.168.0.8) 16(44) bytes of data. --- rio ping statistics --- 25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms Benchmarkpingrtt-/16B-min 1 0.080 ms/op Benchmarkpingrtt-/16B-avg 1 0.097 ms/op # POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141 ... *** TCP latency: # deco ⇄ rio (lat_tcp.c 1B -> lat_tcp.c -s) Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op # TCP latency using rio: 116.1743 microseconds # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444 Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op # TCP latency using rio: 117.2896 microseconds # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209 Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op # TCP latency using rio: 117.5331 microseconds # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246 Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op # TCP latency using rio: 117.6509 microseconds # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187 Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op # TCP latency using rio: 116.8125 microseconds # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475 ... And its summary via benchstat: name time/op pingrtt-/16B-min 80.0µs ± 0% pingrtt-/16B-avg 97.0µs ± 0% -pingrtt/16B-min 79.0µs ± 0% -pingrtt/16B-avg 112µs ± 0% pingrtt-/1452B-min 241µs ± 0% pingrtt-/1452B-avg 303µs ± 0% -pingrtt/1452B-min 266µs ± 0% -pingrtt/1452B-avg 303µs ± 0% tcprtt(c_c)-/1B 117µs ± 1% tcprtt(c_go)-/1B 122µs ± 2% -tcprtt(c_c)/1B 117µs ± 1% -tcprtt(c_go)/1B 121µs ± 5% tcprtt(c_c)-/1400B 392µs ± 4% tcprtt(c_go)-/1400B 363µs ±18% -tcprtt(c_c)/1400B 412µs ±21% -tcprtt(c_go)/1400B 391µs ±38% tcprtt(c_c)-/1500B 271µs ±18% tcprtt(c_go)-/1500B 290µs ±21% -tcprtt(c_c)/1500B 282µs ±16% -tcprtt(c_go)/1500B 334µs ±24% tcprtt(c_c)-/4096B 711µs ± 5% tcprtt(c_go)-/4096B 737µs ± 5% -tcprtt(c_c)/4096B 740µs ± 2% -tcprtt(c_go)/4096B 711µs ± 7% Latencies here are not good because for this run on rio interrupt mitigation was not tuned (see below). By the way, analyzing ping RTT latencies on our shuttle machines (similar to rio) resulted in the following kernel patch https://git.kernel.org/linus/509708310c (released with Linux 4.15) to fix/being able to adjust interrupt mitigation on Realtek NICs. While at networking topic, teach info/info-local to show related information about node's NICs. Example lines output for deco: nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21 nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ... nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq, txc: 0μs/0f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs nic/wlan0: Intel Corporation Wireless 8260 rev 3a nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ... nic/wlan0/coalesce: rxc: ?, txc: ? nic/wlan0/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor for rio: nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq, txc: 200μs/4f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq, txc: 0μs/1f/0μs-irq/0f-irq nic/eth1/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor The warning about "RX coalesce latency is max 200μs ..." says that on receive path eth0 will be coalescing incoming frames for up to 200μs and this way this delay will be added to overal latency. (for small frames Realtek NICs do not coalesce interrupts - see details in the kernel patch). Networked performance (raw and NEO) was not discussed in http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the importance of C-states for performance was first found via this networking latency benchmarks. Links on C-states topic: http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e8e395ae X neotest: Move network benchmarking into separate function + add `neotest bench-net` lab.nexedi.com/kirr/neo/commit/a971231c X neotest/info: Handle USB NICs lab.nexedi.com/kirr/neo/commit/5dd3d1ab X neotest: sort NIC names lab.nexedi.com/kirr/neo/commit/9888f047 X neotest: Do not crash if kernel is too old to support gro_flush_timeout lab.nexedi.com/kirr/neo/commit/3a1bdf4a X bench-remote / tcp : std benchmark output lab.nexedi.com/kirr/neo/commit/9450b6db X bench-remote / ping += std bench output lab.nexedi.com/kirr/neo/commit/68d5b015 X show gro_flush_timeout + friends lab.nexedi.com/kirr/neo/commit/4c815af9 X neotest: Show NIC features and emit warning if !TSO lab.nexedi.com/kirr/neo/commit/659ce938 X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc... lab.nexedi.com/kirr/neo/commit/ded384cb X neotest += `lat_tcp.go -s` lab.nexedi.com/kirr/neo/commit/59d46504 X neotest += lat_tcp lab.nexedi.com/kirr/neo/commit/67fc3440 X show small (56B) and full-packet (1472B) ping link latencies
26006d7e