net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload
In the TX data path, spot packets with xfrm stack IPsec offload indication. Fill Software-Parser segment in TX descriptor so that the hardware may parse the ESP protocol, and perform TX checksum offload on the inner payload. Support GSO, by providing the trailer data and ICV placeholder so HW can fill it post encryption operation. Padding alignment cannot be performed in HW (ConnectX-6Dx) due to a bug. Software can overcome this limitation by adding NETIF_F_HW_ESP to the gso_partial_features field in netdev so the packets being aligned by the stack. l4_inner_checksum cannot be offloaded by HW for IPsec tunnel type packet. Note that for GSO SKBs, the stack does not include an ESP trailer, unlike the non-GSO case. Below is the iperf3 performance report on two server of 24 cores Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX. All the bandwidth test uses iperf3 TCP traffic with packet size 128KB. Each tunnel uses one iperf3 stream with one thread (option -P1). TX crypto offload shows improvements on both bandwidth and CPU utilization. ---------------------------------------------------------------------- Mode | Num tunnel | BW | Send CPU util | Recv CPU util | | (Gbps) | (Average %) | (Average %) ---------------------------------------------------------------------- Cryto offload | | | | (RX only) | 1 | 4.7 | 4.2 | 3.5 ---------------------------------------------------------------------- Cryto offload | | | | (RX only) | 24 | 15.6 | 20 | 10 ---------------------------------------------------------------------- Non-offload | 1 | 4.6 | 4 | 5 ---------------------------------------------------------------------- Non-offload | 24 | 11.9 | 16 | 12 ---------------------------------------------------------------------- Cryto offload | | | | (TX & RX) | 1 | 11.9 | 2.1 | 5.9 ---------------------------------------------------------------------- Cryto offload | | | | (TX & RX) | 24 | 38 | 9.5 | 27.5 ---------------------------------------------------------------------- Cryto offload | | | | (TX only) | 1 | 4.7 | 0.7 | 5 ---------------------------------------------------------------------- Cryto offload | | | | (TX only) | 24 | 14.5 | 6 | 20 Regression tests show no degradation on non-ipsec and non-offload-ipsec traffics. The packet rate test uses pktgen UDP to transmit on single CPU, the instructions and cycles are measured on the transmit CPU. before: ---------------------------------------------------------------------- Non-offload | 1 | 4.7 | 4.2 | 5.1 ---------------------------------------------------------------------- Non-offload | 24 | 11.2 | 14 | 15 ---------------------------------------------------------------------- Non-ipsec | 1 | 28 | 4 | 5.7 ---------------------------------------------------------------------- Non-ipsec | 24 | 68.3 | 17.8 | 39.7 ---------------------------------------------------------------------- Non-ipsec packet rate(BURST=1000 BC=5 NCPUS=1 SIZE=60) 13.56Mpps, 456 instructions/pkt, 191 cycles/pkt after: ---------------------------------------------------------------------- Non-offload | 1 | 4.69 | 4.2 | 5 ---------------------------------------------------------------------- Non-offload | 24 | 11.9 | 13.5 | 15.1 ---------------------------------------------------------------------- Non-ipsec | 1 | 29 | 3.2 | 5.5 ---------------------------------------------------------------------- Non-ipsec | 24 | 68.2 | 18.5 | 39.8 ---------------------------------------------------------------------- Non-ipsec packet rate: 13.56Mpps, 472 instructions/pkt, 191 cycles/pkt Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Showing
Please register or sign in to comment