Hans Westgaard Ry authored
Handling of comm_channel_event in mlx4_master_comm_channel uses a double
loop to determine which slaves have requested work. The search always
starts at the lowest slave. This leads to unfairness: lower VFs tend to
be prioritized over higher VFs.

The patch uses find_next_bit to determine which slaves to handle.
Fairness is implemented by always starting the search at the slave
following the previous starting point.

An MPI program has been used to measure the improvement. It runs 500
ibv_reg_mr calls, synchronizes with all other instances, and then runs
500 ibv_dereg_mr calls.

Results with 500 processes; the time reported is for 500 calls, with
the patch (Mod.) and without it (Org.):

ibv_reg_mr:
             Mod.        Org.
mlx4_1    403.356ms   424.674ms
mlx4_2    403.355ms   424.674ms
mlx4_3    403.354ms   424.674ms
mlx4_4    403.355ms   424.674ms
mlx4_5    403.357ms   424.677ms
mlx4_6    403.354ms   424.676ms
mlx4_7    403.357ms   424.675ms
mlx4_8    403.355ms   424.675ms

ibv_dereg_mr:
             Mod.        Org.
mlx4_1    116.408ms   142.818ms
mlx4_2    116.434ms   142.793ms
mlx4_3    116.488ms   143.247ms
mlx4_4    116.679ms   143.230ms
mlx4_5    112.017ms   107.204ms
mlx4_6    112.032ms   107.516ms
mlx4_7    112.083ms   184.195ms
mlx4_8    115.089ms   190.618ms

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
79ebfb11
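
The rotating-start scan described in the commit message can be illustrated with a small user-space sketch. This is not the driver code: next_set_bit, service_order, next_start and NSLAVES are hypothetical names chosen for illustration, the bitmap is a single 32-bit word, and advancing the start by exactly one slave per event is an assumption based on the wording of the message. next_set_bit only mimics the return convention of the kernel's find_next_bit (index of the next set bit, or the bitmap size if none is found).

```c
#include <stddef.h>
#include <stdint.h>

#define NSLAVES 8u /* illustrative size, not the driver's constant */

/* Hypothetical stand-in for the kernel's find_next_bit(): returns the
 * index of the first set bit at or after 'start', or 'nbits' if none. */
static unsigned next_set_bit(uint32_t map, unsigned nbits, unsigned start)
{
    for (unsigned i = start; i < nbits; i++)
        if (map & (UINT32_C(1) << i))
            return i;
    return nbits;
}

/* Write the pending slaves into 'order' in the sequence they would be
 * serviced: first those in [start, nbits), then those in [0, start).
 * Returns the number of slaves written. Requires nbits <= 32. */
static size_t service_order(uint32_t pending, unsigned nbits,
                            unsigned start, unsigned *order)
{
    size_t n = 0;

    for (unsigned s = next_set_bit(pending, nbits, start); s < nbits;
         s = next_set_bit(pending, nbits, s + 1))
        order[n++] = s;
    for (unsigned s = next_set_bit(pending, start, 0); s < start;
         s = next_set_bit(pending, start, s + 1))
        order[n++] = s;
    return n;
}

/* Assumed policy: move the starting point forward by one slave per
 * event, wrapping around, so no slave is always scanned first. */
static unsigned next_start(unsigned start, unsigned nbits)
{
    return (start + 1) % nbits;
}
```

With slaves 0, 2, 5 and 7 pending (bitmap 0xA5) and a start of 3, service_order yields 5, 7, 0, 2 instead of always beginning with slave 0. A caller would invoke next_start after each event so that the first position rotates among the slaves.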