    tipc: redesign connection-level flow control · 10724cc7
    Jon Paul Maloy authored
    There are two flow control mechanisms in TIPC: one at link level that
    handles network congestion, burst control, and retransmission, and one
    at connection level whose only remaining task is to prevent overflow
    in the receiving socket buffer. In TIPC, the latter task has to be
    solved end-to-end because messages cannot be thrown away once they
    have been accepted and delivered upwards from the link layer, i.e., we
    can never permit the receive buffer to overflow.
    
    Currently, this algorithm is message based. The receiving socket keeps
    a counter of consumed messages, and sends a dedicated acknowledge
    message back to the sender for every 256 consumed messages. A counter
    at the sending end keeps track of the sent but not yet acknowledged
    messages, and blocks the sender if this number ever reaches 512
    unacknowledged messages. When the missing acknowledge arrives, the
    socket is woken up for renewed transmission. This works well for
    keeping the message flow running, as a sender socket is almost never
    blocked this way.
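The message-based scheme described above can be sketched in C as a simplified, single-threaded model. The struct and function names are hypothetical and do not come from the kernel's TIPC code. The real implementation sends the acknowledge as a protocol message and puts a blocked sender to sleep, whereas this sketch only returns counts and flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants from the description of the old, message-based scheme. */
#define MSG_ACK_INTERVAL 256 /* receiver acks after this many consumed messages */
#define MSG_SEND_WINDOW  512 /* sender blocks at this many unacked messages     */

/* Hypothetical sender-side state. */
struct msg_fc_sender {
	unsigned int unacked; /* sent, not yet acknowledged messages */
};

/* Hypothetical receiver-side state. */
struct msg_fc_receiver {
	unsigned int consumed; /* messages consumed since the last ack */
};

/* Sender: is there room in the window for one more message? */
static bool msg_fc_can_send(const struct msg_fc_sender *s)
{
	return s->unacked < MSG_SEND_WINDOW;
}

/* Sender: account for one transmitted message. */
static void msg_fc_on_send(struct msg_fc_sender *s)
{
	assert(msg_fc_can_send(s)); /* caller must have checked the window */
	s->unacked++;
}

/* Receiver: the application consumed one message. Returns the number
 * of messages to acknowledge, or 0 if no ack is due yet. */
static unsigned int msg_fc_on_consume(struct msg_fc_receiver *r)
{
	if (++r->consumed < MSG_ACK_INTERVAL)
		return 0;
	r->consumed = 0;
	return MSG_ACK_INTERVAL;
}

/* Sender: an acknowledge arrived, so a blocked sender may be woken up. */
static void msg_fc_on_ack(struct msg_fc_sender *s, unsigned int acked)
{
	assert(acked <= s->unacked);
	s->unacked -= acked;
}
```

The window is twice the acknowledge interval. The sender can therefore keep transmitting while an acknowledge is in transit.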
    
    A problem with the current mechanism is that it is potentially very
    memory-consuming. Since we don't distinguish between small and large
    messages, we have to dimension the socket receive buffer according
    to the worst case of both. I.e., the window size must be chosen large
    enough to sustain a reasonable throughput even for the smallest
    messages, while we must still consider a scenario where all messages
    are of maximum size. Hence, the current fixed window size of 512
    messages and a maximum message size of 66k result in a receive buffer
    of 66 MB when truesize(66k) = 131k is taken into account. It is
    possible to do much better.
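Spelled out, the worst-case figure is the window multiplied by the truesize of a maximum-size message. This reading assumes that k means 1024 bytes:

```latex
512 \times \operatorname{truesize}(66\,\mathrm{KiB})
  \approx 512 \times 131\,\mathrm{KiB}
  = 67\,072\,\mathrm{KiB}
  \approx 65.5\,\mathrm{MiB}
  \approx 66\,\mathrm{MB}
```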
    
    This commit introduces an algorithm where we instead use 1024-byte
    blocks as the base unit. This unit, always rounded upwards from the
    actual message size, is used when we advertise windows as well as when
    we count and acknowledge transmitted data. The advertised window is
    based on the configured receive buffer size in such a way that even
    the worst-case truesize/msgsize ratio is always covered. Since the
    smallest possible message size (from a flow control viewpoint) is now
    1024 bytes, we can safely assume this ratio to be less than four, which
    is the value we are now using.
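Here is a minimal sketch of the block-based accounting. It assumes a 1024-byte block and the truesize/msgsize factor of four given above. The names `FC_BLK_SZ`, `fc_msg_blocks()`, `fc_adv_blocks()` and `fc_can_send()` are hypothetical. The admission rule in `fc_can_send()` refuses any message that would push the outstanding blocks past the window. That rule is one possible choice, not necessarily the exact rule used in TIPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FC_BLK_SZ          1024 /* flow-control unit in bytes                    */
#define FC_TRUESIZE_FACTOR 4    /* assumed upper bound on truesize/msgsize ratio */

/* Blocks charged for a message of 'msglen' bytes: rounded upwards and
 * never less than one, so every message counts as at least 1024 bytes. */
static uint32_t fc_msg_blocks(uint32_t msglen)
{
	if (msglen <= FC_BLK_SZ)
		return 1;
	return (msglen + FC_BLK_SZ - 1) / FC_BLK_SZ;
}

/* Window, in blocks, advertised for a receive buffer of 'rcvbuf' bytes.
 * Dividing by the truesize factor means a full window still fits in
 * rcvbuf even if every block costs four times its size in memory. */
static uint32_t fc_adv_blocks(uint32_t rcvbuf)
{
	return rcvbuf / FC_BLK_SZ / FC_TRUESIZE_FACTOR;
}

/* Sender: may a message of 'msglen' bytes go out when 'unacked' blocks
 * are outstanding against an advertised window of 'window' blocks?
 * Assumed rule: refuse anything that would overrun the window. */
static bool fc_can_send(uint32_t unacked, uint32_t window, uint32_t msglen)
{
	assert(unacked <= window); /* invariant kept by this admission rule */
	return unacked + fc_msg_blocks(msglen) <= window;
}
```

Charging every message at least one block is what makes 1024 bytes the smallest message size from a flow control viewpoint. That floor, in turn, is what bounds the truesize/msgsize ratio.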
    
    This way, we have been able to reduce the default receive buffer size
    from 66 MB to 2 MB while maintaining performance.
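As a worked example, assume the 2 MB default means 2 MiB (2,097,152 bytes). The advertised window, and what it admits, then come out as:

```latex
\begin{aligned}
W &= \frac{2\,097\,152\ \mathrm{B}}{1024\ \mathrm{B} \times 4} = 512\ \text{blocks} \\
n_{\text{small}} &= W / 1 = 512
  \quad (\text{messages of at most } 1024\ \mathrm{B},\ 1\ \text{block each}) \\
n_{\text{max}} &= \lfloor W / 66 \rfloor = 7
  \quad (\text{messages of } 66\ \mathrm{KiB},\ 66\ \text{blocks each})
\end{aligned}
```

Small messages therefore keep the same 512-message window as the old scheme. Only about seven maximum-size messages fit entirely within the window. The worst-case memory bound is 512 × 1024 B × 4 = 2 MiB.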
    
    In order to keep this solution backwards compatible, we introduce a
    new capability bit in the discovery protocol, and use this throughout
    the message sending/reception path to always select the right unit.
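The fallback can be sketched as a per-peer choice of unit and window. The capability bit `HYP_CAP_BLOCK_FLOWCTL`, its bit position, and the function names are hypothetical. The sketch only assumes what is described above: peers that advertise the capability are charged in blocks, while older peers keep the message-based scheme with its fixed 512-message window.

```c
#include <assert.h>
#include <stdint.h>

#define FC_BLK_SZ             1024      /* flow-control block size in bytes       */
#define FC_LEGACY_WINDOW      512       /* old fixed window, counted in messages  */
#define HYP_CAP_BLOCK_FLOWCTL (1u << 3) /* hypothetical capability bit position   */

/* Units charged for one message of 'msglen' bytes towards a peer.
 * Block-capable peers: 1024-byte blocks, rounded up, at least one.
 * Legacy peers: one unit per message, as in the old scheme. */
static uint32_t fc_units(unsigned int peer_caps, uint32_t msglen)
{
	if (!(peer_caps & HYP_CAP_BLOCK_FLOWCTL))
		return 1;
	if (msglen <= FC_BLK_SZ)
		return 1;
	return (msglen + FC_BLK_SZ - 1) / FC_BLK_SZ;
}

/* Window towards a peer, expressed in the same units as fc_units(). */
static uint32_t fc_window(unsigned int peer_caps, uint32_t rcvbuf)
{
	uint32_t blocks;

	if (!(peer_caps & HYP_CAP_BLOCK_FLOWCTL))
		return FC_LEGACY_WINDOW;
	blocks = rcvbuf / FC_BLK_SZ / 4; /* 4 = assumed truesize/msgsize bound */
	assert(blocks > 0 && "receive buffer too small to advertise any block");
	return blocks;
}
```

Keeping the unit and the window behind the same capability test ensures that a counter is never compared against a window expressed in the other unit.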
    
    Acked-by: Ying Xue <ying.xue@windriver.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>