    tunnels: PMTU discovery support for directly bridged IP packets · 4cb47a86
    Stefano Brivio authored
    It's currently possible to bridge Ethernet tunnels carrying IP
    packets directly to external interfaces without assigning them
    addresses and routes on the bridged network itself: this is the case
    for UDP tunnels bridged with a standard bridge or by Open vSwitch.
    
    PMTU discovery is currently broken with those configurations, because
    the encapsulation effectively decreases the MTU of the link, and
    while we are able to account for this using PMTU discovery on the
    lower layer, we don't have a way to relay ICMP or ICMPv6 messages
    needed by the sender, because we don't have valid routes to it.
    
    On the other hand, as a tunnel endpoint, we can't fragment packets
    as a general approach: this is for instance clearly forbidden for
    VXLAN by RFC 7348, section 4.3:
    
       VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
       fragment encapsulated VXLAN packets due to the larger frame size.
       The destination VTEP MAY silently discard such VXLAN fragments.
    
    The same paragraph recommends that the MTU over the physical network
    accommodates encapsulations, but this isn't a practical option for
    complex topologies, especially for typical Open vSwitch use cases.
    
    Further, it states that:
    
       Other techniques like Path MTU discovery (see [RFC1191] and
       [RFC1981]) MAY be used to address this requirement as well.
    
    Now, PMTU discovery already works for routed interfaces: we get
    route exceptions created by the encapsulation device as it receives
    ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
    we already rebuild those messages with the appropriate MTU and route
    them back to the sender.
    
    Add the missing bits for bridged cases:
    
    - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
      to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
      RFC 4443 section 2.4 for ICMPv6. This function is already called by
      UDP tunnels
    
    - a new function generating those ICMP or ICMPv6 replies. We can't
      reuse icmp_send() and icmp6_send() as we don't see the sender as a
      valid destination. This doesn't need to be generic, as we don't
      cover any other type of ICMP errors given that we only provide an
      encapsulation function to the sender
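
The first item above relies on the conditions under which RFC 1122, section 3.2.2, forbids sending an ICMP error. A minimal sketch of those conditions for the IPv4 case follows. It is not the code added to skb_tunnel_check_pmtu(): the struct, its fields and the function names are hypothetical, and addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the inner IPv4 packet (not a kernel structure).
 * Addresses are in host byte order. */
struct inner_ipv4 {
	uint32_t src;
	uint32_t dst;
	uint16_t frag_offset;		/* 0 for an initial fragment */
	bool is_icmp_error;		/* payload is itself an ICMP error */
	bool link_bcast_or_mcast;	/* frame was sent to a group MAC */
};

static bool addr_is_multicast(uint32_t a)	/* 224.0.0.0/4 */
{
	return (a & 0xF0000000u) == 0xE0000000u;
}

static bool addr_is_class_e(uint32_t a)		/* 240.0.0.0/4, incl. broadcast */
{
	return (a & 0xF0000000u) == 0xF0000000u;
}

static bool addr_is_loopback(uint32_t a)	/* 127.0.0.0/8 */
{
	return (a & 0xFF000000u) == 0x7F000000u;
}

/* Returns true if RFC 1122, section 3.2.2, allows an ICMP error reply. */
static bool may_send_frag_needed(const struct inner_ipv4 *p)
{
	if (p->is_icmp_error)			/* never answer an ICMP error */
		return false;
	if (addr_is_multicast(p->dst) || addr_is_class_e(p->dst))
		return false;			/* IP multicast or broadcast */
	if (p->link_bcast_or_mcast)		/* link-layer broadcast */
		return false;
	if (p->frag_offset != 0)		/* non-initial fragment */
		return false;
	if (p->src == 0 || addr_is_loopback(p->src) ||
	    addr_is_multicast(p->src) || addr_is_class_e(p->src))
		return false;			/* source is not a single host */
	return true;
}
```

RFC 4443, section 2.4, lists similar conditions for ICMPv6, with explicit exceptions for Packet Too Big messages, so an IPv6 counterpart would not be a line-by-line copy of this sketch.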
    
    While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
    we might receive GSO buffers here, and the passed headroom already
    includes the inner MAC length, so we don't have to account for it
    a second time (that would imply three MAC headers on the wire, but
    there are just two).
    
    This issue became visible while bridging IPv6 packets with 4500 bytes
    of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
    bytes of encapsulation headroom, we would advertise MTU as 3950, and
    we would reject fragmented IPv6 datagrams of 3958 bytes in size on the
    wire. We're exclusively dealing with network MTU here, though, so we
    could get Ethernet frames up to 3964 octets in that case.
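
The figures in this example can be reproduced with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, and the "old" check is an assumed model of counting the inner MAC header twice, not the exact code that was replaced. It also assumes that the 3958-byte figure is the Ethernet frame carrying the largest IPv6 fragment that fits an MTU of 3950, with fragment data rounded down to a multiple of 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HDR_LEN	14	/* Ethernet header */
#define IPV6_HDR_LEN	40	/* fixed IPv6 header */
#define IPV6_FRAG_LEN	8	/* IPv6 Fragment extension header */

/* MTU advertised to the sender: lower-layer PMTU minus tunnel headroom. */
static int tunnel_mtu(int pmtu, int headroom)
{
	return pmtu - headroom;
}

/* Ethernet frame length of the largest non-final IPv6 fragment for @mtu:
 * fragment data must be a multiple of 8 bytes. */
static int ipv6_frag_frame_len(int mtu)
{
	int data = (mtu - IPV6_HDR_LEN - IPV6_FRAG_LEN) & ~7;

	return ETH_HDR_LEN + IPV6_HDR_LEN + IPV6_FRAG_LEN + data;
}

/* Assumed model of the inaccurate check: the full frame length is added to
 * a headroom that already includes the inner MAC header. */
static bool fits_counting_mac_twice(int frame_len, int pmtu, int headroom)
{
	return frame_len + headroom <= pmtu;
}

/* Accurate check: compare the network-layer length only. */
static bool fits_network_len(int frame_len, int pmtu, int headroom)
{
	return frame_len - ETH_HDR_LEN + headroom <= pmtu;
}
```

With a PMTU of 4000 and 50 bytes of headroom, tunnel_mtu() gives 3950 and ipv6_frag_frame_len(3950) gives 3958. The first check rejects that frame because 3958 + 50 exceeds 4000, while the second accepts it, and also accepts frames up to 3950 + 14 = 3964 octets.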
    
    v2:
    - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
    - split IPv4/IPv6 functions (David Ahern)
    Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
    Reviewed-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
bareudp.c 20.6 KB