1. 16 May, 2022 19 commits
    • Eric Dumazet's avatar
      mlx4: support BIG TCP packets · 1169a642
      Eric Dumazet authored
      mlx4 supports LSOv2 just fine.
      
      IPv6 stack inserts a temporary Hop-by-Hop header
      with JUMBO TLV for big packets.
      
      We need to ignore the HBH header when populating TX descriptor.
      
      Tested:
      
      Before: (not enabling bigger TSO/GRO packets)
      
      ip link set dev eth0 gso_max_size 65536 gro_max_size 65536
      
      netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
      Local /Remote
      Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
      Send   Recv   Size    Size   Time    Rate     local  remote local   remote
      bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr
      
      262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
      262144 540000
      
      After: (enabling bigger TSO/GRO packets)
      
      ip link set dev eth0 gso_max_size 185000 gro_max_size 185000
      
      netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
      Local /Remote
      Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
      Send   Recv   Size    Size   Time    Rate     local  remote local   remote
      bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr
      
      262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
      262144 540000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1169a642
    • Eric Dumazet's avatar
      veth: enable BIG TCP packets · d406099d
      Eric Dumazet authored
      Set the TSO driver limit to GSO_MAX_SIZE (512 KB).
      
      This allows the admin/user to set a GSO limit up to this value.
      
      ip link set dev veth10 gso_max_size 200000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d406099d
    • Eric Dumazet's avatar
      net: loopback: enable BIG TCP packets · d6f938ce
      Eric Dumazet authored
      Set the driver limit to GSO_MAX_SIZE (512 KB).
      
      This allows the admin/user to set a GSO limit up to this value.
      
      Tested:
      
      ip link set dev lo gso_max_size 200000
      netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &
      
      tcpdump shows :
      
      18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
      18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
      18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
      18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
      18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
      18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
      18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
      18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6f938ce
    • Coco Li's avatar
      ipv6: Add hop-by-hop header to jumbograms in ip6_output · 80e425b6
      Coco Li authored
      Instead of simply forcing a 0 payload_len in IPv6 header,
      implement RFC 2675 and insert a custom extension header.
      
      Note that only TCP stack is currently potentially generating
      jumbograms, and that this extension header is purely local,
      it wont be sent on a physical link.
      
      This is needed so that packet capture (tcpdump and friends)
      can properly dissect these large packets.
      Signed-off-by: default avatarCoco Li <lixiaoyan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80e425b6
    • Alexander Duyck's avatar
      net: allow gro_max_size to exceed 65536 · 0fe79f28
      Alexander Duyck authored
      Allow the gro_max_size to exceed a value larger than 65536.
      
      There weren't really any external limitations that prevented this other
      than the fact that IPv4 only supports a 16 bit length field. Since we have
      the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
      exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
      via a constant rather than relying on gro_max_size.
      
      [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
      Signed-off-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0fe79f28
    • Eric Dumazet's avatar
      ipv6/gro: insert temporary HBH/jumbo header · 81fbc812
      Eric Dumazet authored
      Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build
      BIG TCP ipv6 packets (bigger than 64K).
      
      This patch changes ipv6_gro_complete() to insert a HBH/jumbo header
      so that resulting packet can go through IPv6/TCP stacks.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81fbc812
    • Eric Dumazet's avatar
      ipv6/gso: remove temporary HBH/jumbo header · 09f3d1a3
      Eric Dumazet authored
      ipv6 tcp and gro stacks will soon be able to build big TCP packets,
      with an added temporary Hop By Hop header.
      
      If GSO is involved for these large packets, we need to remove
      the temporary HBH header before segmentation happens.
      
      v2: perform HBH removal from ipv6_gso_segment() instead of
          skb_segment() (Alexander feedback)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09f3d1a3
    • Eric Dumazet's avatar
      ipv6: add struct hop_jumbo_hdr definition · 7c96d8ec
      Eric Dumazet authored
      Following patches will need to add and remove local IPv6 jumbogram
      options to enable BIG TCP.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c96d8ec
    • Eric Dumazet's avatar
      tcp_cubic: make hystart_ack_delay() aware of BIG TCP · 9957b38b
      Eric Dumazet authored
      hystart_ack_delay() had the assumption that a TSO packet
      would not be bigger than GSO_MAX_SIZE.
      
      This will no longer be true.
      
      We should use sk->sk_gso_max_size instead.
      
      This reduces chances of spurious Hystart ACK train detections.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9957b38b
    • Eric Dumazet's avatar
      net: limit GSO_MAX_SIZE to 524280 bytes · 34b92e8d
      Eric Dumazet authored
      Make sure we will not overflow shinfo->gso_segs
      
      Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs
      is a 16bit field.
      
      TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h,
      it seems cleaner to not bring tcp details into include/linux/netdevice.h
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34b92e8d
    • Alexander Duyck's avatar
      net: allow gso_max_size to exceed 65536 · 7c4e983c
      Alexander Duyck authored
      The code for gso_max_size was added originally to allow for debugging and
      workaround of buggy devices that couldn't support TSO with blocks 64K in
      size. The original reason for limiting it to 64K was because that was the
      existing limits of IPv4 and non-jumbogram IPv6 length fields.
      
      With the addition of Big TCP we can remove this limit and allow the value
      to potentially go up to UINT_MAX and instead be limited by the tso_max_size
      value.
      
      So in order to support this we need to go through and clean up the
      remaining users of the gso_max_size value so that the values will cap at
      64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
      so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
      limit for GSO_MAX_SIZE.
      
      v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
                     in a new sk_trim_gso_size() helper.
                     netif_set_tso_max_size() caps the requested TSO size
                     with GSO_MAX_SIZE.
      Signed-off-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c4e983c
    • Eric Dumazet's avatar
      net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes · 89527be8
      Eric Dumazet authored
      New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
      are used to report to user-space the device TSO limits.
      
      ip -d link sh dev eth1
      ...
         tso_max_size 65536 tso_max_segs 65535
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89527be8
    • David S. Miller's avatar
      Merge branch 'Renesas-RSZ-V2M-support' · 5cf15ce3
      David S. Miller authored
      Phil Edworthy says:
      
      ====================
      Add Renesas RZ/V2M Ethernet support
      
      The RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though
      some small parts are the same as R-Car Gen2.
      Other differences are:
      * It has separate data (DI), error (Line 1) and management (Line 2) irqs
        rather than one irq for all three.
      * Instead of using the High-speed peripheral bus clock for gPTP, it has
        a separate gPTP reference clock.
      
      v4:
       * Add clk_disable_unprepare() for gptp ref clk
      
      v3:
       * Really renamed irq_en_dis_regs to irq_en_dis this time
       * Modified ravb_ptp_extts() to use irq_en_dis
       * Added Reviewed-by tags
      
      v2:
       * Just net patches in this series
       * Instead of reusing ch22 and ch24 interrupt names, use the proper names
       * Renamed irq_en_dis_regs to irq_en_dis
       * Squashed use of GIC reg versus GIE/GID and got rid of separate gptp_ptm_gic feature.
       * Move err_mgmt_irqs code under multi_irqs
       * Minor editing of the commit msgs
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5cf15ce3
    • Phil Edworthy's avatar
      ravb: Add support for RZ/V2M · e1154be7
      Phil Edworthy authored
      RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though
      some small parts are the same as R-Car Gen2.
      Other differences to R-Car Gen3 and Gen2 are:
      * It has separate data (DI), error (Line 1) and management (Line 2) irqs
        rather than one irq for all three.
      * Instead of using the High-speed peripheral bus clock for gPTP, it has a
        separate gPTP reference clock.
      Signed-off-by: default avatarPhil Edworthy <phil.edworthy@renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarSergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1154be7
    • Phil Edworthy's avatar
      ravb: Use separate clock for gPTP · 72069a7b
      Phil Edworthy authored
      RZ/V2M has a separate gPTP reference clock that is used when the
      AVB-DMAC Mode Register (CCC) gPTP Clock Select (CSEL) bits are
      set to "01: High-speed peripheral bus clock".
      Therefore, add a feature that allows this clock to be used for
      gPTP.
      Signed-off-by: default avatarPhil Edworthy <phil.edworthy@renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarSergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72069a7b
    • Phil Edworthy's avatar
      ravb: Support separate Line0 (Desc), Line1 (Err) and Line2 (Mgmt) irqs · b0265dcb
      Phil Edworthy authored
      R-Car has a combined interrupt line, ch22 = Line0_DiA | Line1_A | Line2_A.
      RZ/V2M has separate interrupt lines for each of these, so add a feature
      that allows the driver to get these interrupts and call the common handler.
      Signed-off-by: default avatarPhil Edworthy <phil.edworthy@renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarSergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0265dcb
    • Phil Edworthy's avatar
      ravb: Separate handling of irq enable/disable regs into feature · cb99badd
      Phil Edworthy authored
      Currently, when the HW has a single interrupt, the driver uses the
      GIC, TIC, RIC0 registers to enable and disable interrupts.
      When the HW has multiple interrupts, it uses the GIE, GID, TIE, TID,
      RIE0, RID0 registers.
      
      However, other devices, e.g. RZ/V2M, have multiple irqs and only have
      the GIC, TIC, RIC0 registers.
      Therefore, split this into a separate feature.
      Signed-off-by: default avatarPhil Edworthy <phil.edworthy@renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarSergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb99badd
    • Phil Edworthy's avatar
      dt-bindings: net: renesas,etheravb: Document RZ/V2M SoC · a7931ac1
      Phil Edworthy authored
      Document the Ethernet AVB IP found on RZ/V2M SoC.
      It includes the Ethernet controller (E-MAC) and Dedicated Direct memory
      access controller (DMAC) for transferring transmitted Ethernet frames
      to and received Ethernet frames from respective storage areas in the
      RAM at high speed.
      The AVB-DMAC is compliant with IEEE 802.1BA, IEEE 802.1AS timing and
      synchronization protocol, IEEE 802.1Qav real-time transfer, and the
      IEEE 802.1Qat stream reservation protocol.
      
      R-Car has a pair of combined interrupt lines:
       ch22 = Line0_DiA | Line1_A | Line2_A
       ch23 = Line0_DiB | Line1_B | Line2_B
      Line0 for descriptor interrupts (which we call dia and dib).
      Line1 for error related interrupts (which we call err_a and err_b).
      Line2 for management and gPTP related interrupts (mgmt_a and mgmt_b).
      
      RZ/V2M hardware has separate interrupt lines for each of these.
      
      It has 3 clocks; the main AXI clock, the AMBA CHI (Coherent Hub
      Interface) clock and a gPTP reference clock.
      Signed-off-by: default avatarPhil Edworthy <phil.edworthy@renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarSergey Shtylyov <s.shtylyov@omp.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7931ac1
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 1a01a075
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      This is v2 including deadlock fix in conntrack ecache rework
      reported by Jakub Kicinski.
      
      The following patchset contains Netfilter updates for net-next,
      mostly updates to conntrack from Florian Westphal.
      
      1) Add a dedicated list for conntrack event redelivery.
      
      2) Include event redelivery list in conntrack dumps of dying type.
      
      3) Remove per-cpu dying list for event redelivery, not used anymore.
      
      4) Add netns .pre_exit to cttimeout to zap timeout objects before
         synchronize_rcu() call.
      
      5) Remove nf_ct_unconfirmed_destroy.
      
      6) Add generation id for conntrack extensions for conntrack
         timeout and helpers.
      
      7) Detach timeout policy from conntrack on cttimeout module removal.
      
      8) Remove __nf_ct_unconfirmed_destroy.
      
      9) Remove unconfirmed list.
      
      10) Remove unconditional local_bh_disable in init_conntrack().
      
      11) Consolidate conntrack iterator nf_ct_iterate_cleanup().
      
      12) Detect if ctnetlink listeners exist to short-circuit event
          path early.
      
      13) Un-inline nf_ct_ecache_ext_add().
      
      14) Add nf_conntrack_events autodetect ctnetlink listener mode
          and make it default.
      
      15) Add nf_ct_ecache_exist() to check for event cache extension.
      
      16) Extend flowtable reverse route lookup to include source, iif,
          tos and mark, from Sven Auhagen.
      
      17) Do not verify zero checksum UDP packets in nf_reject,
          from Kevin Mitchell.
      
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a01a075
  2. 14 May, 2022 3 commits
  3. 13 May, 2022 18 commits