1. 25 Oct, 2022 2 commits
    • netfilter: nft_inner: support for inner tunnel header matching · 3a07327d
      Pablo Neira Ayuso authored
      
      This new expression allows you to match on the inner headers that are
      encapsulated by any of the existing tunneling protocols.
      
      This expression parses the inner packet to set the link, network and
      transport offsets, so the existing expressions (with a few updates) can
      be reused to match on the inner headers.
      
      The inner expression supports different tunnel combinations, such as:
      
      - ethernet frame over IPv4/IPv6 packet, e.g. VxLAN.
      - IPv4/IPv6 packet over IPv4/IPv6 packet, e.g. IPIP.
      - IPv4/IPv6 packet over IPv4/IPv6 + transport header, e.g. GRE.
      - transport header (ESP or SCTP) over transport header (usually UDP).
      
      The following fields are used to describe the tunnel protocol:
      
      - flags, which describe how to parse the inner headers:
      
        NFT_PAYLOAD_CTX_INNER_TUN, the tunnel provides its own header.
        NFT_PAYLOAD_CTX_INNER_ETHER, the ethernet frame is available as inner header.
        NFT_PAYLOAD_CTX_INNER_NH, the network header is available as inner header.
        NFT_PAYLOAD_CTX_INNER_TH, the transport header is available as inner header.
      
      For example, VxLAN sets all of these flags, while GRE only sets
      NFT_PAYLOAD_CTX_INNER_NH and NFT_PAYLOAD_CTX_INNER_TH, and ESP over
      UDP only sets NFT_PAYLOAD_CTX_INNER_TH.
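As a rough illustration of how these flags drive inner-header parsing, the user-space sketch below assigns each inner header an offset in order, skipping headers whose flag is not set. Only the flag names come from this commit message; their bit values, the `demo_*` names and the fixed header lengths passed as arguments are hypothetical, and this is not the kernel's nft_inner code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values: only the flag names come from the commit
 * message; the kernel's own definitions may differ. */
#define NFT_PAYLOAD_CTX_INNER_TUN   (1u << 0) /* tunnel has its own header */
#define NFT_PAYLOAD_CTX_INNER_ETHER (1u << 1) /* inner ethernet frame */
#define NFT_PAYLOAD_CTX_INNER_NH    (1u << 2) /* inner network header */
#define NFT_PAYLOAD_CTX_INNER_TH    (1u << 3) /* inner transport header */

/* Illustrative tunnel kinds (not a kernel enum). */
enum demo_tunnel { DEMO_VXLAN, DEMO_GRE, DEMO_ESP_OVER_UDP };

/* Flag combinations as described in the commit message. */
static uint32_t demo_tunnel_flags(enum demo_tunnel t)
{
	switch (t) {
	case DEMO_VXLAN:
		return NFT_PAYLOAD_CTX_INNER_TUN | NFT_PAYLOAD_CTX_INNER_ETHER |
		       NFT_PAYLOAD_CTX_INNER_NH | NFT_PAYLOAD_CTX_INNER_TH;
	case DEMO_GRE:
		return NFT_PAYLOAD_CTX_INNER_NH | NFT_PAYLOAD_CTX_INNER_TH;
	case DEMO_ESP_OVER_UDP:
		return NFT_PAYLOAD_CTX_INNER_TH;
	}
	return 0;
}

/* Offsets of the inner headers; -1 means "not present". */
struct demo_inner_offsets { int tun, ll, nh, th; };

/* Walk the inner headers in order, starting right after the outer
 * transport header. Header lengths are passed in here; a real parser
 * would read them from the packet (e.g. the IPv4 IHL field). */
static struct demo_inner_offsets
demo_parse_inner(uint32_t flags, int start, int tun_hdrsize,
		 int ll_len, int nh_len)
{
	struct demo_inner_offsets o = { -1, -1, -1, -1 };
	int off = start;

	if (flags & NFT_PAYLOAD_CTX_INNER_TUN) {
		o.tun = off;
		off += tun_hdrsize;
	}
	if (flags & NFT_PAYLOAD_CTX_INNER_ETHER) {
		o.ll = off;
		off += ll_len;
	}
	if (flags & NFT_PAYLOAD_CTX_INNER_NH) {
		o.nh = off;
		off += nh_len;
	}
	if (flags & NFT_PAYLOAD_CTX_INNER_TH)
		o.th = off;
	return o;
}
```

For VxLAN over Ethernet/IPv4/UDP (outer headers 14 + 20 + 8 = 42 bytes, an 8-byte VxLAN header, inner Ethernet and IPv4 without options), `demo_parse_inner(demo_tunnel_flags(DEMO_VXLAN), 42, 8, 14, 20)` places the inner link, network and transport headers at 50, 64 and 84.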
      
      The tunnel description is composed of the following attributes:
      
      - header size: in case the tunnel comes with its own header, e.g. VxLAN.
      
      - type: this provides a hint to userspace on how to delinearize the rule.
        This is useful for VxLAN and Geneve: since they run over UDP, the
        transport header does not provide a hint. This is also useful in case
        hardware offload is ever supported. The type is not currently
        interpreted by the kernel.
      
      - expression: currently only payload is supported. A follow-up patch
        also adds inner meta support, which is required by autogenerated
        dependencies. The exthdr expression should be supported too at some
        point. There is a new inner_ops operation that needs to be set to
        allow an existing expression to be used from the inner expression.
      
      This patch adds a new NFT_PAYLOAD_TUN_HEADER base, which allows matching
      on the tunnel header fields, e.g. the VxLAN VNI.
      
      The payload expression is embedded into nft_inner private area and this
      private data area is passed to the payload inner eval function via
      direct call.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_objref: make it builtin · d037abc2
      Florian Westphal authored
      
      nft_objref is needed to reference named objects, so it makes
      no sense to disable it.
      
      Before:
         text	   data	    bss	    dec	 filename
        4014	    424	      0	   4438	 nft_objref.o
        4174	   1128	      0	   5302	 nft_objref.ko
      359351	  15276	    864	 375491	 nf_tables.ko
      After:
        text	   data	    bss	    dec	 filename
        3815	    408	      0	   4223	 nft_objref.o
      363161	  15692	    864	 379717	 nf_tables.ko
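The `dec` column in these `size` listings is text + data + bss, and in the later listings `hex` is the same total in hexadecimal. The small sketch below (plain user-space C; the `size_*` names are illustrative, not part of any tool) reproduces those columns and the net change: before this patch nft_objref.ko and nf_tables.ko together total 5302 + 375491 = 380793 bytes, after it nf_tables.ko alone totals 379717, i.e. 1076 bytes less.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of `size` output (Berkeley format). */
struct size_row { unsigned long text, data, bss; };

/* "dec" is the sum of the text, data and bss sections. */
static unsigned long size_dec(struct size_row r)
{
	return r.text + r.data + r.bss;
}

/* "hex" is the same total printed in hexadecimal. */
static void size_hex(struct size_row r, char *buf, size_t len)
{
	snprintf(buf, len, "%lx", size_dec(r));
}

/* Net change of the combined totals between two sets of objects. */
static long size_delta(const struct size_row *before, size_t nb,
		       const struct size_row *after, size_t na)
{
	long delta = 0;

	for (size_t i = 0; i < na; i++)
		delta += (long)size_dec(after[i]);
	for (size_t i = 0; i < nb; i++)
		delta -= (long)size_dec(before[i]);
	return delta;
}
```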
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 03 Oct, 2022 1 commit
  3. 11 Jul, 2022 1 commit
    • netfilter: nf_flow_table: count pending offload workqueue tasks · b0381776
      Vlad Buslov authored
      
      To improve hardware offload debuggability, count pending 'add', 'del'
      and 'stats' flow_table offload workqueue tasks. Counters are incremented
      before scheduling a new task and decremented when the workqueue handler
      finishes executing. These counters allow the user to diagnose congestion
      on hardware offload workqueues, which can happen either when the CPU is
      starved and workqueue jobs are executed at a lower rate than new ones
      are added, or when the hardware/driver can't keep up with the rate.
      
      Implement the described counters as percpu counters inside a new struct
      netns_ft, which is stored inside struct net. Expose them via a new procfs
      file '/proc/net/stats/nf_flowtable' that is similar to the existing
      'nf_conntrack' file.
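The counting scheme can be sketched in user space as follows. This is an illustration only, assuming C11 atomics in place of the kernel's percpu counters, a fixed hypothetical CPU count, and made-up `demo_ft_*` names: each CPU increments its own slot before queueing work, the handler decrements the slot of whatever CPU it runs on, and a reader (such as a procfs show function) sums all slots.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NR_CPUS 4 /* hypothetical CPU count */

enum demo_ft_op { DEMO_FT_ADD, DEMO_FT_DEL, DEMO_FT_STATS, DEMO_FT_NR_OPS };

/* One counter slot per CPU and operation type. */
struct demo_ft_stats {
	atomic_long pending[DEMO_NR_CPUS][DEMO_FT_NR_OPS];
};

/* Called on `cpu` before scheduling the offload work item. */
static void demo_ft_work_queued(struct demo_ft_stats *s, int cpu,
				enum demo_ft_op op)
{
	atomic_fetch_add_explicit(&s->pending[cpu][op], 1, memory_order_relaxed);
}

/* Called when the work handler finishes; it may run on another CPU,
 * so a single slot can go negative, but the sum stays correct. */
static void demo_ft_work_done(struct demo_ft_stats *s, int cpu,
			      enum demo_ft_op op)
{
	atomic_fetch_sub_explicit(&s->pending[cpu][op], 1, memory_order_relaxed);
}

/* Reader side: sum the per-CPU slots for one operation type. */
static long demo_ft_pending(struct demo_ft_stats *s, enum demo_ft_op op)
{
	long sum = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += atomic_load_explicit(&s->pending[cpu][op],
					    memory_order_relaxed);
	return sum;
}
```

Keeping one slot per CPU avoids contention on a shared cache line in the hot path; the cost is that readers must walk every slot, which is acceptable for an occasional procfs read.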
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 18 Jan, 2022 1 commit
  5. 23 Dec, 2021 1 commit
  6. 29 Aug, 2021 1 commit
  7. 17 Jun, 2021 1 commit
  8. 07 Jun, 2021 1 commit
    • netfilter: add new hook nfnl subsystem · e2cf17d3
      Florian Westphal authored
      
      This nfnl subsystem allows dumping the list of all active netfilter
      hooks, e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.
      
      This helps to see what kind of features are currently enabled in
      the network stack.
      
      Sample output from nft tool using this infra:
      
       $ nft list hook ip input
       family ip hook input {
         +0000000010 nft_do_chain_inet [nf_tables] # nft table firewalld INPUT
         +0000000100 nf_nat_ipv4_local_in [nf_nat]
         +2147483647 ipv4_confirm [nf_conntrack]
       }
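The sample lists hooks in ascending priority order, with the priority printed as a signed, zero-padded 11-character field. The sketch below reproduces that layout from a hypothetical in-memory hook list; it mirrors the sample output only and is not taken from the nft tool's source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one registered hook. */
struct demo_hook {
	int priority;        /* lower values run first */
	const char *func;
	const char *module;
};

static int demo_hook_cmp(const void *a, const void *b)
{
	const struct demo_hook *x = a, *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort hooks into the order in which they would be traversed. */
static void demo_hooks_sort(struct demo_hook *h, size_t n)
{
	qsort(h, n, sizeof(*h), demo_hook_cmp);
}

/* "+0000000010 func [module]": sign always shown, zero-padded to 11 chars. */
static int demo_hook_format(char *buf, size_t len, const struct demo_hook *h)
{
	return snprintf(buf, len, "%+011d %s [%s]", h->priority, h->func,
			h->module);
}
```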
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  9. 31 Mar, 2021 1 commit
  10. 30 Mar, 2021 2 commits
  11. 31 Oct, 2020 1 commit
  12. 08 Apr, 2020 2 commits
  13. 15 Mar, 2020 2 commits
    • nft_set_pipapo: Introduce AVX2-based lookup implementation · 7400b063
      Stefano Brivio authored
      
      If the AVX2 instruction set is available, we can exploit the repetitive
      characteristic of this algorithm to provide a fast, vectorised
      version by using 256-bit wide AVX2 operations for bucket loads and
      bitwise intersections.
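To show what a 256-bit wide bitwise intersection means here, the portable sketch below emulates one 256-bit lane as four 64-bit words; the `demo_*` names are hypothetical. In AVX2 code each `demo_and256()` call would correspond to a single `_mm256_and_si256()` on a `__m256i` (from `<immintrin.h>`); this sketch deliberately avoids intrinsics so it builds without `-mavx2`, and it is not the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 256-bit lane emulated as four 64-bit words. */
typedef struct { uint64_t w[4]; } demo_u256;

/* Bitwise AND of two lanes (one VPAND with AVX2). */
static demo_u256 demo_and256(demo_u256 a, demo_u256 b)
{
	demo_u256 r;

	for (int i = 0; i < 4; i++)
		r.w[i] = a.w[i] & b.w[i];
	return r;
}

/* Intersect the result bitmap with a bucket bitmap, 256 bits per step. */
static void demo_intersect(demo_u256 *res, const demo_u256 *bucket, size_t n)
{
	for (size_t i = 0; i < n; i++)
		res[i] = demo_and256(res[i], bucket[i]);
}
```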
      
      In most cases, this implementation consistently outperforms rbtree
      set instances, despite the fact that they are configured to use a
      single, given ranged data type out of the ones used for performance
      measurements by the nft_concat_range.sh kselftest.
      
      That script injects packets directly on the ingoing device path with
      pktgen. Averaged over five runs on a single AMD Epyc 7402 thread
      (3.35GHz, 768 KiB L1D$, 12 MiB L2$), it reports the figures below.
      CONFIG_RETPOLINE was not set here.
      
      Note that this is not a fair comparison with hash and rbtree set
      types: non-ranged entries (used to have a reference for hash types)
      would be matched faster than this, and matching on a single field
      only (which is the case for rbtree) is also significantly faster.
      
      However, it's not possible at the moment to choose this set type
      for non-ranged entries, and the current implementation also needs
      a few minor adjustments in order to match on fewer than two fields.
      
       ---------------.-----------------------------------.------------.
       AMD Epyc 7402  |          baselines, Mpps          | this patch |
        1 thread      |___________________________________|____________|
        3.35GHz       |        |        |        |        |            |
        768KiB L1D$   | netdev |  hash  | rbtree |        |            |
       ---------------|  hook  |   no   | single |        |   pipapo   |
       type   entries |  drop  | ranges | field  | pipapo |    AVX2    |
       ---------------|--------|--------|--------|--------|------------|
       net,port       |        |        |        |        |            |
                1000  |   19.0 |   10.4 |    3.8 |    4.0 | 7.5   +87% |
       ---------------|--------|--------|--------|--------|------------|
       port,net       |        |        |        |        |            |
                 100  |   18.8 |   10.3 |    5.8 |    6.3 | 8.1   +29% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port      |        |        |        |        |            |
                1000  |   16.4 |    7.6 |    1.8 |    2.1 | 4.8  +128% |
       ---------------|--------|--------|--------|--------|------------|
       port,proto     |        |        |        |        |            |
               30000  |   19.6 |   11.6 |    3.9 |    0.5 | 2.6  +420% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port,mac  |        |        |        |        |            |
                  10  |   16.5 |    5.4 |    4.3 |    3.4 | 4.7   +38% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port,mac, |        |        |        |        |            |
       proto    1000  |   16.5 |    5.7 |    1.9 |    1.4 | 3.6   +26% |
       ---------------|--------|--------|--------|--------|------------|
       net,mac        |        |        |        |        |            |
                1000  |   19.0 |    8.4 |    3.9 |    2.5 | 6.4  +156% |
       ---------------'--------'--------'--------'--------'------------'
      
      A similar strategy could easily be reused to implement specialised
      versions for other SIMD instruction sets, and I plan to post at least
      a NEON version at a later time.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: make sets built-in · e32a4dc6
      Florian Westphal authored
      
      Placing nftables set support in an extra module is pointless:
      
      1. nf_tables needs a dynamic registration interface for the sake of one module
      2. nft heavily relies on sets, e.g. even a simple rule like
         "nft ... tcp dport { 80, 443 }" will not work with _SETS=n.
      
      IOW, either nftables isn't used or both nf_tables and nf_tables_set
      modules are needed anyway.
      
      With extra module:
       307K net/netfilter/nf_tables.ko
        79K net/netfilter/nf_tables_set.ko
      
         text  data  bss     dec filename
       146416  3072  545  150033 nf_tables.ko
        35496  1817    0   37313 nf_tables_set.ko
      
      This patch:
       373K net/netfilter/nf_tables.ko
      
       178563  4049  545  183157 nf_tables.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  14. 27 Jan, 2020 1 commit
    • nf_tables: Add set type for arbitrary concatenation of ranges · 3c4287f6
      Stefano Brivio authored

      This new set type allows for intervals in concatenated fields,
      which are expressed in the usual way, that is, simple byte
      concatenation with padding to 32 bits for single fields, and
      given as ranges by specifying start and end elements, each
      containing the full concatenation of the start (respectively,
      end) values for the single fields.
      
      For each field, ranges are expanded to their composing netmasks,
      which are inserted as rules in per-field lookup tables. Bits to be
      classified are divided into 4-bit groups, and for each group, the
      lookup table contains 2^4 = 16 buckets, representing all the possible
      values of a bit group. This approach was inspired by the Grouper
      algorithm:
      	http://www.cse.usf.edu/~ligatti/projects/grouper/
      
      Matching is performed by a sequence of AND operations between
      bucket values, with buckets selected according to the value of
      packet bits, for each group. The result of this sequence tells
      us which rules matched for a given field.
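The steps above can be sketched for a single 16-bit field (for instance a port) as follows. This is a simplified illustration, not the nft_set_pipapo code: it assumes at most 64 rules so that one `uint64_t` bitmap per bucket suffices, gives every expanded netmask its own rule bit, and uses a hypothetical `elem_of` table to map a matching rule back to its set element. The real implementation also chains results across the fields of a concatenation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GROUPS    4  /* 16-bit field split into four 4-bit groups */
#define DEMO_BUCKETS   16 /* 2^4 possible values of a 4-bit group */
#define DEMO_MAX_RULES 64 /* one bit per rule in a uint64_t bitmap */

struct demo_field {
	uint64_t lt[DEMO_GROUPS][DEMO_BUCKETS]; /* rule bitmap per bucket */
	int elem_of[DEMO_MAX_RULES];            /* rule bit -> element id */
	int nrules;
};

/* Value of group g of a 16-bit key, most significant group first. */
static unsigned demo_nibble(uint16_t key, int g)
{
	return (key >> (12 - 4 * g)) & 0xf;
}

/* Insert key/plen as one rule: in each group, set the rule bit in every
 * bucket whose value agrees with the key on that group's prefix bits. */
static int demo_insert_prefix(struct demo_field *f, int elem,
			      uint16_t key, int plen)
{
	if (f->nrules == DEMO_MAX_RULES)
		return -1;
	int bit = f->nrules++;

	f->elem_of[bit] = elem;
	for (int g = 0; g < DEMO_GROUPS; g++) {
		int covered = plen - 4 * g; /* prefix bits inside this group */
		if (covered < 0)
			covered = 0;
		if (covered > 4)
			covered = 4;
		unsigned mask = (0xfu << (4 - covered)) & 0xf;
		for (unsigned v = 0; v < DEMO_BUCKETS; v++)
			if ((v & mask) == (demo_nibble(key, g) & mask))
				f->lt[g][v] |= 1ULL << bit;
	}
	return 0;
}

/* Expand [lo, hi] into its composing netmasks (largest aligned blocks). */
static int demo_insert_range(struct demo_field *f, int elem,
			     uint16_t lo, uint16_t hi)
{
	uint32_t start = lo, end = hi;

	while (start <= end) {
		int plen = 16;

		while (plen > 0) {
			uint32_t size = 1u << (16 - (plen - 1));
			if ((start & (size - 1)) || start + size - 1 > end)
				break;
			plen--;
		}
		if (demo_insert_prefix(f, elem, (uint16_t)start, plen) < 0)
			return -1;
		start += 1u << (16 - plen);
	}
	return 0;
}

/* AND together the buckets selected by each group of the key. */
static int demo_lookup(const struct demo_field *f, uint16_t key)
{
	uint64_t res = ~0ULL;

	for (int g = 0; g < DEMO_GROUPS; g++)
		res &= f->lt[g][demo_nibble(key, g)];
	for (int b = 0; b < f->nrules; b++)
		if (res & (1ULL << b))
			return f->elem_of[b];
	return -1;
}
```

Each netmask gets its own bit because a netmask factors exactly into per-group constraints; OR-ing two different netmasks into one bit could let the AND step accept keys that belong to neither.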
      
      In order to concate...
  15. 13 Nov, 2019 1 commit
    • netfilter: nf_flow_table: hardware offload support · c29f74e0
      Pablo Neira Ayuso authored
      
      This patch adds the dataplane hardware offload to the flowtable
      infrastructure. Three new flags represent the hardware state of this
      flow:
      
      * FLOW_OFFLOAD_HW: This flow entry resides in the hardware.
      * FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be removed
        from hardware. This might be triggered either by the packet path (via
        TCP RST/FIN packet) or via aging.
      * FLOW_OFFLOAD_HW_DEAD: This flow entry has already been removed from
        the hardware; the software garbage collector can remove it from the
        software flowtable.
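The lifecycle described by these three flags can be sketched as a tiny state machine. Only the flag names come from this commit message; their bit values, the `demo_*` helpers and the rule that an entry never pushed to hardware may be freed at once are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the names come from the commit message. */
#define FLOW_OFFLOAD_HW       (1u << 0)
#define FLOW_OFFLOAD_HW_DYING (1u << 1)
#define FLOW_OFFLOAD_HW_DEAD  (1u << 2)

struct demo_flow { unsigned flags; };

/* The entry was installed in hardware. */
static void demo_flow_hw_added(struct demo_flow *f)
{
	f->flags |= FLOW_OFFLOAD_HW;
}

/* TCP RST/FIN seen or entry aged out: schedule removal from hardware. */
static void demo_flow_hw_teardown(struct demo_flow *f)
{
	if (f->flags & FLOW_OFFLOAD_HW)
		f->flags |= FLOW_OFFLOAD_HW_DYING;
}

/* The driver confirmed that the entry is gone from hardware. */
static void demo_flow_hw_removed(struct demo_flow *f)
{
	f->flags |= FLOW_OFFLOAD_HW_DEAD;
}

/* The software GC may free the entry once hardware no longer holds it. */
static bool demo_flow_gc_can_free(const struct demo_flow *f)
{
	return !(f->flags & FLOW_OFFLOAD_HW) || (f->flags & FLOW_OFFLOAD_HW_DEAD);
}
```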
      
      This patch supports:
      
      * IPv4 only.
      * Aging via FLOW_CLS_STATS, no packet and byte counter synchronization
        at this stage.
      
      This patch also adds the action callback that specifies how to convert
      the flow entry into the flow_rule object that is passed to the driver.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 13 Sep, 2019 1 commit
    • netfilter: fix coding-style errors. · b0edba2a
      Jeremy Sowden authored
      
      Several header-files, Kconfig files and Makefiles have trailing
      white-space.  Remove it.
      
      In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT
      correctly.
      
      There are semicolons at the end of two function definitions in
      include/net/netfilter/nf_conntrack_acct.h and
      include/net/netfilter/nf_conntrack_ecache.h. Remove them.
      
      Fix indentation in nf_conntrack_l4proto.h.
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  17. 09 Jul, 2019 1 commit
    • netfilter: nf_tables: add hardware offload support · c9626a2c
      Pablo Neira Ayuso authored
      
      This patch adds hardware offload support for nftables through the
      existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
      classifier and the flow rule API. This hardware offload support is
      available for the NFPROTO_NETDEV family and the ingress hook.
      
      Each nftables expression has a new ->offload interface, which is used to
      populate the flow rule object that is attached to the transaction
      object.
      
      There is a new per-table NFT_TABLE_F_HW flag, which is set to offload
      an entire table, including all of its chains.
      
      This patch supports basic metadata (layer 3 and 4 protocol numbers),
      5-tuple payload matching and the accept/drop actions; only basechain
      hardware offload is supported.
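A minimal sketch of the per-expression ->offload idea, assuming a drastically simplified, hypothetical flow rule (L4 protocol, destination port and a verdict) instead of the kernel's flow_rule API. Each expression's callback fills in its part of the rule; in this sketch, a rule containing an expression without an offload callback cannot be offloaded. All `demo_*` names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a flow rule object. */
struct demo_flow_rule {
	bool has_proto, has_dport;
	uint8_t proto;
	uint16_t dport;
	enum { DEMO_ACT_NONE, DEMO_ACT_ACCEPT, DEMO_ACT_DROP } action;
};

struct demo_expr;
struct demo_expr_ops {
	/* Returns 0 on success, -1 if this expression cannot be offloaded. */
	int (*offload)(const struct demo_expr *e, struct demo_flow_rule *r);
};

struct demo_expr {
	const struct demo_expr_ops *ops;
	uint32_t value;
};

static int demo_meta_l4proto_offload(const struct demo_expr *e,
				     struct demo_flow_rule *r)
{
	r->has_proto = true;
	r->proto = (uint8_t)e->value;
	return 0;
}

static int demo_payload_dport_offload(const struct demo_expr *e,
				      struct demo_flow_rule *r)
{
	r->has_dport = true;
	r->dport = (uint16_t)e->value;
	return 0;
}

static int demo_drop_offload(const struct demo_expr *e,
			     struct demo_flow_rule *r)
{
	(void)e;
	r->action = DEMO_ACT_DROP;
	return 0;
}

static const struct demo_expr_ops demo_meta_ops = { demo_meta_l4proto_offload };
static const struct demo_expr_ops demo_payload_ops = { demo_payload_dport_offload };
static const struct demo_expr_ops demo_drop_ops = { demo_drop_offload };
static const struct demo_expr_ops demo_counter_ops = { NULL }; /* no ->offload */

/* Walk the rule's expressions and let each populate the flow rule. */
static int demo_rule_offload(const struct demo_expr *exprs, size_t n,
			     struct demo_flow_rule *r)
{
	*r = (struct demo_flow_rule){ 0 };
	for (size_t i = 0; i < n; i++) {
		if (!exprs[i].ops->offload ||
		    exprs[i].ops->offload(&exprs[i], r) < 0)
			return -1;
	}
	return 0;
}
```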
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 05 Jul, 2019 1 commit
  19. 11 Apr, 2019 1 commit
  20. 08 Apr, 2019 1 commit
    • netfilter: nf_tables: merge route type into core · c1deb065
      Florian Westphal authored
      
      This is very little code, so it really doesn't make sense to have extra
      modules or even a Kconfig knob for it.
      
      Merge them and make the functionality available unconditionally.
      The merge makes inet family route support trivial, so add it
      here as well.
      
      Before:
         text	   data	    bss	    dec	    hex	filename
          835	    832	      0	   1667	    683 nft_chain_route_ipv4.ko
          870	    832	      0	   1702	    6a6	nft_chain_route_ipv6.ko
       111568	   2556	    529	 114653	  1bfdd	nf_tables.ko
      
      After:
         text	   data	    bss	    dec	    hex	filename
       113133	   2556	    529	 116218	  1c5fa	nf_tables.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  21. 01 Mar, 2019 1 commit
    • netfilter: nf_tables: merge ipv4 and ipv6 nat chain types · db8ab388
      Florian Westphal authored
      
      Merge the ipv4 and ipv6 nat chain types. This is the last
      missing piece that allows providing inet family support
      for nat in a follow-up patch.
      
      The kconfig knobs for the ipv4/ipv6 nat chains are removed; the
      nat chain type will be built unconditionally if the NFT_NAT
      expression is enabled.
      
      Before:
         text	   data	    bss	    dec	    hex	filename
         1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
         1697     896       0    2593     a21 nft_chain_nat_ipv6.ko
      
      After:
         text	   data	    bss	    dec	    hex	filename
         1832     896       0    2728     aa8 nft_chain_nat.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  22. 27 Feb, 2019 1 commit
  23. 18 Jan, 2019 1 commit
    • netfilter: conntrack: gre: switch module to be built-in · 22fc4c4c
      Florian Westphal authored
      
      This makes the last of the modular l4 trackers 'bool'.
      
      After this, all infrastructure to handle dynamic l4 protocol registration
      becomes obsolete and can be removed in followup patches.
      
      Old:
      302824 net/netfilter/nf_conntrack.ko
       21504 net/netfilter/nf_conntrack_proto_gre.ko
      
      New:
      313728 net/netfilter/nf_conntrack.ko
      
      Old:
         text	   data	    bss	    dec	    hex	filename
         6281	   1732	      4	   8017	   1f51	nf_conntrack_proto_gre.ko
       108356	  20613	    236	 129205	  1f8b5	nf_conntrack.ko
      New:
       112095	  21381	    240	 133716	  20a54	nf_conntrack.ko
      
      The size increase is only temporary.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  24. 17 Dec, 2018 3 commits
    • netfilter: nat: remove nf_nat_l4proto struct · 5cbabeec
      Florian Westphal authored
      
      This removes the (now empty) nf_nat_l4proto struct, all its instances
      and all the no longer needed runtime (un)register functionality.
      
      nf_nat_need_gre() can be axed as well: the module that calls it (to
      load the no-longer-existing nat_gre module) also calls other nat core
      functions. GRE nat is now always available if the kernel is built with it.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nat: remove l4proto->manip_pkt · faec18db
      Florian Westphal authored
      
      This removes the last l4proto indirection: the two callers, the l3proto
      packet mangling helpers for ipv4 and ipv6, now call the
      nf_nat_l4proto_manip_pkt() helper.
      
      nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
      they no longer contain any functionality, so as not to clutter this patch.
      
      The next patch will remove the empty files and the nf_nat_l4proto
      struct.
      
      nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
      other nat manip functionality as well, not just udp and udplite.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nat: remove l4proto->nlattr_to_range · 76b90019
      Florian Westphal authored
      
      All protocols set this to nf_nat_l4proto_nlattr_to_range, so
      just call it directly.
      
      The important difference is that we'll now also call it for
      protocols that we don't support (i.e., nf_nat_proto_unknown did
      not provide .nlattr_to_range).
      
      However, there should be no harm: even icmp provided this callback.
      If we don't implement a specific l4nat for a protocol, nothing would make
      use of this information, so adding a big switch/case construct listing
      all supported l4protocols seems a bit pointless.
      
      This change leaves a single function pointer in the l4proto struct.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  25. 17 Sep, 2018 1 commit
  26. 03 Aug, 2018 1 commit
  27. 30 Jul, 2018 3 commits
  28. 17 Jul, 2018 1 commit
    • netfilter: conntrack: remove l3proto abstraction · a0ae2562
      Florian Westphal authored
      
      This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
      abstraction.
      
      This gets rid of all l3proto indirect calls and the need to do
      a lookup on the function to call for l3 demux.
      
      It increases the size of nf_conntrack.ko by only a small amount (12kbyte),
      and overall this reduces size, because nf_conntrack.ko is useless without
      either the nf_conntrack_ipv4 or the nf_conntrack_ipv6 module.
      
      before:
         text    data     bss     dec     hex filename
         7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
         7405    1084       4    8493    212d nf_conntrack_ipv6.ko
        72614   13689     236   86539   1520b nf_conntrack.ko
       19K nf_conntrack_ipv4.ko
       19K nf_conntrack_ipv6.ko
      179K nf_conntrack.ko
      
      after:
         text    data     bss     dec     hex filename
        79277   13937     236   93450   16d0a nf_conntrack.ko
        191K nf_conntrack.ko
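The size argument can be checked against the `dec` column (text + data + bss) of the listing above. Since nf_conntrack.ko is only useful together with at least one l3 tracker, the fair "before" figure adds the tracker module(s) that would be loaded; the arithmetic below uses only the numbers given.

```latex
\begin{aligned}
\text{before (ipv4 + ipv6)} &= 8445 + 8493 + 86539 = 103477 \\
\text{before (ipv4 only)}   &= 8445 + 86539 = 94984 \\
\text{after}                &= 93450 \\
93450 - 103477 &= -10027, \qquad 93450 - 94984 = -1534
\end{aligned}
```

On disk the same comparison is 19K + 19K + 179K = 217K before versus 191K after, so the combined size shrinks even though nf_conntrack.ko itself grows by 12K.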
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  29. 16 Jul, 2018 1 commit
  30. 06 Jul, 2018 1 commit
  31. 02 Jun, 2018 1 commit
    • netfilter: nf_tables: add connlimit support · 290180e2
      Pablo Neira Ayuso authored
      
      This feature allows you to limit the maximum number of
      connections per arbitrary key. The connlimit expression is stateful,
      therefore it can be used from meters to dynamically populate a set;
      this provides a mapping to iptables' connlimit match. This patch also
      allows you to define static connlimit policies.
      
      This extension depends on the nf_conncount infrastructure.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  32. 01 Jun, 2018 1 commit