1. 08 Apr, 2020 2 commits
  2. 15 Mar, 2020 2 commits
      nft_set_pipapo: Introduce AVX2-based lookup implementation · 7400b063
      Stefano Brivio authored
      
      If the AVX2 set is available, we can exploit the repetitive
      characteristic of this algorithm to provide a fast, vectorised
      version by using 256-bit wide AVX2 operations for bucket loads and
      bitwise intersections.
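As a rough illustration of the "256-bit wide bitwise intersection" idea, and not the kernel code itself, the sketch below ANDs two rule bitmaps in chunks of 256 bits. The helper name `bitmap_and_256` is hypothetical. When the compiler defines `__AVX2__` it uses the `_mm256_loadu_si256`, `_mm256_and_si256` and `_mm256_storeu_si256` intrinsics; otherwise it falls back to four 64-bit scalar ANDs per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: dst = a & b over 'blocks' chunks of 256 bits.
 * Each chunk is stored as four consecutive 64-bit words. */
static void bitmap_and_256(uint64_t *dst, const uint64_t *a,
                           const uint64_t *b, size_t blocks)
{
    assert(dst && a && b);

    for (size_t i = 0; i < blocks; i++) {
#ifdef __AVX2__
        /* One unaligned 256-bit load per operand, one AND, one store. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + 4 * i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + 4 * i));

        _mm256_storeu_si256((__m256i *)(dst + 4 * i),
                            _mm256_and_si256(va, vb));
#else
        /* Scalar fallback: same result, four word-sized ANDs. */
        for (size_t w = 0; w < 4; w++)
            dst[4 * i + w] = a[4 * i + w] & b[4 * i + w];
#endif
    }
}
```

Both paths compute the same bitmap; the vectorised path simply handles 256 rule bits per instruction instead of 64.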
      
      In most cases, this implementation consistently outperforms rbtree
      set instances despite the fact they are configured to use a given,
      single, ranged data type out of the ones used for performance
      measurements by the nft_concat_range.sh kselftest.
      
      That script, injecting packets directly on the ingoing device path
      with pktgen, reports, averaged over five runs on a single AMD Epyc
      7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below.
      CONFIG_RETPOLINE was not set here.
      
      Note that this is not a fair comparison over hash and rbtree set
      types: non-ranged entries (used to have a reference for hash types)
      would be matched faster than this, and matching on a single field
      only (which is the case for rbtree) is also significantly faster.
      
      However, it's not possible at the moment to choose this set type
      for non-ranged entries, and the current implementation also needs
      a few minor adjustments in order to match on less than two fields.
      
       ---------------.-----------------------------------.------------.
       AMD Epyc 7402  |          baselines, Mpps          | this patch |
        1 thread      |___________________________________|____________|
        3.35GHz       |        |        |        |        |            |
        768KiB L1D$   | netdev |  hash  | rbtree |        |            |
       ---------------|  hook  |   no   | single |        |   pipapo   |
       type   entries |  drop  | ranges | field  | pipapo |    AVX2    |
       ---------------|--------|--------|--------|--------|------------|
       net,port       |        |        |        |        |            |
                1000  |   19.0 |   10.4 |    3.8 |    4.0 | 7.5   +87% |
       ---------------|--------|--------|--------|--------|------------|
       port,net       |        |        |        |        |            |
                 100  |   18.8 |   10.3 |    5.8 |    6.3 | 8.1   +29% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port      |        |        |        |        |            |
                1000  |   16.4 |    7.6 |    1.8 |    2.1 | 4.8  +128% |
       ---------------|--------|--------|--------|--------|------------|
       port,proto     |        |        |        |        |            |
               30000  |   19.6 |   11.6 |    3.9 |    0.5 | 2.6  +420% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port,mac  |        |        |        |        |            |
                  10  |   16.5 |    5.4 |    4.3 |    3.4 | 4.7   +38% |
       ---------------|--------|--------|--------|--------|------------|
       net6,port,mac, |        |        |        |        |            |
       proto    1000  |   16.5 |    5.7 |    1.9 |    1.4 | 3.6  +157% |
       ---------------|--------|--------|--------|--------|------------|
       net,mac        |        |        |        |        |            |
                1000  |   19.0 |    8.4 |    3.9 |    2.5 | 6.4  +156% |
       ---------------'--------'--------'--------'--------'------------'
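The percentage in the last column appears to be the gain of the AVX2 figure over the plain pipapo figure in the column before it. A minimal sketch of that arithmetic, with a hypothetical helper name, is shown here; because the table reports rates rounded to one decimal, recomputed values can differ from the published ones by about a point.

```c
/* Relative gain, in percent, of 'after' over 'before' (both in Mpps).
 * Hypothetical helper, used only to recompute the table's last column. */
static double gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the net,port row, `gain_percent(4.0, 7.5)` gives 87.5, in line with the +87% reported.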
      
      A similar strategy could be easily reused to implement specialised
      versions for other SIMD sets, and I plan to post at least a NEON
      version at a later time.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: nf_tables: make sets built-in · e32a4dc6
      Florian Westphal authored
      
      Placing nftables set support in an extra module is pointless:
      
      1. nf_tables needs a dynamic registration interface for the sake of one module
      2. nft heavily relies on sets, e.g. even a simple rule like
         "nft ... tcp dport { 80, 443 }" will not work with _SETS=n.
      
      IOW, either nftables isn't used or both nf_tables and nf_tables_set
      modules are needed anyway.
      
      With extra module:
       307K net/netfilter/nf_tables.ko
        79K net/netfilter/nf_tables_set.ko
      
         text  data  bss     dec filename
       146416  3072  545  150033 nf_tables.ko
        35496  1817    0   37313 nf_tables_set.ko
      
      This patch:
       373K net/netfilter/nf_tables.ko
      
       178563  4049  545  183157 nf_tables.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 27 Jan, 2020 1 commit
      nf_tables: Add set type for arbitrary concatenation of ranges · 3c4287f6
      Stefano Brivio authored
      This new set type allows for intervals in concatenated fields,
      which are expressed in the usual way, that is, simple byte
      concatenation with padding to 32 bits for single fields, and
      given as ranges by specifying start and end elements containing,
      each, the full concatenation of start and end values for the
      single fields.
      
      Ranges are expanded to composing netmasks, for each field: these
      are inserted as rules in per-field lookup tables. Bits to be
      classified are divided in 4-bit groups, and for each group, the
      lookup table contains 2^4 = 16 buckets, representing all the possible
      values of a bit group. This approach was inspired by the Grouper
      algorithm:
      	http://www.cse.usf.edu/~ligatti/projects/grouper/
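The expansion of a range into composing netmasks can be sketched with the classic range-to-prefix decomposition. This is only an illustration on 32-bit values, under the assumption that each rule is a (base, prefix length) pair; the struct and function names are hypothetical, and the kernel code works per field on byte strings rather than on a single integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: matches values whose top 'len' bits equal those of 'base'. */
struct prefix {
    uint32_t base;  /* first value covered by the prefix */
    int len;        /* number of leading bits that must match (0..32) */
};

/* Expand the inclusive range [start, end] into prefixes.
 * Writes at most 'max' entries to 'out' and returns how many were written;
 * the output is silently truncated if 'max' is too small. */
static size_t range_to_prefixes(uint32_t start, uint32_t end,
                                struct prefix *out, size_t max)
{
    size_t n = 0;
    uint64_t cur = start;   /* 64-bit so it can step past UINT32_MAX */

    assert(start <= end);

    while (cur <= end && n < max) {
        int len = 32;       /* start with a block of one value */

        /* Double the block while it stays aligned and inside the range. */
        while (len > 0) {
            uint64_t size = 1ULL << (32 - (len - 1));

            if ((cur & (size - 1)) != 0 || cur + size - 1 > end)
                break;
            len--;
        }

        out[n].base = (uint32_t)cur;
        out[n].len = len;
        n++;
        cur += 1ULL << (32 - len);
    }

    return n;
}
```

For example, the range 10-20 decomposes into 10/31, 12/30, 16/30 and 20/32, and each of those would become one rule in the per-field lookup table.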
      
      Matching is performed by a sequence of AND operations between
      bucket values, with buckets selected according to the value of
      packet bits, for each group. The result of this sequence tells
      us which rules matched for a given field.
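A minimal sketch of the per-field matching step follows, assuming a 16-bit field, at most 64 rules (so that a bucket fits in one `uint64_t`) and netmask-style rules. The type and function names are hypothetical and the layout is simplified compared to nft_set_pipapo.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GROUPS  4   /* a 16-bit field split into 4-bit groups */
#define BUCKETS 16  /* 2^4 possible values per group */

/* lt[g][v] has bit r set if rule r accepts value v in group g. */
struct field_table {
    uint64_t lt[GROUPS][BUCKETS];
};

/* Insert rule 'r', matching the top 'len' bits of 'value'. */
static void table_add(struct field_table *t, int r, uint16_t value, int len)
{
    assert(r >= 0 && r < 64 && len >= 0 && len <= 16);

    for (int g = 0; g < GROUPS; g++) {
        int shift = 12 - 4 * g;     /* group 0 is the most significant nibble */
        int covered = len - 4 * g;  /* mask bits falling into this group */
        unsigned mask, want;

        if (covered > 4)
            covered = 4;
        if (covered < 0)
            covered = 0;

        mask = (0xfu << (4 - covered)) & 0xfu;
        want = (value >> shift) & 0xfu;

        /* Set the rule bit in every bucket whose value the rule accepts. */
        for (unsigned v = 0; v < BUCKETS; v++)
            if ((v & mask) == (want & mask))
                t->lt[g][v] |= 1ULL << r;
    }
}

/* AND together one bucket per group, selected by the packet bits. */
static uint64_t table_lookup(const struct field_table *t, uint16_t key)
{
    uint64_t res = ~0ULL;

    for (int g = 0; g < GROUPS; g++)
        res &= t->lt[g][(key >> (12 - 4 * g)) & 0xfu];

    return res;     /* bit r set: rule r matched this field */
}
```

With rule 0 added as `table_add(&t, 0, 80, 16)` (exactly port 80) and rule 1 as `table_add(&t, 1, 0x0400, 6)` (ports 1024-2047), looking up 1500 leaves only bit 1 set, and looking up 443 leaves no bit set.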
      
      In order to concatenate several ranged fields, per-field rules
      are mapped using mapping arrays, one per field, that specify
      which rules should be considered while matching the next field.
      The mapping array for the last field contains a reference to
      the element originally inserted.
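The role of the mapping arrays can be sketched as follows, again assuming at most 64 rules per field. The entry layout (a run of consecutive rules in the next field) and all names are assumptions made for illustration; in the last field, the sketch stands in for the element reference with a plain string pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: a rule in this field points to a run of
 * consecutive rules in the next field. */
struct map_entry {
    unsigned to;    /* first rule index in the next field */
    unsigned n;     /* number of consecutive rules */
};

/* Turn the rules matched in one field into the set of rules that
 * should still be considered in the next field. */
static uint64_t map_to_next(uint64_t matched, const struct map_entry *mt)
{
    uint64_t next = 0;

    for (unsigned r = 0; r < 64; r++) {
        if (!(matched & (1ULL << r)))
            continue;

        assert(mt[r].to + mt[r].n <= 64);
        for (unsigned i = 0; i < mt[r].n; i++)
            next |= 1ULL << (mt[r].to + i);
    }

    return next;
}

/* Last field: the mapping refers to the inserted element itself.
 * Returns the element of the lowest matching rule, or NULL. */
static const char *final_element(uint64_t matched, const char *const *elems)
{
    for (unsigned r = 0; r < 64; r++)
        if (matched & (1ULL << r))
            return elems[r];

    return NULL;
}
```

The bitmap returned by `map_to_next()` would then be ANDed with the buckets selected in the next field, so only rules reachable from an earlier match can survive.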
      
      The notes in nft_set_pipapo.c cover the algorithm in deeper
      detail.
      
      A pure hash-based approach is of no use here, as ranges need
      to be classified. An implementation based on "proxying" the
      existing red-black tree set type, creating a tree for each
      field, was considered, but deemed impractical due to the fact
      that elements would need to be shared between trees, at least
      as long as we want to keep UAPI changes to a minimum.
      
      A stand-alone implementation of this algorithm is available at:
      	https://pipapo.lameexcu.se

      together with notes about possible future optimisations
      (in pipapo.c).
      
      This algorithm was designed with data locality in mind, and can
      be highly optimised for SIMD instruction sets, as the bulk of
      the matching work is done with repetitive, simple bitwise
      operations.
      
      At this point, without further optimisations, nft_concat_range.sh
      reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
      L2$):
      
      TEST: performance
        net,port                                                      [ OK ]
          baseline (drop from netdev hook):              10190076pps
          baseline hash (non-ranged entries):             6179564pps
          baseline rbtree (match on first field only):    2950341pps
          set with  1000 full, ranged entries:            2304165pps
        port,net                                                      [ OK ]
          baseline (drop from netdev hook):              10143615pps
          baseline hash (non-ranged entries):             6135776pps
          baseline rbtree (match on first field only):    4311934pps
          set with   100 full, ranged entries:            4131471pps
        net6,port                                                     [ OK ]
          baseline (drop from netdev hook):               9730404pps
          baseline hash (non-ranged entries):             4809557pps
          baseline rbtree (match on first field only):    1501699pps
          set with  1000 full, ranged entries:            1092557pps
        port,proto                                                    [ OK ]
          baseline (drop from netdev hook):              10812426pps
          baseline hash (non-ranged entries):             6929353pps
          baseline rbtree (match on first field only):    3027105pps
          set with 30000 full, ranged entries:             284147pps
        net6,port,mac                                                 [ OK ]
          baseline (drop from netdev hook):               9660114pps
          baseline hash (non-ranged entries):             3778877pps
          baseline rbtree (match on first field only):    3179379pps
          set with    10 full, ranged entries:            2082880pps
        net6,port,mac,proto                                           [ OK ]
          baseline (drop from netdev hook):               9718324pps
          baseline hash (non-ranged entries):             3799021pps
          baseline rbtree (match on first field only):    1506689pps
          set with  1000 full, ranged entries:             783810pps
        net,mac                                                       [ OK ]
          baseline (drop from netdev hook):              10190029pps
          baseline hash (non-ranged entries):             5172218pps
          baseline rbtree (match on first field only):    2946863pps
          set with  1000 full, ranged entries:            1279122pps
      
      v4:
       - fix build for 32-bit architectures: 64-bit division needs
         div_u64() (kbuild test robot <lkp@intel.com>)
      v3:
       - rework interface for field length specification,
         NFT_SET_SUBKEY disappears and information is stored in
         description
       - remove scratch area to store closing element of ranges,
         as elements now come with an actual attribute to specify
         the upper range limit (Pablo Neira Ayuso)
       - also remove pointer to 'start' element from mapping table,
         closing key is now accessible via extension data
       - use bytes right away instead of bits for field lengths,
         this way we can also double the inner loop of the lookup
         function to take care of upper and lower bits in a single
         iteration (minor performance improvement)
       - make it clearer that set operations are actually atomic
         API-wise, but we can't e.g. implement flush() as one-shot
         action
       - fix type for 'dup' in nft_pipapo_insert(), check for
         duplicates only in the next generation, and in general take
         care of differentiating generation mask cases depending on
         the operation (Pablo Neira Ayuso)
       - report C implementation matching rate in commit message, so
         that AVX2 implementation can be compared (Pablo Neira Ayuso)
      v2:
       - protect access to scratch maps in nft_pipapo_lookup() with
         local_bh_disable/enable() (Florian Westphal)
       - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
         already implied (Florian Westphal)
       - explain why partial allocation failures don't need handling
         in pipapo_realloc_scratch(), rename 'm' to clone and update
         related kerneldoc to make it clear we're not operating on
         the live copy (Florian Westphal)
       - add explicit check for priv->start_elem in
         nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
         with a NULL start element, and also zero it out in every
         operation that might make it invalid, so that insertion
         doesn't proceed with an invalid element (Florian Westphal)
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 13 Nov, 2019 1 commit
      netfilter: nf_flow_table: hardware offload support · c29f74e0
      Pablo Neira Ayuso authored
      
      This patch adds the dataplane hardware offload to the flowtable
      infrastructure. Three new flags represent the hardware state of this
      flow:
      
      * FLOW_OFFLOAD_HW: This flow entry resides in the hardware.
      * FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be removed
        from hardware. This might be triggered by either the packet path (via
        a TCP RST/FIN packet) or aging.
      * FLOW_OFFLOAD_HW_DEAD: This flow entry has been already removed from
        the hardware, the software garbage collector can remove it from the
        software flowtable.
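The lifecycle described by the three flags can be sketched as a small state progression. The bit positions, the struct and the helper names below are illustrative assumptions, not the definitions from the kernel headers, and the sketch only covers flows that made it into hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; the kernel's values may differ. */
enum {
    FLOW_OFFLOAD_HW       = 1u << 0, /* entry resides in hardware */
    FLOW_OFFLOAD_HW_DYING = 1u << 1, /* removal from hardware scheduled */
    FLOW_OFFLOAD_HW_DEAD  = 1u << 2, /* already removed from hardware */
};

/* Hypothetical flow entry holding only the state flags. */
struct flow {
    unsigned flags;
};

/* The entry was pushed to hardware. */
static void flow_hw_added(struct flow *f)
{
    f->flags |= FLOW_OFFLOAD_HW;
}

/* Packet path (TCP RST/FIN) or aging asks for removal from hardware. */
static void flow_hw_teardown(struct flow *f)
{
    if (f->flags & FLOW_OFFLOAD_HW)
        f->flags |= FLOW_OFFLOAD_HW_DYING;
}

/* The hardware removal has completed. */
static void flow_hw_removed(struct flow *f)
{
    if (f->flags & FLOW_OFFLOAD_HW_DYING)
        f->flags |= FLOW_OFFLOAD_HW_DEAD;
}

/* The software garbage collector may drop the entry only once the
 * hardware copy is gone. */
static bool flow_gc_can_free(const struct flow *f)
{
    assert(f);
    return (f->flags & FLOW_OFFLOAD_HW_DEAD) != 0;
}
```

The point of the intermediate DYING state is that the software entry outlives the hardware one, so the garbage collector never frees a flow that hardware may still reference.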
      
      This patch supports:
      
      * IPv4 only.
      * Aging via FLOW_CLS_STATS, no packet and byte counter synchronization
        at this stage.
      
      This patch also adds the action callback that specifies how to convert
      the flow entry into the flow_rule object that is passed to the driver.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Sep, 2019 1 commit
      netfilter: fix coding-style errors. · b0edba2a
      Jeremy Sowden authored
      
      Several header-files, Kconfig files and Makefiles have trailing
      white-space.  Remove it.
      
      In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT
      correctly.
      
      There are semicolons at the end of two function definitions in
      include/net/netfilter/nf_conntrack_acct.h and
      include/net/netfilter/nf_conntrack_ecache.h. Remove them.
      
      Fix indentation in nf_conntrack_l4proto.h.
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 09 Jul, 2019 1 commit
      netfilter: nf_tables: add hardware offload support · c9626a2c
      Pablo Neira Ayuso authored
      
      This patch adds hardware offload support for nftables through the
      existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
      classifier and the flow rule API. This hardware offload support is
      available for the NFPROTO_NETDEV family and the ingress hook.
      
      Each nftables expression has a new ->offload interface, that is used to
      populate the flow rule object that is attached to the transaction
      object.
      
      There is a new per-table NFT_TABLE_F_HW flag, which is set to offload
      an entire table, including all of its chains.
      
      This patch supports basic metadata (layer 3 and 4 protocol numbers),
      5-tuple payload matching and the accept/drop actions; it covers
      basechain hardware offload only.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Jul, 2019 1 commit
  8. 11 Apr, 2019 1 commit
  9. 08 Apr, 2019 1 commit
      netfilter: nf_tables: merge route type into core · c1deb065
      Florian Westphal authored
      
      The route chain types contain very little code, so it really doesn't
      make sense to have extra modules or even a kconfig knob for them.
      
      Merge them and make functionality available unconditionally.
      The merge makes inet family route support trivial, so add it
      as well here.
      
      Before:
         text	   data	    bss	    dec	    hex	filename
          835	    832	      0	   1667	    683 nft_chain_route_ipv4.ko
          870	    832	      0	   1702	    6a6	nft_chain_route_ipv6.ko
       111568	   2556	    529	 114653	  1bfdd	nf_tables.ko
      
      After:
         text	   data	    bss	    dec	    hex	filename
       113133	   2556	    529	 116218	  1c5fa	nf_tables.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 01 Mar, 2019 1 commit
      netfilter: nf_tables: merge ipv4 and ipv6 nat chain types · db8ab388
      Florian Westphal authored
      
      Merge the ipv4 and ipv6 nat chain types. This is the last
      missing piece needed to provide inet family support
      for nat in a follow-up patch.
      
      The kconfig knobs for the ipv4/ipv6 nat chains are removed; the
      nat chain type will be built unconditionally if the NFT_NAT
      expression is enabled.
      
      Before:
         text	   data	    bss	    dec	    hex	filename
         1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
         1697     896       0    2593     a21 nft_chain_nat_ipv6.ko
      
      After:
         text	   data	    bss	    dec	    hex	filename
         1832     896       0    2728     aa8 nft_chain_nat.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  11. 27 Feb, 2019 1 commit
  12. 18 Jan, 2019 1 commit
      netfilter: conntrack: gre: switch module to be built-in · 22fc4c4c
      Florian Westphal authored
      
      This makes the last of the modular l4 trackers 'bool'.
      
      After this, all infrastructure to handle dynamic l4 protocol registration
      becomes obsolete and can be removed in followup patches.
      
      Old:
      302824 net/netfilter/nf_conntrack.ko
       21504 net/netfilter/nf_conntrack_proto_gre.ko
      
      New:
      313728 net/netfilter/nf_conntrack.ko
      
      Old:
         text	   data	    bss	    dec	    hex	filename
         6281	   1732	      4	   8017	   1f51	nf_conntrack_proto_gre.ko
       108356	  20613	    236	 129205	  1f8b5	nf_conntrack.ko
      New:
       112095	  21381	    240	 133716	  20a54	nf_conntrack.ko
      
      The size increase is only temporary.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 17 Dec, 2018 3 commits
      netfilter: nat: remove nf_nat_l4proto struct · 5cbabeec
      Florian Westphal authored
      
      This removes the (now empty) nf_nat_l4proto struct, all its instances
      and all the no longer needed runtime (un)register functionality.
      
      nf_nat_need_gre() can be axed as well: the module that calls it (to
      load the no-longer-existing nat_gre module) also calls other nat core
      functions. GRE nat is now always available if kernel is built with it.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: nat: remove l4proto->manip_pkt · faec18db
      Florian Westphal authored
      
      This removes the last l4proto indirection, the two callers, the l3proto
      packet mangling helpers for ipv4 and ipv6, now call the
      nf_nat_l4proto_manip_pkt() helper.
      
      nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
      they no longer contain any functionality, so as not to clutter this patch.
      
      Next patch will remove the empty files and the nf_nat_l4proto
      struct.
      
      nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
      other nat manip functionality as well, not just udp and udplite.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: nat: remove l4proto->nlattr_to_range · 76b90019
      Florian Westphal authored
      
      All protocols set this to nf_nat_l4proto_nlattr_to_range, so
      just call it directly.
      
      The important difference is that we'll now also call it for
      protocols that we don't support (i.e., nf_nat_proto_unknown did
      not provide .nlattr_to_range).
      
      However, there should be no harm, even icmp provided this callback.
      If we don't implement a specific l4nat for this, nothing would make
      use of this information, so adding a big switch/case construct listing
      all supported l4protocols seems a bit pointless.
      
      This change leaves a single function pointer in the l4proto struct.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  14. 17 Sep, 2018 1 commit
  15. 03 Aug, 2018 1 commit
  16. 30 Jul, 2018 3 commits
  17. 17 Jul, 2018 1 commit
      netfilter: conntrack: remove l3proto abstraction · a0ae2562
      Florian Westphal authored
      
      This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
      abstraction.
      
      This gets rid of all l3proto indirect calls and the need to do
      a lookup on the function to call for l3 demux.
      
      It increases the size of nf_conntrack.ko by only a small amount (12kbyte),
      so the total size goes down: nf_conntrack.ko is useless without either
      the nf_conntrack_ipv4 or the nf_conntrack_ipv6 module (179K + 19K with
      one of them before, 191K after).
      
      before:
         text    data     bss     dec     hex filename
         7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
         7405    1084       4    8493    212d nf_conntrack_ipv6.ko
        72614   13689     236   86539   1520b nf_conntrack.ko
       19K nf_conntrack_ipv4.ko
       19K nf_conntrack_ipv6.ko
      179K nf_conntrack.ko
      
      after:
         text    data     bss     dec     hex filename
        79277   13937     236   93450   16d0a nf_conntrack.ko
        191K nf_conntrack.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  18. 16 Jul, 2018 1 commit
  19. 06 Jul, 2018 1 commit
  20. 02 Jun, 2018 1 commit
      netfilter: nf_tables: add connlimit support · 290180e2
      Pablo Neira Ayuso authored
      
      This feature allows you to limit the maximum number of
      connections per arbitrary key. The connlimit expression is stateful,
      therefore it can be used from meters to dynamically populate a set; this
      provides a mapping to the iptables' connlimit match. This patch also
      allows you to define static connlimit policies.
      
      This extension depends on the nf_conncount infrastructure.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  21. 01 Jun, 2018 1 commit
  22. 28 May, 2018 1 commit
  23. 06 May, 2018 1 commit
  24. 26 Apr, 2018 3 commits
  25. 24 Apr, 2018 1 commit
  26. 21 Apr, 2018 1 commit
  27. 30 Mar, 2018 1 commit
  28. 08 Jan, 2018 5 commits