1. 14 Oct, 2021 11 commits
    • netfilter: ipvs: remove unneeded input wrappers · 540ff44b
      Florian Westphal authored
      After the earlier patch, ip_vs_in_hook can be used directly.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipvs: remove unneeded output wrappers · 8a9941b4
      Florian Westphal authored
      After the earlier patch, ip_vs_out_hook can be used directly.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipvs: prepare for hook function reduction · 9dd43a5f
      Florian Westphal authored
      ipvs has multiple one-line wrappers for hooks; compact them.
      
      To avoid a large patch, make the two most common helpers use the same
      function signature as hooks.
      
      Subsequent patches can then remove the one-line wrappers.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
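Below is a minimal userspace sketch of the wrapper reduction. It is not the ipvs code. It assumes simplified stand-in types (`struct sk_buff` and `struct nf_hook_state` with only the fields used here) and a hypothetical core function `vs_core()`. The `nf_hookfn` typedef copies the kernel's parameter list `(void *priv, struct sk_buff *skb, const struct nf_hook_state *state)`. Once a helper has that signature and takes the family from `state->pf` instead of from an argument, it can be registered directly. The per-family one-line wrappers then become redundant.

```c
/* Simplified stand-ins for kernel types; not the real layouts. */
struct sk_buff { unsigned int len; };
struct nf_hook_state { unsigned int hook; unsigned char pf; };

/* Values mirror the kernel's NF_DROP/NF_ACCEPT and NFPROTO_IPV4/NFPROTO_IPV6. */
#define NF_DROP      0
#define NF_ACCEPT    1
#define NFPROTO_IPV4 2
#define NFPROTO_IPV6 10

/* Same parameter list as the kernel's nf_hookfn function type. */
typedef unsigned int nf_hookfn(void *priv, struct sk_buff *skb,
                               const struct nf_hook_state *state);

/* Hypothetical shared logic: drop empty IPv4/IPv6 packets, accept the rest. */
static unsigned int vs_core(unsigned int hook, struct sk_buff *skb, int af)
{
    (void)hook;
    if (af != NFPROTO_IPV4 && af != NFPROTO_IPV6)
        return NF_ACCEPT;
    return skb->len ? NF_ACCEPT : NF_DROP;
}

/* Before: a separate one-line wrapper per family, hardcoding the family. */
static unsigned int vs_wrapper_v4(void *priv, struct sk_buff *skb,
                                  const struct nf_hook_state *state)
{
    (void)priv;
    return vs_core(state->hook, skb, NFPROTO_IPV4);
}

static unsigned int vs_wrapper_v6(void *priv, struct sk_buff *skb,
                                  const struct nf_hook_state *state)
{
    (void)priv;
    return vs_core(state->hook, skb, NFPROTO_IPV6);
}

/* After: one function with the hook signature that reads the family from
 * state->pf, so it can be registered for both families directly. */
static unsigned int vs_hook(void *priv, struct sk_buff *skb,
                            const struct nf_hook_state *state)
{
    (void)priv;
    return vs_core(state->hook, skb, state->pf);
}
```

Registered for both IPv4 and IPv6, `vs_hook` returns the same verdicts as the two wrappers. That equivalence is what lets follow-up patches drop them.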
    • netfilter: ebtables: allow use of ebt_do_table as hookfn · f0d6764f
      Florian Westphal authored
      This is possible now that the xt_table structure is passed via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ip6tables: allow use of ip6t_do_table as hookfn · 44b5990e
      Florian Westphal authored
      This is possible now that the xt_table structure is passed via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: arp_tables: allow use of arpt_do_table as hookfn · e8d225b6
      Florian Westphal authored
      This is possible now that the xt_table structure is passed in via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: iptables: allow use of ipt_do_table as hookfn · 8844e010
      Florian Westphal authored
      This is possible now that the xt_table structure is passed in via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
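The *priv mechanism these four commits rely on can be sketched as follows. In the kernel, `struct nf_hook_ops` has `hook` and `priv` members, and the core passes `priv` back to the hook function on every call. Everything named here is a hypothetical, simplified stand-in, not the x_tables API: `struct hook_ops_model`, `struct xt_table_model`, `do_table_model()` and `run_hook()`. A table is reduced to one fixed verdict per hook number.

```c
/* Simplified stand-ins; not the real kernel layouts. */
struct sk_buff { unsigned int len; };
struct nf_hook_state { unsigned int hook; };

#define NF_DROP   0
#define NF_ACCEPT 1
#define NUM_HOOKS 5   /* like the five NF_INET_* hooks, numbered 0..4 */

typedef unsigned int nf_hookfn(void *priv, struct sk_buff *skb,
                               const struct nf_hook_state *state);

/* Hypothetical table: a name and one fixed verdict per hook number. */
struct xt_table_model {
    const char *name;
    unsigned int verdict[NUM_HOOKS];
};

/* Hypothetical registration record: the core hands .priv back to .hook. */
struct hook_ops_model {
    nf_hookfn *hook;
    void *priv;
};

/* Table evaluator with the hook signature. It recovers its table from priv,
 * so it can be registered directly without a per-table wrapper. */
static unsigned int do_table_model(void *priv, struct sk_buff *skb,
                                   const struct nf_hook_state *state)
{
    const struct xt_table_model *table = priv;
    (void)skb;
    if (state->hook >= NUM_HOOKS)
        return NF_ACCEPT;
    return table->verdict[state->hook];
}

/* What the core does when a packet reaches a registered hook. */
static unsigned int run_hook(const struct hook_ops_model *ops,
                             struct sk_buff *skb,
                             const struct nf_hook_state *state)
{
    return ops->hook(ops->priv, skb, state);
}
```

Because the table pointer travels in `priv`, one evaluator can serve every table. Each registration simply supplies a different `priv`.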
    • af_packet: Introduce egress hook · 0d7308c0
      Pablo Neira Ayuso authored
      Add an egress hook for AF_PACKET sockets that have the PACKET_QDISC_BYPASS
      socket option enabled, which allows packets to escape without being
      filtered in the egress path.
      
      This patch only updates the AF_PACKET path; it does not update
      dev_direct_xmit(), so the XDP infrastructure can still bypass
      Netfilter.
      
      [lukas: acquire rcu_read_lock, fix typos, rebase]
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
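The new hook only matters for AF_PACKET sockets that enable PACKET_QDISC_BYPASS. The sketch below shows how a user-space program might set that option. It uses the real `setsockopt()` call with `SOL_PACKET` and `PACKET_QDISC_BYPASS` from `<linux/if_packet.h>`; the helper name `enable_qdisc_bypass()` is hypothetical. Obtaining the socket itself, for example with `socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))`, requires CAP_NET_RAW, so it is left to the caller.

```c
#include <errno.h>
#include <sys/socket.h>        /* setsockopt(), SOL_PACKET */
#include <linux/if_packet.h>   /* PACKET_QDISC_BYPASS */

/* Hypothetical helper: turn on PACKET_QDISC_BYPASS for an AF_PACKET socket.
 * Returns 0 on success or a negative errno value on failure. */
static int enable_qdisc_bypass(int fd)
{
    int one = 1;

    if (setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one)) < 0)
        return -errno;
    return 0;
}
```

Per the commit message, packets sent on such a socket bypass the qdisc layer. With this patch they still pass the netfilter egress hook.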
    • netfilter: Introduce egress hook · 42df6e1d
      Lukas Wunner authored
      Support classifying packets with netfilter on egress to satisfy user
      requirements such as:
      * outbound security policies for containers (Laura)
      * filtering and mangling intra-node Direct Server Return (DSR) traffic
        on a load balancer (Laura)
      * filtering locally generated traffic coming in through AF_PACKET,
        such as local ARP traffic generated for clustering purposes or DHCP
        (Laura; the AF_PACKET plumbing is contained in a follow-up commit)
      * L2 filtering from ingress and egress for AVB (Audio Video Bridging)
        and gPTP with nftables (Pablo)
      * in the future: in-kernel NAT64/NAT46 (Pablo)
      
      The egress hook introduced herein complements the ingress hook added by
      commit e687ad60 ("netfilter: add netfilter ingress hook after
      handle_ing() under unique static key").  A patch for nftables to hook up
      egress rules from user space has been submitted separately, so users may
      immediately take advantage of the feature.
      
      Alternatively or in addition to netfilter, packets can be classified
      with traffic control (tc).  On ingress, packets are classified first by
      tc, then by netfilter.  On egress, the order is reversed for symmetry.
      Conceptually, tc and netfilter can be thought of as layers, with
      netfilter layered above tc.
      
      Traffic control is capable of redirecting packets to another interface
      (man 8 tc-mirred).  E.g., an ingress packet may be redirected from the
      host namespace to a container via a veth connection:
      tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
      
      In this case, netfilter egress classifying is not performed when leaving
      the host namespace!  That's because the packet is still on the tc layer.
      If tc redirects the packet to a physical interface in the host namespace
      such that it leaves the system, the packet is never subjected to
      netfilter egress classifying.  That is only logical since it hasn't
      passed through netfilter ingress classifying either.
      
      Packets can alternatively be redirected at the netfilter layer using
      nft fwd.  Such a packet *is* subjected to netfilter egress classifying
      since it has reached the netfilter layer.
      
      Internally, the skb->nf_skip_egress flag controls whether netfilter is
      invoked on egress by __dev_queue_xmit().  Because __dev_queue_xmit() may
      be called recursively by tunnel drivers such as vxlan, the flag is
      reverted to false after sch_handle_egress().  This ensures that
      netfilter is applied both on the overlay and underlying network.
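Below is a toy model of the egress ordering and the skip flag described above. It is not the `__dev_queue_xmit()` code: the control flow is much simplified, and all names are hypothetical (`struct pkt`, `struct dev`, `dev_queue_xmit_model()`, `tc_ingress_redirect_model()`). Netfilter egress runs first. The skip flag is set while the packet is on the tc layer and reverted after tc. As a result, a tunnel's recursive transmit on the lower device is classified again, while a packet redirected by tc is not. That a tc ingress redirect leaves the flag set is an assumption made to match the text.

```c
#include <stdbool.h>

/* Hypothetical packet: only the skip flag and a count of netfilter runs. */
struct pkt {
    bool nf_skip_egress;   /* plays the role of skb->nf_skip_egress */
    int nf_egress_runs;    /* how often netfilter egress classified it */
};

enum tc_verdict { TC_PASS, TC_REDIRECT };

/* Hypothetical device with a fixed tc egress verdict and optional links. */
struct dev {
    enum tc_verdict tc;
    struct dev *redirect_to;  /* target of a tc egress redirect */
    struct dev *lower;        /* underlying device of a tunnel, e.g. vxlan */
};

static void nf_egress_model(struct pkt *p)
{
    if (!p->nf_skip_egress)
        p->nf_egress_runs++;
}

/* Simplified transmit path: netfilter first, then tc, then the driver. */
static void dev_queue_xmit_model(struct dev *dev, struct pkt *p)
{
    nf_egress_model(p);
    p->nf_skip_egress = true;               /* packet is on the tc layer */
    if (dev->tc == TC_REDIRECT && dev->redirect_to) {
        dev_queue_xmit_model(dev->redirect_to, p);  /* flag still set */
        return;
    }
    p->nf_skip_egress = false;              /* reverted after tc */
    if (dev->lower)
        dev_queue_xmit_model(dev->lower, p);        /* tunnel recursion */
}

/* Hypothetical tc ingress redirect straight into another device's egress. */
static void tc_ingress_redirect_model(struct dev *target, struct pkt *p)
{
    p->nf_skip_egress = true;   /* assumption: still on the tc layer */
    dev_queue_xmit_model(target, p);
}
```

In this model a packet sent through a vxlan device over eth0 is classified twice, once for the overlay and once for the underlay. A packet that tc redirects from veth0 to eth0 is classified once, on veth0, before tc acts. A packet redirected by tc ingress straight to eth0 is never classified on egress.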
      
      Interaction between tc and netfilter is possible by setting and querying
      skb->mark.
      
      If netfilter egress classifying is not enabled on any interface, it is
      patched out of the data path by way of a static_key and doesn't make a
      performance difference that is discernible from noise:
      
      Before:             1537 1538 1538 1537 1538 1537 Mb/sec
      After:              1536 1534 1539 1539 1539 1540 Mb/sec
      Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
      After  + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
      Before + tc drop:   1620 1619 1619 1619 1620 1620 Mb/sec
      After  + tc drop:   1616 1624 1625 1624 1622 1619 Mb/sec
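Averaging the six runs per configuration makes the "not discernible from noise" claim concrete. The sketch below only does arithmetic on the numbers listed above. The means before and after the patch differ by at most about 2.2 Mb/sec (under 0.15%). In every configuration that gap is smaller than the spread between individual runs of the patched kernel (6 to 9 Mb/sec).

```c
#include <stddef.h>

#define RUNS 6

/* Six runs per configuration, in Mb/sec, copied from the measurements above. */
static const int before_plain[RUNS]  = {1537, 1538, 1538, 1537, 1538, 1537};
static const int after_plain[RUNS]   = {1536, 1534, 1539, 1539, 1539, 1540};
static const int before_accept[RUNS] = {1418, 1418, 1418, 1419, 1419, 1418};
static const int after_accept[RUNS]  = {1419, 1424, 1418, 1419, 1422, 1420};
static const int before_drop[RUNS]   = {1620, 1619, 1619, 1619, 1620, 1620};
static const int after_drop[RUNS]    = {1616, 1624, 1625, 1624, 1622, 1619};

/* Arithmetic mean of n samples. */
static double mean(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return (double)sum / (double)n;
}

/* Max minus min: a crude measure of run-to-run noise. */
static int spread(const int *v, size_t n)
{
    int lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    return hi - lo;
}
```

For example, `mean(before_plain, RUNS)` is 1537.5 and `mean(after_plain, RUNS)` is about 1537.8.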
      
      When netfilter egress classifying is enabled on at least one interface,
      a minimal performance penalty is incurred for every egress packet, even
      if the interface it's transmitted over doesn't have any netfilter egress
      rules configured.  That is caused by checking dev->nf_hooks_egress
      against NULL.
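The two-level check can be modelled as below. In the kernel, the outer test is a static key (jump label), which patches the branch in place so that it costs essentially nothing while disabled. This sketch substitutes a plain global counter for the static key. `struct net_device_model`, `egress_enable()`, `egress_disable()` and `classify_path()` are all hypothetical. Once any device enables egress hooks, every packet pays for a NULL test on its own device's hook pointer, which is the small penalty described above.

```c
#include <stddef.h>

/* Hypothetical stand-ins: hook entries and a device whose pointer plays the
 * role of dev->nf_hooks_egress. */
struct hook_entries { int num_rules; };
struct net_device_model { struct hook_entries *nf_hooks_egress; };

enum egress_path {
    PATH_PATCHED_OUT,  /* key off: no per-packet work at all */
    PATH_NULL_CHECK,   /* key on, but this device has no hooks */
    PATH_HOOKS_RUN     /* key on and this device has hooks */
};

/* Plain counter standing in for the static key's enable count. */
static int egress_key_count;

/* Attach hooks to dev; e must be non-NULL. */
static void egress_enable(struct net_device_model *dev, struct hook_entries *e)
{
    if (!dev->nf_hooks_egress)
        egress_key_count++;
    dev->nf_hooks_egress = e;
}

static void egress_disable(struct net_device_model *dev)
{
    if (dev->nf_hooks_egress) {
        dev->nf_hooks_egress = NULL;
        egress_key_count--;
    }
}

/* Which work a packet transmitted over dev incurs. */
static enum egress_path classify_path(const struct net_device_model *dev)
{
    if (egress_key_count == 0)
        return PATH_PATCHED_OUT;
    if (dev->nf_hooks_egress == NULL)
        return PATH_NULL_CHECK;
    return PATH_HOOKS_RUN;
}
```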
      
      Measurements were performed on a Core i7-3615QM.  Commands to reproduce:
      ip link add dev foo type dummy
      ip link set dev foo up
      modprobe pktgen
      echo "add_device foo" > /proc/net/pktgen/kpktgend_3
      samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
      
      Accept all traffic with tc:
      tc qdisc add dev foo clsact
      tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
      
      Drop all traffic with tc:
      tc qdisc add dev foo clsact
      tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'
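The `bytecode` strings above are classic BPF in the form `count,code jt jf k,...`. The sketch below decodes the single instruction. Code 6 is `BPF_RET | BPF_K`, which returns the constant k. In tc's direct-action (`da`) mode that return value is the tc action, so k = 0 is TC_ACT_OK (accept) and k = 2 is TC_ACT_SHOT (drop). The constants are copied locally from their values in `<linux/bpf_common.h>`, `<linux/filter.h>` and `<linux/pkt_cls.h>`. `parse_first_insn()` and `da_verdict()` are hypothetical helpers, not part of tc.

```c
#include <stdlib.h>

/* Values copied from <linux/bpf_common.h>, <linux/filter.h>, <linux/pkt_cls.h>. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_RVAL(code)  ((code) & 0x18)
#define BPF_RET         0x06
#define BPF_K           0x00
#define TC_ACT_OK       0
#define TC_ACT_SHOT     2

struct insn { unsigned long code, jt, jf, k; };

/* Read one unsigned decimal number and advance *p past it. */
static int next_num(const char **p, unsigned long *v)
{
    char *end;

    *v = strtoul(*p, &end, 10);
    if (end == *p)
        return -1;
    *p = end;
    return 0;
}

/* Parse "count,code jt jf k,..." and return count, or -1 on error.
 * Only the first instruction is stored in *out. */
static int parse_first_insn(const char *s, struct insn *out)
{
    unsigned long count;

    if (next_num(&s, &count) || *s != ',' || count < 1)
        return -1;
    s++;
    if (next_num(&s, &out->code) || next_num(&s, &out->jt) ||
        next_num(&s, &out->jf) || next_num(&s, &out->k))
        return -1;
    return (int)count;
}

/* For a "return constant" instruction in direct-action mode, the constant
 * is the tc action. Returns -1 for any other instruction. */
static int da_verdict(const struct insn *in)
{
    if (BPF_CLASS(in->code) != BPF_RET || BPF_RVAL(in->code) != BPF_K)
        return -1;
    return (int)in->k;
}
```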
      
      Apply this patch when measuring packet drops to avoid errors in dmesg:
      https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Laura García Liébana <nevola@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Generalize ingress hook include file · 17d20784
      Lukas Wunner authored
      Prepare for addition of a netfilter egress hook by generalizing the
      ingress hook include file.
      
      No functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Rename ingress hook include file · 7463acfb
      Lukas Wunner authored
      Prepare for addition of a netfilter egress hook by renaming
      <linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>.
      
      The egress hook also necessitates a refactoring of the include file,
      but that is done in a separate commit to ease reviewing.
      
      No functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 07 Oct, 2021 29 commits