1. 15 May, 2017 10 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: revisit chain/object refcounting from elements · 59105446
      Pablo Neira Ayuso authored
      Andreas reports that the following incremental update using our commit
      protocol doesn't work.
      
       # nft -f incremental-update.nft
       delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
       delete chain ip filter CIn_1
       ... Error: Could not process rule: Device or resource busy
      
      The existing code is not well-integrated into the commit phase protocol,
      since element deletions do not result in refcount decrement from the
      preparation phase. This results in bogus EBUSY errors like the one
      above.
      
      Two new functions come with this patch:
      
      * nft_set_elem_activate() function is used from the abort path, to
        restore the set element refcounting on objects that occurred from
        the preparation phase.
      
      * nft_set_elem_deactivate() that is called from nft_del_setelem() to
        decrement set element refcounting on objects from the preparation
        phase in the commit protocol.
      
      The nft_data_uninit() has been renamed to nft_data_release() since this
      function does not uninitialize any data store in the data register,
      instead just releases the references to objects. Moreover, a new
      function nft_data_hold() has been introduced to be used from
      nft_set_elem_activate().
      Reported-by: default avatarAndreas Schultz <aschultz@tpip.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      59105446
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: missing sanitization in data from userspace · 71df14b0
      Pablo Neira Ayuso authored
      Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and
      cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it
      is still possible to handcraft a netlink message using this incorrect
      data type.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      71df14b0
    • Liping Zhang's avatar
      netfilter: nf_tables: can't assume lock is acquired when dumping set elems · fa803605
      Liping Zhang authored
      When dumping the elements related to a specified set, we may invoke the
      nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So
      we should use the proper rcu operation to avoid race condition, just
      like other nft dump operations.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      fa803605
    • Eric Leblond's avatar
      netfilter: synproxy: fix conntrackd interaction · 87e94dbc
      Eric Leblond authored
      This patch fixes the creation of connection tracking entry from
      netlink when synproxy is used. It was missing the addition of
      the synproxy extension.
      
      This was causing kernel crashes when a conntrack entry created by
      conntrackd was used after the switch of traffic from active node
      to the passive node.
      Signed-off-by: default avatarEric Leblond <eric@regit.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      87e94dbc
    • Willem de Bruijn's avatar
      netfilter: xtables: zero padding in data_to_user · 324318f0
      Willem de Bruijn authored
      When looking up an iptables rule, the iptables binary compares the
      aligned match and target data (XT_ALIGN). In some cases this can
      exceed the actual data size to include padding bytes.
      
      Before commit f77bc5b2 ("iptables: use match, target and data
      copy_to_user helpers") the malloc()ed bytes were overwritten by the
      kernel with kzalloced contents, zeroing the padding and making the
      comparison succeed. After this patch, the kernel copies and clears
      only data, leaving the padding bytes undefined.
      
      Extend the clear operation from data size to aligned data size to
      include the padding bytes, if any.
      
      Padding bytes can be observed in both match and target, and the bug
      triggered, by issuing a rule with match icmp and target ACCEPT:
      
        iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
        iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
      
      Fixes: f77bc5b2 ("iptables: use match, target and data copy_to_user helpers")
      Reported-by: default avatarPaul Moore <pmoore@redhat.com>
      Reported-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      324318f0
    • Pablo Neira Ayuso's avatar
      Merge tag 'ipvs-fixes-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs · ff1e4300
      Pablo Neira Ayuso authored
      Simon Horman says:
      
      ====================
      IPVS Fixes for v4.12
      
      please consider this fix to IPVS for v4.12.
      
      * It is a fix from Julian Anastasov to only SNAT SNAT packet replies only for
        NATed connections
      
      My understanding is that this fix is appropriate for 4.9.25, 4.10.13, 4.11
      as well as the nf tree. Julian has separately posted backports for other
      -stable kernels; please see:
      
      * [PATCH 3.2.88,3.4.113 -stable 1/3] ipvs: SNAT packet replies only for
              NATed connections
      * [PATCH 3.10.105,3.12.73,3.16.43,4.1.39 -stable 2/3] ipvs: SNAT packet
              replies only for NATed connections
      * [PATCH 4.4.65 -stable 3/3] ipvs: SNAT packet replies only for NATed
              connections
      ====================
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ff1e4300
    • Liping Zhang's avatar
      netfilter: nfnl_cthelper: reject del request if helper obj is in use · 9338d7b4
      Liping Zhang authored
      We can still delete the ct helper even if it is in use, this will cause
      a use-after-free error. In more detail, I mean:
        # nfct helper add ssdp inet udp
        # iptables -t raw -A OUTPUT -p udp -j CT --helper ssdp
        # nfct helper delete ssdp //--> oops, succeed!
        BUG: unable to handle kernel paging request at 000026ca
        IP: 0x26ca
        [...]
        Call Trace:
         ? ipv4_helper+0x62/0x80 [nf_conntrack_ipv4]
         nf_hook_slow+0x21/0xb0
         ip_output+0xe9/0x100
         ? ip_fragment.constprop.54+0xc0/0xc0
         ip_local_out+0x33/0x40
         ip_send_skb+0x16/0x80
         udp_send_skb+0x84/0x240
         udp_sendmsg+0x35d/0xa50
      
      So add reference count to fix this issue, if ct helper is used by
      others, reject the delete request.
      
      Apply this patch:
        # nfct helper delete ssdp
        nfct v1.4.3: netlink error: Device or resource busy
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9338d7b4
    • Liping Zhang's avatar
      netfilter: introduce nf_conntrack_helper_put helper function · d91fc59c
      Liping Zhang authored
      And convert module_put invocation to nf_conntrack_helper_put, this is
      prepared for the followup patch, which will add a refcnt for cthelper,
      so we can reject the deleting request when cthelper is in use.
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      d91fc59c
    • Liping Zhang's avatar
      netfilter: don't setup nat info for confirmed ct · d110a394
      Liping Zhang authored
      We cannot setup nat info if the ct has been confirmed already, else,
      different cpu may race to handle the same ct. In extreme situation,
      we may hit the "BUG_ON(nf_nat_initialized(ct, maniptype))" in the
      nf_nat_setup_info.
      
      Also running the following commands will easily hit NF_CT_ASSERT in
      nf_conntrack_alter_reply:
        # nft flush ruleset
        # ping -c 2 -W 1 1.1.1.111 &
        # nft add table t
        # nft add chain t c {type nat hook postrouting priority 0 \;}
        # nft add rule t c snat to 4.5.6.7
        WARNING: CPU: 1 PID: 10065 at net/netfilter/nf_conntrack_core.c:1472
        nf_conntrack_alter_reply+0x9a/0x1a0 [nf_conntrack]
        [...]
        Call Trace:
         nf_nat_setup_info+0xad/0x840 [nf_nat]
         ? deactivate_slab+0x65d/0x6c0
         nft_nat_eval+0xcd/0x100 [nft_nat]
         nft_do_chain+0xff/0x5d0 [nf_tables]
         ? mark_held_locks+0x6f/0xa0
         ? __local_bh_enable_ip+0x70/0xa0
         ? trace_hardirqs_on_caller+0x11f/0x190
         ? ipt_do_table+0x310/0x610
         ? trace_hardirqs_on+0xd/0x10
         ? __local_bh_enable_ip+0x70/0xa0
         ? ipt_do_table+0x32b/0x610
         ? __lock_acquire+0x2ac/0x1580
         ? ipt_do_table+0x32b/0x610
         nft_nat_do_chain+0x65/0x80 [nft_chain_nat_ipv4]
         nf_nat_ipv4_fn+0x1ae/0x240 [nf_nat_ipv4]
         nf_nat_ipv4_out+0x4a/0xf0 [nf_nat_ipv4]
         nft_nat_ipv4_out+0x15/0x20 [nft_chain_nat_ipv4]
         nf_hook_slow+0x2c/0xf0
         ip_output+0x154/0x270
      
      So for the confirmed ct, just ignore it and return NF_ACCEPT.
      
      Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      d110a394
    • Matthias Kaehlcke's avatar
      netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch · a2b7cbdd
      Matthias Kaehlcke authored
      Not all parameters passed to ctnetlink_parse_tuple() and
      ctnetlink_exp_dump_tuple() match the enum type in the signatures of these
      functions. Since this is intended change the argument type of to be an
      unsigned integer value.
      Signed-off-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a2b7cbdd
  2. 12 May, 2017 22 commits
  3. 11 May, 2017 8 commits
    • David S. Miller's avatar
      bpf: Provide a linux/types.h override for bpf selftests. · 0a5539f6
      David S. Miller authored
      We do not want to use the architecture's type.h header when
      building BPF programs which are always 64-bit.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a5539f6
    • David S. Miller's avatar
      Merge branch 'bpf-pkt-ptr-align' · 228b0324
      David S. Miller authored
      David S. Miller says:
      
      ====================
      bpf: Add alignment tracker to verifier.
      
      First we add the alignment tracking logic to the verifier.
      
      Next, we work on building up infrastructure to facilitate regression
      testing of this facility.
      
      Finally, we add the "test_align" test case.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      228b0324
    • David S. Miller's avatar
    • David S. Miller's avatar
      bpf: Add bpf_verify_program() to the library. · 91045f5e
      David S. Miller authored
      This allows a test case to load a BPF program and unconditionally
      acquire the verifier log.
      
      It also allows specification of the strict alignment flag.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      91045f5e
    • David S. Miller's avatar
      bpf: Add strict alignment flag for BPF_PROG_LOAD. · e07b98d9
      David S. Miller authored
      Add a new field, "prog_flags", and an initial flag value
      BPF_F_STRICT_ALIGNMENT.
      
      When set, the verifier will enforce strict pointer alignment
      regardless of the setting of CONFIG_EFFICIENT_UNALIGNED_ACCESS.
      
      The verifier, in this mode, will also use a fixed value of "2" in
      place of NET_IP_ALIGN.
      
      This facilitates test cases that will exercise and validate this part
      of the verifier even when run on architectures where alignment doesn't
      matter.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      e07b98d9
    • David S. Miller's avatar
      bpf: Do per-instruction state dumping in verifier when log_level > 1. · c5fc9692
      David S. Miller authored
      If log_level > 1, do a state dump every instruction and emit it in
      a more compact way (without a leading newline).
      
      This will facilitate more sophisticated test cases which inspect the
      verifier log for register state.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c5fc9692
    • David S. Miller's avatar
      bpf: Track alignment of register values in the verifier. · d1174416
      David S. Miller authored
      Currently if we add only constant values to pointers we can fully
      validate the alignment, and properly check if we need to reject the
      program on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.
      
      However, once an unknown value is introduced we only allow byte sized
      memory accesses which is too restrictive.
      
      Add logic to track the known minimum alignment of register values,
      and propagate this state into registers containing pointers.
      
      The most common paradigm that makes use of this new logic is computing
      the transport header using the IP header length field.  For example:
      
      	struct ethhdr *ep = skb->data;
      	struct iphdr *iph = (struct iphdr *) (ep + 1);
      	struct tcphdr *th;
       ...
      	n = iph->ihl;
      	th = ((void *)iph + (n * 4));
      	port = th->dest;
      
      The existing code will reject the load of th->dest because it cannot
      validate that the alignment is at least 2 once "n * 4" is added the
      the packet pointer.
      
      In the new code, the register holding "n * 4" will have a reg->min_align
      value of 4, because any value multiplied by 4 will be at least 4 byte
      aligned.  (actually, the eBPF code emitted by the compiler in this case
      is most likely to use a shift left by 2, but the end result is identical)
      
      At the critical addition:
      
      	th = ((void *)iph + (n * 4));
      
      The register holding 'th' will start with reg->off value of 14.  The
      pointer addition will transform that reg into something that looks like:
      
      	reg->aux_off = 14
      	reg->aux_off_align = 4
      
      Next, the verifier will look at the th->dest load, and it will see
      a load offset of 2, and first check:
      
      	if (reg->aux_off_align % size)
      
      which will pass because aux_off_align is 4.  reg_off will be computed:
      
      	reg_off = reg->off;
       ...
      		reg_off += reg->aux_off;
      
      plus we have off==2, and it will thus check:
      
      	if ((NET_IP_ALIGN + reg_off + off) % size != 0)
      
      which evaluates to:
      
      	if ((NET_IP_ALIGN + 14 + 2) % size != 0)
      
      On strict alignment architectures, NET_IP_ALIGN is 2, thus:
      
      	if ((2 + 14 + 2) % size != 0)
      
      which passes.
      
      These pointer transformations and checks work regardless of whether
      the constant offset or the variable with known alignment is added
      first to the pointer register.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d1174416
    • Daniel Borkmann's avatar
      bpf, arm64: fix faulty emission of map access in tail calls · d8b54110
      Daniel Borkmann authored
      Shubham was recently asking on netdev why in arm64 JIT we don't multiply
      the index for accessing the tail call map by 8. That led me into testing
      out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
      dereference on the tail call.
      
      The buggy access is at:
      
        prog = array->ptrs[index];
        if (prog == NULL)
            goto out;
      
        [...]
        00000060:  d2800e0a  mov x10, #0x70 // #112
        00000064:  f86a682a  ldr x10, [x1,x10]
        00000068:  f862694b  ldr x11, [x10,x2]
        0000006c:  b40000ab  cbz x11, 0x00000080
        [...]
      
      The code triggering the crash is f862694b. x1 at the time contains the
      address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
      above we load the pointer to the program at map slot 0 into x10. x10
      can then be NULL if the slot is not occupied, which we later on try to
      access with a user given offset in x2 that is the map index.
      
      Fix this by emitting the following instead:
      
        [...]
        00000060:  d2800e0a  mov x10, #0x70 // #112
        00000064:  8b0a002a  add x10, x1, x10
        00000068:  d37df04b  lsl x11, x2, #3
        0000006c:  f86b694b  ldr x11, [x10,x11]
        00000070:  b40000ab  cbz x11, 0x00000084
        [...]
      
      This basically adds the offset to ptrs to the base address of the bpf
      array we got and we later on access the map with an index * 8 offset
      relative to that. The tail call map itself is basically one large area
      with meta data at the head followed by the array of prog pointers.
      This makes tail calls working again, tested on Cavium ThunderX ARMv8.
      
      Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper")
      Reported-by: default avatarShubham Bansal <illusionist.neo@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d8b54110