1. 15 May, 2017 2 commits
    • netfilter: don't setup nat info for confirmed ct · d110a394
      Liping Zhang authored
      We cannot set up NAT info if the ct has already been confirmed;
      otherwise, different CPUs may race to handle the same ct. In the
      extreme case, we may hit the "BUG_ON(nf_nat_initialized(ct, maniptype))"
      in nf_nat_setup_info().
      
      Also, running the following commands will easily hit the NF_CT_ASSERT
      in nf_conntrack_alter_reply():
        # nft flush ruleset
        # ping -c 2 -W 1 1.1.1.111 &
        # nft add table t
        # nft add chain t c {type nat hook postrouting priority 0 \;}
        # nft add rule t c snat to 4.5.6.7
        WARNING: CPU: 1 PID: 10065 at net/netfilter/nf_conntrack_core.c:1472
        nf_conntrack_alter_reply+0x9a/0x1a0 [nf_conntrack]
        [...]
        Call Trace:
         nf_nat_setup_info+0xad/0x840 [nf_nat]
         ? deactivate_slab+0x65d/0x6c0
         nft_nat_eval+0xcd/0x100 [nft_nat]
         nft_do_chain+0xff/0x5d0 [nf_tables]
         ? mark_held_locks+0x6f/0xa0
         ? __local_bh_enable_ip+0x70/0xa0
         ? trace_hardirqs_on_caller+0x11f/0x190
         ? ipt_do_table+0x310/0x610
         ? trace_hardirqs_on+0xd/0x10
         ? __local_bh_enable_ip+0x70/0xa0
         ? ipt_do_table+0x32b/0x610
         ? __lock_acquire+0x2ac/0x1580
         ? ipt_do_table+0x32b/0x610
         nft_nat_do_chain+0x65/0x80 [nft_chain_nat_ipv4]
         nf_nat_ipv4_fn+0x1ae/0x240 [nf_nat_ipv4]
         nf_nat_ipv4_out+0x4a/0xf0 [nf_nat_ipv4]
         nft_nat_ipv4_out+0x15/0x20 [nft_chain_nat_ipv4]
         nf_hook_slow+0x2c/0xf0
         ip_output+0x154/0x270
      
      So for a confirmed ct, skip the NAT setup and return NF_ACCEPT.
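
      The rule stated above can be modelled in user-space C. This is only a
      sketch under stated assumptions, not the kernel code: struct ct_model,
      nat_setup_model() and the verdict constants are hypothetical stand-ins.
      Only the behaviour (a confirmed entry is left untouched and the packet
      is accepted) is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; not the real nf_conn layout. */
enum verdict_model { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

struct ct_model {
	bool confirmed;       /* entry is already in the conntrack table */
	bool nat_initialized; /* NAT info was already set up */
};

/* Set up NAT info only for entries that are not yet confirmed. */
enum verdict_model nat_setup_model(struct ct_model *ct)
{
	/* A confirmed entry may be visible to other CPUs: leave it alone. */
	if (ct->confirmed)
		return VERDICT_ACCEPT;

	/* Models the BUG_ON() quoted above: setup must happen only once. */
	assert(!ct->nat_initialized);
	ct->nat_initialized = true;
	return VERDICT_ACCEPT;
}
```

      In this model a confirmed entry returns VERDICT_ACCEPT with
      nat_initialized unchanged, so the one-time-setup assertion is never
      reached for it.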
      
      Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch · a2b7cbdd
      Matthias Kaehlcke authored
      Not all parameters passed to ctnetlink_parse_tuple() and
      ctnetlink_exp_dump_tuple() match the enum type in the signatures of these
      functions. Since this is intended, change the argument type to be an
      unsigned integer value.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 12 May, 2017 22 commits
  3. 11 May, 2017 16 commits
    • bpf: Provide a linux/types.h override for bpf selftests. · 0a5539f6
      David S. Miller authored
      We do not want to use the architecture's types.h header when
      building BPF programs, which are always 64-bit.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-pkt-ptr-align' · 228b0324
      David S. Miller authored
      David S. Miller says:
      
      ====================
      bpf: Add alignment tracker to verifier.
      
      First we add the alignment tracking logic to the verifier.
      
      Next, we work on building up infrastructure to facilitate regression
      testing of this facility.
      
      Finally, we add the "test_align" test case.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add bpf_verify_program() to the library. · 91045f5e
      David S. Miller authored
      This allows a test case to load a BPF program and unconditionally
      acquire the verifier log.
      
      It also allows specification of the strict alignment flag.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Add strict alignment flag for BPF_PROG_LOAD. · e07b98d9
      David S. Miller authored
      Add a new field, "prog_flags", and an initial flag value
      BPF_F_STRICT_ALIGNMENT.
      
      When set, the verifier will enforce strict pointer alignment
      regardless of the setting of CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
      
      The verifier, in this mode, will also use a fixed value of "2" in
      place of NET_IP_ALIGN.
      
      This facilitates test cases that will exercise and validate this part
      of the verifier even when run on architectures where alignment doesn't
      matter.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Do per-instruction state dumping in verifier when log_level > 1. · c5fc9692
      David S. Miller authored
      If log_level > 1, do a state dump every instruction and emit it in
      a more compact way (without a leading newline).
      
      This will facilitate more sophisticated test cases which inspect the
      verifier log for register state.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Track alignment of register values in the verifier. · d1174416
      David S. Miller authored
      Currently if we add only constant values to pointers we can fully
      validate the alignment, and properly check if we need to reject the
      program on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.
      
      However, once an unknown value is introduced we only allow byte-sized
      memory accesses, which is too restrictive.
      
      Add logic to track the known minimum alignment of register values,
      and propagate this state into registers containing pointers.
      
      The most common paradigm that makes use of this new logic is computing
      the transport header using the IP header length field.  For example:
      
      	struct ethhdr *ep = skb->data;
      	struct iphdr *iph = (struct iphdr *) (ep + 1);
      	struct tcphdr *th;
       ...
      	n = iph->ihl;
      	th = ((void *)iph + (n * 4));
      	port = th->dest;
      
      The existing code will reject the load of th->dest because it cannot
      validate that the alignment is at least 2 once "n * 4" is added to
      the packet pointer.
      
      In the new code, the register holding "n * 4" will have a reg->min_align
      value of 4, because any value multiplied by 4 will be at least 4-byte
      aligned.  (Actually, the eBPF code emitted by the compiler in this case
      is most likely to use a shift left by 2, but the end result is identical.)
      
      At the critical addition:
      
      	th = ((void *)iph + (n * 4));
      
      The register holding 'th' will start with reg->off value of 14.  The
      pointer addition will transform that reg into something that looks like:
      
      	reg->aux_off = 14
      	reg->aux_off_align = 4
      
      Next, the verifier will look at the th->dest load, and it will see
      a load offset of 2, and first check:
      
      	if (reg->aux_off_align % size)
      
      which will pass because aux_off_align is 4.  reg_off will be computed:
      
      	reg_off = reg->off;
       ...
      		reg_off += reg->aux_off;
      
      plus we have off==2, and it will thus check:
      
      	if ((NET_IP_ALIGN + reg_off + off) % size != 0)
      
      which evaluates to:
      
      	if ((NET_IP_ALIGN + 14 + 2) % size != 0)
      
      On strict alignment architectures, NET_IP_ALIGN is 2, thus:
      
      	if ((2 + 14 + 2) % size != 0)
      
      which passes.
      
      These pointer transformations and checks work regardless of whether
      the constant offset or the variable with known alignment is added
      first to the pointer register.
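
      The two checks walked through above can be reproduced in a small
      user-space model. This is a sketch under stated assumptions, not the
      verifier source: struct reg_model and pkt_access_aligned() are
      hypothetical names, the NET_IP_ALIGN value is passed in as a
      parameter, and aux_off is added unconditionally (it is simply 0 when
      no variable has been added to the pointer).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the register fields named in the commit message. */
struct reg_model {
	int off;           /* constant offset still held on the pointer */
	int aux_off;       /* constant offset set aside when a variable is added */
	int aux_off_align; /* known minimum alignment of the variable part */
};

/* Returns true when a `size`-byte access at offset `off` is provably aligned. */
bool pkt_access_aligned(const struct reg_model *reg, int net_ip_align,
			int off, int size)
{
	int reg_off;

	assert(size > 0);

	/* First check: the variable part must be a multiple of the access size. */
	if (reg->aux_off_align % size)
		return false;

	/* Second check: the constant part, including the packet start alignment. */
	reg_off = reg->off + reg->aux_off;
	return (net_ip_align + reg_off + off) % size == 0;
}
```

      With aux_off = 14, aux_off_align = 4, off = 2 and a NET_IP_ALIGN of 2,
      a 2-byte load gives (2 + 14 + 2) % 2 == 0 and is accepted, matching
      the th->dest example. A 4-byte load at the same offset would be
      rejected in this model, since 18 % 4 == 2.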
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, arm64: fix faulty emission of map access in tail calls · d8b54110
      Daniel Borkmann authored
      Shubham recently asked on netdev why the arm64 JIT does not multiply
      the index for accessing the tail call map by 8. That led me to test
      the arm64 JIT with respect to tail calls, and I got a NULL pointer
      dereference on the tail call.
      
      The buggy access is at:
      
        prog = array->ptrs[index];
        if (prog == NULL)
            goto out;
      
        [...]
        00000060:  d2800e0a  mov x10, #0x70 // #112
        00000064:  f86a682a  ldr x10, [x1,x10]
        00000068:  f862694b  ldr x11, [x10,x2]
        0000006c:  b40000ab  cbz x11, 0x00000080
        [...]
      
      The instruction triggering the crash is f862694b. At that point x1
      contains the address of the bpf array and x10 contains
      offsetof(struct bpf_array, ptrs). This means the preceding instruction
      loads the pointer to the program at map slot 0 into x10. x10 can then
      be NULL if the slot is not occupied, and we later try to dereference it
      with a user-given offset in x2, which is the map index.
      
      Fix this by emitting the following instead:
      
        [...]
        00000060:  d2800e0a  mov x10, #0x70 // #112
        00000064:  8b0a002a  add x10, x1, x10
        00000068:  d37df04b  lsl x11, x2, #3
        0000006c:  f86b694b  ldr x11, [x10,x11]
        00000070:  b40000ab  cbz x11, 0x00000084
        [...]
      
      This adds the offset of ptrs to the base address of the bpf array,
      and the map is then accessed with an offset of index * 8 relative to
      that. The tail call map itself is one large area with metadata at the
      head followed by the array of prog pointers. This makes tail calls
      work again; tested on Cavium ThunderX ARMv8.
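
      The corrected address computation can be sketched in user-space C.
      This is only an illustration under stated assumptions: struct
      array_model, slot_address() and lookup_prog() are hypothetical names,
      the header is sized at 112 bytes purely to match the #0x70 offset in
      the listing, slots are modelled as 64-bit values, and the index is
      assumed to have been bounds-checked by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the map: metadata first, then the slots. */
struct array_model {
	uint64_t meta[14]; /* 112 bytes of header, i.e. ptrs sits at 0x70 */
	uint64_t ptrs[4];  /* 64-bit program pointers, 0 means empty slot */
};

/* Address used by the fixed sequence: base + offsetof(ptrs) + index * 8. */
uintptr_t slot_address(const struct array_model *array, uint64_t index)
{
	uintptr_t base = (uintptr_t)array;                 /* x1 */
	uintptr_t ptrs = base + offsetof(struct array_model, ptrs);
	                                                   /* add x10, x1, x10 */
	uintptr_t scaled = (uintptr_t)(index << 3);        /* lsl x11, x2, #3 */

	return ptrs + scaled;                              /* [x10, x11] */
}

/* Load the slot; a result of 0 corresponds to the cbz "goto out" branch. */
uint64_t lookup_prog(const struct array_model *array, uint64_t index)
{
	return *(const uint64_t *)slot_address(array, index);
}
```

      The buggy sequence instead loaded the contents of slot 0 and used
      them as the base for the second load, which is why an empty slot 0
      led to a NULL pointer dereference.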
      
      Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper")
      Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: netcp_core: return error while dma channel open issue · 5b6cb43b
      Ivan Khoronzhuk authored
      Fix the error path taken when opening a DMA channel fails. Also, there
      is no need to check the result for NULL, since NULL is never returned.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-net-fixes' · dc319c4b
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/net fixes
      
      some qeth fixes for -net, the OSM/OSN one being the most crucial.
      Please also queue these up for stable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: add missing hash table initializations · ebccc739
      Ursula Braun authored
      Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      added new hash tables, but failed to initialize them.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: avoid null pointer dereference on OSN · 25e2c341
      Julian Wiedmann authored
      Access card->dev only after checking whether it's valid.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: unbreak OSM and OSN support · 2d2ebb3e
      Julian Wiedmann authored
      commit b4d72c08 ("qeth: bridgeport support - basic control")
      broke the support for OSM and OSN devices as follows:
      
      As OSM and OSN are L2 only, qeth_core_probe_device() does an early
      setup by loading the l2 discipline and calling qeth_l2_probe_device().
      In this context, adding the l2-specific bridgeport sysfs attributes
      via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
      since the basic sysfs infrastructure for the device hasn't been
      established yet.
      
      Note that OSN actually has its own unique sysfs attributes
      (qeth_osn_devtype), so the additional attributes shouldn't be created
      at all.
      For OSM, add a new qeth_l2_devtype that contains all the common
      and l2-specific sysfs attributes.
      When qeth_core_probe_device() does early setup for OSM or OSN, assign
      the corresponding devtype so that the ccwgroup probe code creates the
      full set of sysfs attributes.
      This allows us to skip qeth_l2_create_device_attributes() in case
      of an early setup.
      
      Any device that can't do early setup will initially have only the
      generic sysfs attributes, and when it's probed later
      qeth_l2_probe_device() adds the l2-specific attributes.
      
      If an early-setup device is removed (by calling ccwgroup_ungroup()),
      device_unregister() will - using the devtype - delete the
      l2-specific attributes before qeth_l2_remove_device() is called.
      So make sure to not remove them twice.
      
      What complicates the issue is that qeth_l2_probe_device() and
      qeth_l2_remove_device() are also called on a device when its
      layer2 attribute changes (i.e. its layer mode is switched).
      For early-setup devices this wouldn't work properly - we wouldn't
      remove the l2-specific attributes when switching to L3.
      But switching the layer mode doesn't actually make any sense;
      we already decided that the device can only operate in L2!
      So just refuse to switch the layer mode on such devices. Note that
      OSN doesn't have a layer2 attribute, so we only need to special-case
      OSM.
      
      Based on an initial patch by Ursula Braun.
      
      Fixes: b4d72c08 ("qeth: bridgeport support - basic control")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: handle sysfs error during initialization · 9111e788
      Ursula Braun authored
      When setting up the device from within the layer discipline's
      probe routine, creating the layer-specific sysfs attributes can fail.
      Report this error back to the caller, and handle it by
      releasing the layer discipline.
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      [jwi: updated commit msg, moved an OSN change to a subsequent patch]
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mdio: mux: Correct mdio_mux_init error path issues · b6016166
      Jon Mason authored
      There is a potential unnecessary refcount decrement on error path of
      put_device(&pb->mii_bus->dev), as it is possible to avoid the
      of_mdio_find_bus() call if mux_bus is specified by the calling function.
      
      The same put_device() is not called in the error path if the
      devm_kzalloc of pb fails.  This caused the variable used in the
      put_device() to be changed, as the pb pointer was obviously not set up.
      
      There is an unnecessary of_node_get() on child_bus_node if the
      of_mdiobus_register() is successful, as the
      for_each_available_child_of_node() automatically increments this.
      Thus the refcount on this node will always be +1 more than it should be.
      
      There is no of_node_put() on child_bus_node if the of_mdiobus_register()
      call fails.
      
      Finally, it is lacking devm_kfree() of pb in the error path.  While this
      might not be technically necessary, it was present in other parts of the
      function.  So, I am adding it where necessary to make it uniform.
      Signed-off-by: Jon Mason <jon.mason@broadcom.com>
      Fixes: f20e6657 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers")
      Fixes: 0ca2997d ("netdev/of/phy: Add MDIO bus multiplexer support.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6/dccp: do not inherit ipv6_mc_list from parent · 83eaddab
      WANG Cong authored
      Like commit 657831ff ("dccp/tcp: do not inherit mc_list from parent")
      we should clear ipv6_mc_list etc. for IPv6 sockets too.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>