1. 13 Nov, 2016 13 commits
    • mlxsw: spectrum: Fix refcount bug on span entries · 2d644d4c
      Yotam Gigi authored
      When binding port to a newly created span entry, its refcount is
      initialized to zero even though it has a bound port. That leads
      to unexpected behaviour when the user tries to delete that port
      from the span entry.
      
      Fix this by initializing the reference count to 1.
      
      Also add a warning to the put function.
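
      The reference-counting rule described above can be sketched in user-space C. This is a minimal illustration, not the driver's code: struct span_entry and the span_entry_* helpers are hypothetical names, and the warning is a plain fprintf() rather than a kernel WARN.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a span entry; not the driver's real struct. */
struct span_entry {
    bool used;
    int ref_count;
};

/* Create an entry on behalf of the first port that binds to it. */
static void span_entry_create(struct span_entry *e)
{
    e->used = true;
    e->ref_count = 1;   /* the creating port already holds one reference */
}

/* Take an additional reference for another bound port. */
static void span_entry_get(struct span_entry *e)
{
    e->ref_count++;
}

/* Drop one reference; returns true when the entry was released. */
static bool span_entry_put(struct span_entry *e)
{
    if (e->ref_count <= 0) {
        /* Unbalanced put: warn instead of silently going negative. */
        fprintf(stderr, "span entry put with no references held\n");
        return false;
    }
    if (--e->ref_count == 0) {
        e->used = false;
        return true;
    }
    return false;
}
```

      With the count starting at 1, the put issued when the first port is removed releases the entry, and any further put only triggers the warning.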
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-fixes' · a055450a
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: 2 bug fixes.
      
      Bug fixes in bnxt_setup_tc() and VF virtual link state.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix VF virtual link state. · 73b9bad6
      Michael Chan authored
      If the physical link is down and the VF virtual link is set to "enable",
      the current code does not always work.  If the link is down but the
      cable is attached, the firmware returns LINK_SIGNAL instead of
      NO_LINK.  The current code is treating LINK_SIGNAL as link up.
      The fix is to treat link as down when the link_status != LINK.
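
      A minimal sketch of the distinction, assuming a three-valued firmware status. The enum and both predicates are hypothetical names, and the "before" predicate is only one plausible way LINK_SIGNAL could end up being treated as link up.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three firmware link states named above. */
enum fw_link_status { FW_NO_LINK, FW_LINK_SIGNAL, FW_LINK };

/* One possible "before" form: anything other than NO_LINK counts as up. */
static bool link_up_before(enum fw_link_status s)
{
    return s != FW_NO_LINK;
}

/* "After" form: the link is down whenever the status is not LINK. */
static bool link_up_after(enum fw_link_status s)
{
    return s == FW_LINK;
}
```

      The two predicates differ only for FW_LINK_SIGNAL, which is the cable-attached-but-down case described above.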
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). · 3ffb6a39
      Michael Chan authored
      The logic is missing the check on whether the tx and rx rings are sharing
      completion rings or not.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" · 7b5b74ef
      Mike Frysinger authored
      This reverts commit cf00713a ("include/uapi/linux/atm_zatm.h: include
      linux/time.h").
      
      This attempted to fix userspace breakage that no longer existed when
      the patch was merged.  Almost one year earlier, commit 70ba07b6
      ("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
      
      After this patch was merged, we now have to deal with people being
      unable to include this header in conjunction with standard C library
      headers like stdlib.h (which linux-atm does).  Example breakage:
      x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
      	-I.  -DCPPFLAGS_TEST  -I../../src/include -O2 -march=native -pipe -g \
      	-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
      	-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
      	-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
      	-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
      In file included from /usr/include/linux/atm_zatm.h:17:0,
                       from zntune.c:17:
      /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
       struct timespec {
              ^
      In file included from /usr/include/sys/select.h:43:0,
                       from /usr/include/sys/types.h:219,
                       from /usr/include/stdlib.h:314,
                       from zntune.c:9:
      /usr/include/time.h:120:8: note: originally defined here
       struct timespec
              ^
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: take care of truncations done by sk_filter() · ac6e7800
      Eric Dumazet authored
      With syzkaller help, Marco Grassi found a bug in TCP stack,
      crashing in tcp_collapse()
      
      Root cause is that sk_filter() can truncate the incoming skb,
      but TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of TCP header could be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq
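
      The two steps can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel code: struct seg and apply_filter_trim() are hypothetical, and the filter result is modelled as a number of bytes to keep.

```c
#include <stdint.h>

/* Hypothetical model of a received TCP segment (header plus payload). */
struct seg {
    uint32_t len;      /* total bytes currently held */
    uint32_t hdr_len;  /* TCP header length in bytes */
    uint32_t seq;      /* sequence number of the first payload byte */
    uint32_t end_seq;  /* sequence number just past the last payload byte */
};

/* Apply a filter verdict that asks to keep only keep_len bytes. */
static void apply_filter_trim(struct seg *s, uint32_t keep_len)
{
    uint32_t before = s->len;

    /* Step 1: never let the filter cut into the TCP header. */
    if (keep_len < s->hdr_len)
        keep_len = s->hdr_len;

    /* The filter may shorten the segment but never lengthen it. */
    if (keep_len < s->len)
        s->len = keep_len;

    /* Step 2: shrink end_seq by the number of bytes removed. */
    s->end_seq -= before - s->len;
}
```

      After the trim, end_seq minus seq again equals the payload length, which is the invariant that later sequence-space bookkeeping relies on in this model.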
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Reported-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: use new_gw for redirect neigh lookup · 969447f2
      Stephen Suryaputra Lin authored
      In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
      and then the state of the neigh for the new_gw is checked. If the state
      isn't valid then the redirected route is deleted. This behavior is
      maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
      is assigned to peer->redirect_learned.a4 before calling
      ipv4_neigh_lookup().
      
      After commit 5943634f ("ipv4: Maintain redirect and PMTU info in
      struct rtable again."), ipv4_neigh_lookup() is performed without the
      rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
      isn't zero, the function uses it as the key. The neigh is most likely
      valid since the old_gw is the one that sends the ICMP redirect message.
      Then the new_gw is assigned to fib_nh_exception. The problem is: the
      new_gw ARP may never get resolved and the traffic is blackholed.
      
      So, use the new_gw for neigh lookup.
      
      Changes from v1:
       - use __ipv4_neigh_lookup instead (per Eric Dumazet).
      
      Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
      Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Fix error path in open function · ca0a7531
      Guenter Roeck authored
      If usb_submit_urb() called from the open function fails, the following
      crash may be observed.
      
      r8152 8-1:1.0 eth0: intr_urb submit failed: -19
      ...
      r8152 8-1:1.0 eth0: v1.08.3
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
      pgd = ffffffc0e7305000
      [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      ...
      PC is at notifier_chain_register+0x2c/0x58
      LR is at blocking_notifier_chain_register+0x54/0x70
      ...
      Call trace:
      [<ffffffc0002407f8>] notifier_chain_register+0x2c/0x58
      [<ffffffc000240bdc>] blocking_notifier_chain_register+0x54/0x70
      [<ffffffc00026991c>] register_pm_notifier+0x24/0x2c
      [<ffffffbffc183200>] rtl8152_open+0x3dc/0x3f8 [r8152]
      [<ffffffc000808000>] __dev_open+0xac/0x104
      [<ffffffc0008082f8>] __dev_change_flags+0xb0/0x148
      [<ffffffc0008083c4>] dev_change_flags+0x34/0x70
      [<ffffffc000818344>] do_setlink+0x2c8/0x888
      [<ffffffc0008199d4>] rtnl_newlink+0x328/0x644
      [<ffffffc000819e98>] rtnetlink_rcv_msg+0x1a8/0x1d4
      [<ffffffc0008373c8>] netlink_rcv_skb+0x68/0xd0
      [<ffffffc000817990>] rtnetlink_rcv+0x2c/0x3c
      [<ffffffc000836d1c>] netlink_unicast+0x16c/0x234
      [<ffffffc00083720c>] netlink_sendmsg+0x340/0x364
      [<ffffffc0007e85d0>] sock_sendmsg+0x48/0x60
      [<ffffffc0007e9c30>] SyS_sendto+0xe0/0x120
      [<ffffffc0007e9cb0>] SyS_send+0x40/0x4c
      [<ffffffc000203e34>] el0_svc_naked+0x24/0x28
      
      Clean up error handling to avoid registering the notifier if the open
      function is going to fail.
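
      The ordering can be sketched as below. This is a hypothetical user-space model, not the driver's code: struct fake_dev and fake_open() are invented for illustration, and the URB submission is reduced to a result code passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical device state; stands in for what open() sets up. */
struct fake_dev {
    bool mem_allocated;
    bool notifier_registered;
};

/* Open the device; submit_result models the outcome of the fallible step. */
static int fake_open(struct fake_dev *d, int submit_result)
{
    int res;

    d->mem_allocated = true;        /* step 1: allocate buffers */

    res = submit_result;            /* step 2: fallible submission */
    if (res)
        goto out_free;

    d->notifier_registered = true;  /* step 3: reached only on success */
    return 0;

out_free:
    /* Unwind what was set up; the notifier was never registered. */
    d->mem_allocated = false;
    return res;
}
```

      Because registration is the last step, a failed open leaves nothing registered that could later point at freed memory.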
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpqether.h: remove if_ether.h guard · 10b21768
      Baruch Siach authored
      __LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from
      double inclusion, though it uses a single underscore prefix.
      Signed-off-by: Baruch Siach <baruch@tkos.co.il>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: __skb_flow_dissect() must cap its return value · 34fad54c
      Eric Dumazet authored
      After Tom's patch, the thoff field could point past the end of the buffer;
      this could fool some callers.
      
      If an skb was provided, skb->len should be the upper limit.
      If not, hlen is supposed to be the upper limit.
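
      A minimal sketch of the capping rule, assuming the offset and both limits are plain unsigned integers. cap_thoff() is a hypothetical helper; a NULL skb_len pointer stands for "no skb was provided".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Cap a computed transport-header offset.
 * skb_len points at the packet length when a packet is available,
 * or is NULL when only a raw buffer of hlen bytes was supplied.
 */
static uint16_t cap_thoff(unsigned int nhoff, const unsigned int *skb_len,
                          unsigned int hlen)
{
    unsigned int limit = skb_len ? *skb_len : hlen;

    /* The stored field is 16 bits wide, hence the narrowing cast. */
    return (uint16_t)(nhoff < limit ? nhoff : limit);
}
```

      Callers that index the buffer with the returned offset can then never be handed a value beyond the applicable upper limit.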
      
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yibin Yang <yibyang@cisco.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fix-bpf_redirect' · 79774d6b
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      bpf: Fix bpf_redirect to an ipip/ip6tnl dev
      
      This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
      ipip/ip6tnl.  The current problem is IP-EthHdr-IP is sent out instead of
      IP-IP.
      
      Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
      in act_mirred.c, which is only available in net-next.  We can consider
      refactoring it once this patch is pulled into net-next from net.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add test for bpf_redirect to ipip/ip6tnl · 90e02896
      Martin KaFai Lau authored
      The test creates two netns, ns1 and ns2.  The host (the default netns)
      has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
      
          ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
      
      The test is to have ns1 pinging VIPs configured at the loopback
      interface in ns2.
      
      The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
      at lo@ns2). [Note: 0x66 => 102].
      
      At ns1, the VIPs are routed _via_ the host.
      
      At the host, bpf programs are installed at the veth to redirect packets
      from a veth to the ipip/ip6tnl.  The test is configured in a way so
      that both ingress and egress can be tested.
      
      At ns2, the ipip/ip6tnl dev is configured with the local and remote address
      specified.  The return path is routed to the dev ipip/ip6tnl.
      
      During egress test, the host also locally tests pinging the VIPs to ensure
      that bpf_redirect at egress also works for the direct egress (i.e. not
      forwarding from dev ve1 to ve2).
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix bpf_redirect to an ipip/ip6tnl dev · 4e3264d2
      Martin KaFai Lau authored
      If the bpf program calls bpf_redirect(dev, 0) and dev is
      an ipip/ip6tnl, it currently includes the mac header.
      e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
      of IP-IP.
      
      The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
      is not needed because the ethhdr should have been pulled once already
      and then got pushed back just before calling the bpf_prog.
      At egress, this patch calls skb_postpull_rcsum().
      
      If bpf_redirect(dev, BPF_F_INGRESS) is called,
      it also fails now because it calls dev_forward_skb() which
      eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
      will set skb->pkt_type = PACKET_OTHERHOST because the mac address
      does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
      will eventually cause ip_rcv() to error out.  To fix this,
      ____dev_forward_skb() is added.
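
      The core decision, whether the target device expects a mac header at all, can be sketched as follows. The enum, the helper names and the fixed 14-byte Ethernet header length are assumptions made for illustration; this is not the kernel's dev->type test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device kinds; the kernel tests dev->type instead. */
enum dev_kind { DEV_ETHER, DEV_IPIP, DEV_IP6TNL };

#define ETH_HDR_LEN 14  /* size of an Ethernet header in bytes */

/* Tunnel devices such as ipip/ip6tnl transmit bare IP packets. */
static bool dev_wants_mac_header(enum dev_kind kind)
{
    return kind == DEV_ETHER;
}

/*
 * Offset into the frame at which the data handed to the target
 * device should start: 0 keeps the mac header, mac_len pulls it.
 */
static size_t redirect_payload_offset(enum dev_kind target, size_t mac_len)
{
    return dev_wants_mac_header(target) ? 0 : mac_len;
}
```

      Pulling the header for the tunnel case is what turns the IP-EthHdr-IP result into IP-IP.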
      
      Joint work with Daniel Borkmann.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Nov, 2016 11 commits
  3. 09 Nov, 2016 12 commits
  4. 08 Nov, 2016 4 commits
    • netfilter: nf_tables: fix oops when inserting an element into a verdict map · 58c78e10
      Liping Zhang authored
      Dalegaard says:
       The following ruleset, when loaded with 'nft -f bad.txt'
       ----snip----
       flush ruleset
       table ip inlinenat {
         map sourcemap {
           type ipv4_addr : verdict;
         }
      
         chain postrouting {
           ip saddr vmap @sourcemap accept
         }
       }
       add chain inlinenat test
       add element inlinenat sourcemap { 100.123.10.2 : jump test }
       ----snip----
      
       results in a kernel oops:
       BUG: unable to handle kernel paging request at 0000000000001344
       IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
       [...]
       Call Trace:
        [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
        [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
        [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
        [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
        [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
        [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
        [<ffffffff8132c400>] ? nla_validate+0x60/0x80
        [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]
      
      This happens because we forget to fill in the net pointer in bind_ctx,
      so dereferencing it may crash the kernel.
      Reported-by: Dalegaard <dalegaard@gmail.com>
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: refine gc worker heuristics · e0df8cae
      Florian Westphal authored
      Nicolas Dichtel says:
        After commit b87a2f91 ("netfilter: conntrack: add gc worker to
        remove timed-out entries"), netlink conntrack deletion events may be
        sent with a huge delay.
      
      Nicolas further points at this line:
      
        goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
      
      and indeed, this isn't optimal at all.  Rationale here was to ensure that
      we don't block other work items for too long, even if
      nf_conntrack_htable_size is huge.  But in order to have some guarantee
      about maximum time period where a scan of the full conntrack table
      completes we should always use a fixed slice size, so that once every
      N scans the full table has been examined at least once.
      
      We also need to balance this vs. the case where the system is either idle
      (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
      from packet path).
      
      So, after some discussion with Nicolas:
      
      1. want hard guarantee that we scan entire table at least once every X s
      -> need to scan fraction of table (get rid of upper bound)
      
      2. don't want to eat cycles on idle or very busy system
      -> increase interval if we did not evict any entries
      
      3. don't want to block other worker items for too long
      -> make fraction really small, and prefer small scan interval instead
      
      4. want a reasonably short time until we detect a timed-out entry when the
      system went idle after a burst of traffic, while not doing scans
      all the time.
      -> Store next gc scan in worker, increasing delays when no eviction
      happened and shrinking delay when we see timed out entries.
      
      The old gc interval is turned into a max number, scans can now happen
      every jiffy if stale entries are present.
      
      Longest possible time period until an entry is evicted is now 2 minutes
      in worst case (entry expires right after it was deemed 'not expired').
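
      The heuristics above can be sketched roughly as follows. All constants and both helper names are hypothetical, and doubling/halving is just one simple way to "increase" and "shrink" the delay; the text does not specify the actual step sizes.

```c
/* Hypothetical tunables; the real values are not given in the text above. */
#define GC_SCAN_DIVISOR 64u   /* scan 1/64 of the table per run */
#define GC_DELAY_MIN    1u    /* shortest delay between runs, in ticks */
#define GC_DELAY_MAX    100u  /* longest delay between runs, in ticks */

/* Fixed fraction of the table with no upper bound, but at least one bucket. */
static unsigned int gc_buckets_per_scan(unsigned int table_size)
{
    unsigned int n = table_size / GC_SCAN_DIVISOR;

    return n ? n : 1;
}

/* Pick the delay before the next run from the outcome of this one. */
static unsigned int gc_next_delay(unsigned int cur, unsigned int expired)
{
    if (expired == 0)
        /* Nothing evicted: back off, capped at the maximum. */
        cur = (cur > GC_DELAY_MAX / 2) ? GC_DELAY_MAX : cur * 2;
    else
        /* Stale entries seen: come back sooner. */
        cur /= 2;

    return cur < GC_DELAY_MIN ? GC_DELAY_MIN : cur;
}
```

      Since every run covers the same fraction of the table, a full pass takes at most GC_SCAN_DIVISOR runs, and the capped delay bounds how long each run can be postponed.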
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: fix CT target for UNSPEC helpers · 6114cc51
      Florian Westphal authored
      Thomas reports it's not possible to attach the H.245 helper:
      
      iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
      iptables: No chain/target/match by that name.
      xt_CT: No such helper "H.245"
      
      This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
      passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.
      
      We should treat UNSPEC as wildcard and ignore the l3num instead.
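
      A minimal sketch of the wildcard rule. struct helper, helper_matches() and the PROTO_* constants are hypothetical stand-ins for the kernel's helper lookup, and the protocol number 17 (UDP) is taken from the "-p udp" in the command above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical layer-3 family identifiers. */
enum { PROTO_UNSPEC = 0, PROTO_IPV4 = 2, PROTO_IPV6 = 10 };

/* Hypothetical registration record for a conntrack helper. */
struct helper {
    const char *name;
    int l3num;     /* layer-3 family, or PROTO_UNSPEC for "any" */
    int protonum;  /* layer-4 protocol number */
};

/* Does helper h satisfy a lookup for (name, l3num, protonum)? */
static bool helper_matches(const struct helper *h, const char *name,
                           int l3num, int protonum)
{
    if (strcmp(h->name, name) != 0)
        return false;

    /* A helper registered as UNSPEC matches every layer-3 family. */
    if (h->l3num != PROTO_UNSPEC && h->l3num != l3num)
        return false;

    return h->protonum == protonum;
}
```

      With this rule, a lookup carrying IPv4 or IPv6 still finds a helper that registered itself without a specific family.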
      Reported-by: Thomas Woerner <twoerner@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: connmark: ignore skbs with magic untracked conntrack objects · fb9c9649
      Florian Westphal authored
      The (percpu) untracked conntrack entries can end up with nonzero connmarks.
      
      The 'untracked' conntrack objects are merely a way to distinguish INVALID
      (i.e. protocol connection tracker says payload doesn't meet some
      requirements or packet was never seen by the connection tracking code)
      from packets that are intentionally not tracked (some icmpv6 types such as
      neigh solicitation, or by using 'iptables -j CT --notrack' option).
      
      Untracked conntrack objects are an implementation detail; we might as well
      use an invalid magic address instead to tell INVALID and UNTRACKED apart.
      
      Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
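
      The check can be sketched with a sentinel object. struct conn, untracked_dummy and the connmark_* helpers are hypothetical; in this model a single shared placeholder plays the role of the untracked entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection object carrying a mark. */
struct conn {
    unsigned int mark;
};

/* Shared placeholder meaning "deliberately not tracked". */
static struct conn untracked_dummy;

static bool conn_is_untracked(const struct conn *ct)
{
    return ct == &untracked_dummy;
}

/* Read the mark; untracked is treated exactly like "no connection". */
static unsigned int connmark_get(const struct conn *ct)
{
    if (ct == NULL || conn_is_untracked(ct))
        return 0;
    return ct->mark;
}

/* Write the mark; refuses to touch the shared placeholder. */
static bool connmark_set(struct conn *ct, unsigned int mark)
{
    if (ct == NULL || conn_is_untracked(ct))
        return false;
    ct->mark = mark;
    return true;
}
```

      Because writes to the placeholder are refused, it can never accumulate a nonzero mark that would leak into unrelated untracked packets.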
      Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>