1. 08 Apr, 2014 2 commits
    • Daniel Borkmann's avatar
      net: sctp: wake up all assocs if sndbuf policy is per socket · 52c35bef
      Daniel Borkmann authored
      SCTP charges chunks for wmem accounting via skb->truesize in
      sctp_set_owner_w(), and sctp_wfree() respectively as the
      reverse operation. If a sender runs out of wmem, it needs to
      wait via sctp_wait_for_sndbuf(), and gets woken up by a call
      to __sctp_write_space() mostly via sctp_wfree().
      
      __sctp_write_space() is being called per association. Although
      we assign sk->sk_write_space() to sctp_write_space(), which
      is then being done per socket, it is only used if send space
      is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
      is set and therefore not invoked in sock_wfree().
      
      Commit 4c3a5bda ("sctp: Don't charge for data in sndbuf
      again when transmitting packet") fixed an issue where in case
      sctp_packet_transmit() manages to queue up more than sndbuf
      bytes, sctp_wait_for_sndbuf() will never be woken up again
      unless it is interrupted by a signal. However, a still
      remaining issue is that if net.sctp.sndbuf_policy=0, that is
      accounting per socket, and one-to-many sockets are in use,
      the reclaimed write space from sctp_wfree() is 'unfairly'
      handed back on the server to the association that is the lucky
      one to be woken up again via __sctp_write_space(), while
      the remaining associations are never be woken up again
      (unless by a signal).
      
      The effect disappears with net.sctp.sndbuf_policy=1, that
      is wmem accounting per association, as it guarantees a fair
      share of wmem among associations.
      
      Therefore, if we have reclaimed memory in case of per socket
      accounting, wake all related associations to a socket in a
      fair manner, that is, traverse the socket association list
      starting from the current neighbour of the association and
      issue a __sctp_write_space() to everyone until we end up
      waking ourselves. This guarantees that no association is
      preferred over another and even if more associations are
      taken into the one-to-many session, all receivers will get
      messages from the server and are not stalled forever on
      high load. This setting still leaves the advantage of per
      socket accounting in touch as an association can still use
      up global limits if unused by others.
      
      Fixes: 4eb701df ("[SCTP] Fix SCTP sendbuffer accouting.")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52c35bef
    • Dan Carpenter's avatar
      isdnloop: several buffer overflows · 7563487c
      Dan Carpenter authored
      There are three buffer overflows addressed in this patch.
      
      1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
      then copy it into a 60 character buffer.  I have made the destination
      buffer 64 characters and I'm changed the sprintf() to a snprintf().
      
      2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
      character buffer so we have 54 characters.  The ->eazlist[] is 11
      characters long.  I have modified the code to return if the source
      buffer is too long.
      
      3) In isdnloop_command() the cbuf[] array was 60 characters long but the
      max length of the string then can be up to 79 characters.  I made the
      cbuf array 80 characters long and changed the sprintf() to snprintf().
      I also removed the temporary "dial" buffer and changed it to use "p"
      directly.
      
      Unfortunately, we pass the "cbuf" string from isdnloop_command() to
      isdnloop_writecmd() which truncates anything over 60 characters to make
      it fit in card->omsg[].  (It can accept values up to 255 characters so
      long as there is a '\n' character every 60 characters).  For now I have
      just fixed the memory corruption bug and left the other problems in this
      driver alone.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7563487c
  2. 07 Apr, 2014 8 commits
  3. 06 Apr, 2014 1 commit
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d80e773f
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree, they
      are:
      
      * Use 16-bits offset and length fields instead of 8-bits in the conntrack
        extension to avoid an overflow when many conntrack extension are used,
        from Andrey Vagin.
      
      * Allow to use cgroup match from LOCAL_IN, there is no apparent reason
        for not allowing this, from Alexey Perevalov.
      
      * Fix build of the connlimit match after recent changes to let it scale
        up that result in a divide by zero compilation error in UP, from
        Florian Westphal.
      
      * Move the lock out of the structure connlimit_data to avoid a false
        sharing spotted by Eric Dumazet and Jesper D. Brouer, this needed as
        part of the recent connlimit scalability improvements, also from
        Florian Westphal.
      
      * Add missing module aliases in xt_osf to fix loading of rules using
        this match, from Kirill Tkhai.
      
      * Restrict set names in nf_tables to 15 characters instead of silently
        trimming them off, from me.
      
      * Fix wrong format in nf_tables request module call for chain types,
        spotted by Florian Westphal, patch from me.
      
      * Fix crash in xtables when it fails to copy the counters back to userspace
        after having replaced the table already.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d80e773f
  4. 05 Apr, 2014 1 commit
    • Thomas Graf's avatar
      netfilter: Can't fail and free after table replacement · c58dd2dd
      Thomas Graf authored
      All xtables variants suffer from the defect that the copy_to_user()
      to copy the counters to user memory may fail after the table has
      already been exchanged and thus exposed. Return an error at this
      point will result in freeing the already exposed table. Any
      subsequent packet processing will result in a kernel panic.
      
      We can't copy the counters before exposing the new tables as we
      want provide the counter state after the old table has been
      unhooked. Therefore convert this into a silent error.
      
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c58dd2dd
  5. 04 Apr, 2014 4 commits
  6. 03 Apr, 2014 24 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: fix wrong format in request_module() · 2fec6bb6
      Pablo Neira Ayuso authored
      The intended format in request_module is %.*s instead of %*.s.
      Reported-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2fec6bb6
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: set names cannot be larger than 15 bytes · a9bdd836
      Pablo Neira Ayuso authored
      Currently, nf_tables trims off the set name if it exceeeds 15
      bytes, so explicitly reject set names that are too large.
      Reported-by: default avatarGiuseppe Longo <giuseppelng@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a9bdd836
    • Andrey Vagin's avatar
      netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len · 223b02d9
      Andrey Vagin authored
      "len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
      case it can contain all extensions. Bellow you can find sizes for all
      types of extensions. Their sum is definitely bigger than 256.
      
      nf_ct_ext_types[0]->len = 24
      nf_ct_ext_types[1]->len = 32
      nf_ct_ext_types[2]->len = 24
      nf_ct_ext_types[3]->len = 32
      nf_ct_ext_types[4]->len = 152
      nf_ct_ext_types[5]->len = 2
      nf_ct_ext_types[6]->len = 16
      nf_ct_ext_types[7]->len = 8
      
      I have seen "len" up to 280 and my host has crashes w/o this patch.
      
      The right way to fix this problem is reducing the size of the ecache
      extension (4) and Florian is going to do this, but these changes will
      be quite large to be appropriate for a stable tree.
      
      Fixes: 5b423f6a (netfilter: nf_conntrack: fix racy timer handling with reliable)
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      223b02d9
    • Kirill Tkhai's avatar
      netfilter: Add {ipt,ip6t}_osf aliases for xt_osf · b8ddd9ea
      Kirill Tkhai authored
      There are no these aliases, so kernel can not request appropriate
      match table:
      
      $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP
      iptables: No chain/target/match by that name.
      
      setsockopt() requests ipt_osf module, which is not present. Add
      the aliases.
      Signed-off-by: default avatarKirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b8ddd9ea
    • Alexey Perevalov's avatar
      netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks · a00e7634
      Alexey Perevalov authored
      This simple modification allows iptables to work with INPUT chain
      in combination with cgroup module. It could be useful for counting
      ingress traffic per cgroup with nfacct netfilter module. There
      were no problems to count the egress traffic that way formerly.
      
      It's possible to get classified sk_buff after PREROUTING, due to
      socket lookup being done in early_demux (tcp_v4_early_demux). Also
      it works for udp as well.
      
      Trivial usage example, assuming we're in the same shell every step
      and we have enough permissions:
      
      1) Classic net_cls cgroup initialization:
      
        mkdir /sys/fs/cgroup/net_cls
        mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
      
      2) Set up cgroup for interesting application:
      
        mkdir /sys/fs/cgroup/net_cls/wget
        echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid
        echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs
      
      3) Create kernel counters:
      
        nfacct add wget-cgroup-in
        iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in
      
        nfacct add wget-cgroup-out
        iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out
      
      4) Network usage:
      
        wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz
      
      5) Check results:
      
        nfacct list
      
      Cgroup approach is being used for the DataUsage (counting & blocking
      traffic) feature for Samsung's modification of the Tizen OS.
      Signed-off-by: default avatarAlexey Perevalov <a.perevalov@samsung.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a00e7634
    • Florian Westphal's avatar
      netfilter: connlimit: move lock array out of struct connlimit_data · e00b437b
      Florian Westphal authored
      Eric points out that the locks can be global.
      Moreover, both Jesper and Eric note that using only 32 locks increases
      false sharing as only two cache lines are used.
      
      This increases locks to 256 (16 cache lines assuming 64byte cacheline and
      4 bytes per spinlock).
      Suggested-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      e00b437b
    • Florian Westphal's avatar
      netfilter: connlimit: fix UP build · e5ac6eaf
      Florian Westphal authored
      cannot use ARRAY_SIZE() if spinlock_t is empty struct.
      
      Fixes: 1442e750 ("netfilter: connlimit: use keyed locks")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      e5ac6eaf
    • Eric Dumazet's avatar
      net-gro: reset skb->truesize in napi_reuse_skb() · e33d0ba8
      Eric Dumazet authored
      Recycling skb always had been very tough...
      
      This time it appears GRO layer can accumulate skb->truesize
      adjustments made by drivers when they attach a fragment to skb.
      
      skb_gro_receive() can only subtract from skb->truesize the used part
      of a fragment.
      
      I spotted this problem seeing TcpExtPruneCalled and
      TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where
      TCP receive window should be sized properly to accept traffic coming
      from a driver not overshooting skb->truesize.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e33d0ba8
    • Philipp Zabel's avatar
      net: Micrel KSZ8864RMN 4-port managed switch support · 240a12d5
      Philipp Zabel authored
      This patch adds support for the Micrel KSZ8864RMN switch to the spi_ks8995
      driver. The KSZ8864RMN switch has a wider 256-byte register space.
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      240a12d5
    • Erik Hugne's avatar
      tipc: fix regression bug where node events are not being generated · a5e7ac5c
      Erik Hugne authored
      Commit 5902385a ("tipc: obsolete
      the remote management feature") introduces a regression where node
      topology events are not being generated because the publication
      that triggers this: {0, <z.c.n>, <z.c.n>} is no longer available.
      This will break applications that rely on node events to discover
      when nodes join/leave a cluster.
      
      We fix this by advertising the node publication when TIPC enters
      networking mode, and withdraws it upon shutdown.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5e7ac5c
    • françois romieu's avatar
      sxgbe: fix driver probe error path and driver removal leaks · d9bd6461
      françois romieu authored
      sxgbe_drv_probe:  mdio and priv->hw leaks
      sxgbe_drv_remove: clk and priv->hw leaks
      Signed-off-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Acked-by: default avatarByungho An <bh74.an@samsung.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9bd6461
    • françois romieu's avatar
    • Jiri Pirko's avatar
      net: add busy_poll device feature · d0290214
      Jiri Pirko authored
      Currently there is no way how to find out if a device supports busy
      polling. So add a feature and make it dependent on ndo_busy_poll
      existence.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0290214
    • Daniel Borkmann's avatar
      packet: fix packet_direct_xmit for BQL enabled drivers · 8e2f1a63
      Daniel Borkmann authored
      Currently, in packet_direct_xmit() we test the assigned netdevice queue
      for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().
      
      This can have the side-effect that BQL enabled drivers which make use
      of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
      within the stack and would not fully fill the device's TX ring from
      packet sockets with PACKET_QDISC_BYPASS enabled.
      
      Instead, use a test without BQL bit so that bursts can be absorbed
      into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8e2f1a63
    • Daniel Borkmann's avatar
      packet: report tx_dropped in packet_direct_xmit · 0f97ede4
      Daniel Borkmann authored
      Since commit 015f0688 ("net: net: add a core netdev->tx_dropped
      counter"), we can now account for TX drops from within the core
      stack instead of drivers.
      
      Therefore, fix packet_direct_xmit() and increase drop count when we
      encounter a problem before driver's xmit function was called (we do
      not want to doubly account for it).
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f97ede4
    • Zoltan Kiss's avatar
      xen-netback: Grant copy the header instead of map and memcpy · bdab8275
      Zoltan Kiss authored
      An old inefficiency of the TX path that we are grant mapping the first slot,
      and then copy the header part to the linear area. Instead, doing a grant copy
      for that header straight on is more reasonable. Especially because there are
      ongoing efforts to make Xen avoiding TLB flush after unmap when the page were
      not touched in Dom0. In the original way the memcpy ruined that.
      The key changes:
      - the vif has a tx_copy_ops array again
      - xenvif_tx_build_gops sets up the grant copy operations
      - we don't have to figure out whether the header and first frag are on the same
        grant mapped page or not
      Note, we only grant copy PKT_PROT_LEN bytes from the first slot, the rest (if
      any) will be on the first frag, which is grant mapped. If the first slot is
      smaller than PKT_PROT_LEN, then we grant copy that, and later __pskb_pull_tail
      will pull more from the frags (if any)
      Signed-off-by: default avatarZoltan Kiss <zoltan.kiss@citrix.com>
      Reviewed-by: default avatarPaul Durrant <paul.durrant@citrix.com>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bdab8275
    • Zoltan Kiss's avatar
      xen-netback: Rename map ops · 9074ce24
      Zoltan Kiss authored
      Rename identifiers to state explicitly that they refer to map ops.
      Signed-off-by: default avatarZoltan Kiss <zoltan.kiss@citrix.com>
      Reviewed-by: default avatarPaul Durrant <paul.durrant@citrix.com>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9074ce24
    • Josh Boyer's avatar
      net: qlcnic: include irq.h for irq definitions · acdd32be
      Josh Boyer authored
      The qlcnic driver fails to build on ARM with errors like:
      
      In file included from drivers/net/ethernet/qlogic/qlcnic/qlcnic.h:36:0,
                       from drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c:8:
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h:585:1: error: unknown type name 'irqreturn_t'
       irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *);
       ^
      
      Nothing in the driver is explicitly including the irq definitions, so we
      add an include of linux/irq.h to pick them up.
      Signed-off-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      acdd32be
    • Josh Boyer's avatar
      net: enic: include irq.h for irqreturn_t definitions · fef1f07c
      Josh Boyer authored
      The enic driver fails to build on ARM with:
      
      In file included from drivers/net/ethernet/cisco/enic/enic_res.c:40:0:
      drivers/net/ethernet/cisco/enic/enic.h:48:2: error: expected specifier-qualifier-list before 'irqreturn_t'
        irqreturn_t (*isr)(int, void *);
        ^
      
      Nothing in the driver is explicitly including the irq definitions, so we add
      an include of linux/irq.h to pick them up.
      Signed-off-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fef1f07c
    • Josh Boyer's avatar
      net: bnx2x: include irq.h for irqreturn_t definitions · df1efc2d
      Josh Boyer authored
      The bnx2x driver fails to build on ARM with:
      
      In file included from drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:28:0:
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:243:1: error: unknown type name 'irqreturn_t'
       irqreturn_t bnx2x_msix_sp_int(int irq, void *dev_instance);
       ^
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:251:1: error: unknown type name 'irqreturn_t'
       irqreturn_t bnx2x_interrupt(int irq, void *dev_instance);
       ^
      
      Nothing in bnx2x_link.c or bnx2x_cmn.h is explicitly including the irq
      definitions, so we add an include of linux/irq.h to pick them up.
      Signed-off-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df1efc2d
    • YOSHIFUJI Hideaki / 吉藤英明's avatar
      isdnloop: Validate NUL-terminated strings from user. · 77bc6bed
      YOSHIFUJI Hideaki / 吉藤英明 authored
      Return -EINVAL unless all of user-given strings are correctly
      NUL-terminated.
      Signed-off-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77bc6bed
    • Alexei Starovoitov's avatar
      net: ti: fix CPTS driver build on arm · 79eb9d28
      Alexei Starovoitov authored
      fix build errors:
      drivers/net/ethernet/ti/cpts.c:266:12: error: 'ETH_HLEN' undeclared (first use in this function)
      drivers/net/ethernet/ti/cpts.c:276:23: error: 'VLAN_HLEN' undeclared (first use in this function)
      
      Fixes: 408eccce ("net: ptp: move PTP classifier in its own file")
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Suggested-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79eb9d28
    • Mike Rapoport's avatar
      net: vxlan: fix crash when interface is created with no group · 5933a7bb
      Mike Rapoport authored
      If the vxlan interface is created without explicit group definition,
      there are corner cases which may cause kernel panic.
      
      For instance, in the following scenario:
      
      node A:
      $ ip link add dev vxlan42  address 2c:c2:60:00:10:20 type vxlan id 42
      $ ip addr add dev vxlan42 10.0.0.1/24
      $ ip link set up dev vxlan42
      $ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02
      $ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address>
      $ ping 10.0.0.2
      
      node B:
      $ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42
      $ ip addr add dev vxlan42 10.0.0.2/24
      $ ip link set up dev vxlan42
      $ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20
      
      node B crashes:
      
       vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
       vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000046
       IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82
       PGD 7bd89067 PUD 7bd4e067 PMD 0
       Oops: 0000 [#1] SMP
       Modules linked in:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221f-dirty #154
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000
       RIP: 0010:[<ffffffff8143c459>]  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
       RSP: 0018:ffff88007fd03668  EFLAGS: 00010282
       RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040
       RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754
       RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000
       R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740
       R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50
       FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0
       Stack:
        0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740
        ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360
        ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360
       Call Trace:
        <IRQ>
        [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4
        [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12
        [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c
        [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14
        [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b
        [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8
        [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e
        [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e
        [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce
        [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392
        [<ffffffff813b380f>] ? eth_header+0x28/0xb5
        [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd
        [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152
        [<ffffffff813db741>] ip_finish_output2+0x236/0x299
        [<ffffffff813dc074>] ip_finish_output+0x98/0x9d
        [<ffffffff813dc749>] ip_output+0x62/0x67
        [<ffffffff813da9f2>] dst_output+0xf/0x11
        [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f
        [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37
        [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33
        [<ffffffff813ff732>] icmp_push_reply+0x106/0x115
        [<ffffffff813ff9e4>] icmp_reply+0x142/0x164
        [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48
        [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813ffb62>] icmp_echo+0x25/0x27
        [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
        [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f
        [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a
        [<ffffffff813d7d28>] ? inet_add_protocol+0x3e/0x3e
        [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
        [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec
        [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478
        [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27
        [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a
        [<ffffffff8139aaaa>] process_backlog+0x76/0x12f
        [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab
        [<ffffffff81047847>] __do_softirq+0xca/0x1d1
        [<ffffffff81047ace>] irq_exit+0x3e/0x85
        [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4
        [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d
        <EOI>
        [<ffffffff810378db>] ? native_safe_halt+0x6/0x8
        [<ffffffff810110c7>] default_idle+0x9/0xd
        [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c
        [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137
        [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5
       Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89
       RIP  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
        RSP <ffff88007fd03668>
       CR2: 0000000000000046
       ---[ end trace 4612329caab37efd ]---
      
      When vxlan interface is created without explicit group definition, the
      default_dst protocol family is initialiazed to AF_UNSPEC and the driver
      assumes IPv4 configuration. On the other side, the default_dst protocol
      family is used to differentiate between IPv4 and IPv6 cases and, since,
      AF_UNSPEC != AF_INET, the processing takes the IPv6 path.
      
      Making the IPv4 assumption explicit by settting default_dst protocol
      family to AF_INET4 and preventing mixing of IPv4 and IPv6 addresses in
      snooped fdb entries fixes the corner case crashes.
      Signed-off-by: default avatarMike Rapoport <mike.rapoport@ravellosystems.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5933a7bb
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · cd6362be
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Here is my initial pull request for the networking subsystem during
        this merge window:
      
         1) Support for ESN in AH (RFC 4302) from Fan Du.
      
         2) Add full kernel doc for ethtool command structures, from Ben
            Hutchings.
      
         3) Add BCM7xxx PHY driver, from Florian Fainelli.
      
         4) Export computed TCP rate information in netlink socket dumps, from
            Eric Dumazet.
      
         5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
            Dichtel.
      
         6) Convert many drivers to pci_enable_msix_range(), from Alexander
            Gordeev.
      
         7) Record SKB timestamps more efficiently, from Eric Dumazet.
      
         8) Switch to microsecond resolution for TCP round trip times, also
            from Eric Dumazet.
      
         9) Clean up and fix 6lowpan fragmentation handling by making use of
            the existing inet_frag api for it's implementation.
      
        10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
      
        11) Auto size SKB lengths when composing netlink messages based upon
            past message sizes used, from Eric Dumazet.
      
        12) qdisc dumps can take a long time, add a cond_resched(), From Eric
            Dumazet.
      
        13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
            Get rid of never-used-in-tree netpoll RX handling.  From Eric W
            Biederman.
      
        14) Support inter-address-family and namespace changing in VTI tunnel
            driver(s).  From Steffen Klassert.
      
        15) Add Altera TSE driver, from Vince Bridgers.
      
        16) Optimizing csum_replace2() so that it doesn't adjust the checksum
            by checksumming the entire header, from Eric Dumazet.
      
        17) Expand BPF internal implementation for faster interpreting, more
            direct translations into JIT'd code, and much cleaner uses of BPF
            filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
            Starovoitov"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
        netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
        net: Add a test to see if a skb is freeable in irq context
        qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
        net: ptp: move PTP classifier in its own file
        net: sxgbe: make "core_ops" static
        net: sxgbe: fix logical vs bitwise operation
        net: sxgbe: sxgbe_mdio_register() frees the bus
        Call efx_set_channels() before efx->type->dimension_resources()
        xen-netback: disable rogue vif in kthread context
        net/mlx4: Set proper build dependancy with vxlan
        be2net: fix build dependency on VxLAN
        mac802154: make csma/cca parameters per-wpan
        mac802154: allow only one WPAN to be up at any given time
        net: filter: minor: fix kdoc in __sk_run_filter
        netlink: don't compare the nul-termination in nla_strcmp
        can: c_can: Avoid led toggling for every packet.
        can: c_can: Simplify TX interrupt cleanup
        can: c_can: Store dlc private
        can: c_can: Reduce register access
        can: c_can: Make the code readable
        ...
      cd6362be