1. 22 Jan, 2014 40 commits
    • Nikolay Aleksandrov's avatar
      bonding: add infrastructure for an option API · 09117362
      Nikolay Aleksandrov authored
      This patch adds the basic infrastructure needed to support a
      centralized and unified option manipulation API for bonding. The new
      structure bond_option describes each option together with its
      dependencies on modes, which are checked automatically, thus removing a
      lot of duplicated code. Automatic range checking is also added for
      some options. Currently the option setting function requires RTNL to
      be acquired prior to calling it. Since many options already rely on RTNL,
      it seemed like the best choice to protect all of them against common race
      conditions.
      In order to add an option the following steps need to be done:
      1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id
         and a bit corresponding to the id
      2. Add a bond_option entry to the bond_opts[] array in bond_options.c which
         describes the option, its dependencies and its manipulation function
      3. Add code to export the option through sysfs and/or as a module parameter
         (the sysfs export will be made automatically in the future)
      
      The options can have different flags set, currently the following are
      supported:
      BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior
                              to setting the option
      BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to
                            setting the option
      BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the
                            option to parse
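
      As a rough illustration of how such flags could gate a set operation,
      here is a minimal user-space sketch in C. The SKETCH_* names, struct
      bond_sketch and the return codes are hypothetical stand-ins chosen for
      this example; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the BOND_OPTFLAG_* bits described above. */
enum {
	SKETCH_OPTFLAG_NOSLAVES = 1 << 0, /* bond must have no slaves */
	SKETCH_OPTFLAG_IFDOWN   = 1 << 1, /* bond must be down */
	SKETCH_OPTFLAG_RAWVAL   = 1 << 2, /* value is handed over unparsed */
};

/* Hypothetical minimal view of the bond device state. */
struct bond_sketch {
	int slave_cnt;
	bool is_up;
};

/* Returns 0 when the option may be set, a negative code otherwise. */
int sketch_check_optflags(unsigned int flags, const struct bond_sketch *bond)
{
	if ((flags & SKETCH_OPTFLAG_NOSLAVES) && bond->slave_cnt > 0)
		return -1; /* assumed code: slaves are still attached */
	if ((flags & SKETCH_OPTFLAG_IFDOWN) && bond->is_up)
		return -2; /* assumed code: device is still up */
	return 0;
}
```

      RAWVAL does not restrict when the option may be set, so the sketch
      ignores it in the precondition check.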
      
      There's a new value structure to describe different types of values
      which can have the following flags:
      BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias
                             to this option is "default")
      BOND_VALFLAG_MIN - the minimum value that this option can have
      BOND_VALFLAG_MAX - the maximum value that this option can have
      
      An example would be nice here, so if we have an option which can have
      the values "off"(2), "special"(4, default) and supports a range, say
      16 - 32, it should be defined as follows:
      "off", 2,
      "special", 4, BOND_VALFLAG_DEFAULT,
      "rangemin", 16, BOND_VALFLAG_MIN,
      "rangemax", 32, BOND_VALFLAG_MAX
      So we have the valid intervals: [2, 2], [4, 4], [16, 32]
      Also the valid strings: "off" = 2, "special" and "default" = 4
                              "rangemin" = 16, "rangemax" = 32
      
      BOND_VALFLAG_(MIN|MAX) can be used to specify a valid range for an
      option, if MIN is omitted then 0 is considered as a minimum. If an
      exact match is found in the values[] table it will be returned,
      otherwise the range is tried (if available).
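
      The lookup order described above (exact match first, then the range)
      can be sketched in plain C as follows. This is only an illustration
      under assumptions: struct val_sketch, the SKETCH_VALFLAG_* bits and
      both helper functions are hypothetical names, and the table mirrors
      the "off"/"special"/range example from this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the BOND_VALFLAG_* bits. */
enum {
	SKETCH_VALFLAG_DEFAULT = 1 << 0,
	SKETCH_VALFLAG_MIN     = 1 << 1,
	SKETCH_VALFLAG_MAX     = 1 << 2,
};

struct val_sketch {
	const char *name;
	unsigned long value;
	unsigned int flags;
};

/* The example table from the text; a NULL name terminates it. */
static const struct val_sketch example_tbl[] = {
	{ "off",      2,  0 },
	{ "special",  4,  SKETCH_VALFLAG_DEFAULT },
	{ "rangemin", 16, SKETCH_VALFLAG_MIN },
	{ "rangemax", 32, SKETCH_VALFLAG_MAX },
	{ NULL,       0,  0 },
};

/* Returns 1 if v is valid: an exact match wins, otherwise try the range. */
int sketch_value_ok(const struct val_sketch *tbl, unsigned long v)
{
	unsigned long min = 0, max = 0; /* an omitted MIN means 0 */
	int have_max = 0;
	size_t i;

	for (i = 0; tbl[i].name; i++) {
		if (tbl[i].value == v)
			return 1;
		if (tbl[i].flags & SKETCH_VALFLAG_MIN)
			min = tbl[i].value;
		if (tbl[i].flags & SKETCH_VALFLAG_MAX) {
			max = tbl[i].value;
			have_max = 1;
		}
	}
	return have_max && v >= min && v <= max;
}

/* Resolves a string, including the "default" alias; returns 1 on success. */
int sketch_string_to_value(const struct val_sketch *tbl, const char *s,
			   unsigned long *out)
{
	size_t i;

	for (i = 0; tbl[i].name; i++) {
		int is_default = (tbl[i].flags & SKETCH_VALFLAG_DEFAULT) &&
				 strcmp(s, "default") == 0;

		if (is_default || strcmp(s, tbl[i].name) == 0) {
			*out = tbl[i].value;
			return 1;
		}
	}
	return 0;
}
```

      With this table, 2, 4 and anything in 16..32 are accepted, while 3 and
      33 are rejected, matching the intervals listed above.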
      
      Option parameters are passed using a special structure called
      bond_opt_value, which can take either a string or a value to parse. One
      of the bond_opt_init(val|str) macros should be used, depending on which
      one the user wants to parse (string or value). Then __bond_opt_set
      should be called under RTNL.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09117362
    • David S. Miller's avatar
      Merge branch 'reciprocal' · 374d1125
      David S. Miller authored
      Hannes Frederic Sowa says:
      
      ====================
      reciprocal_divide update
      
      This patch is on top of aee636c4 ("bpf: do not use reciprocal
      divide") from Eric that sits in net tree. It will not create a merge
      conflict, but it depends on this one, so we suggest, if possible, to
      merge net into net-next.
      
      We are proposing this change with only small modifications from the
      v2 version, namely updating the name of trim to reciprocal_scale
      (as commented on by Ben Hutchings and Eric Dumazet, thanks!).
      
      We thought about introducing the reciprocal_divide algorithm in
      parallel to the one already used by the kernel but faced organizational
      issues, leading us to the conclusion that it is best to just replace
      the old one: We could not come up with names for the different
      implementations and also with a way to describe the differences to
      guide developers which one to choose in which situation. This is
      because we cannot specify the correct semantics for the version
      which is currently used by the kernel. Although it does not seem to be
      causing problems in the kernel, we cannot say so with certainty in the
      case of flex_array for the future. Current usage seems ok, but
      future users could run into problems.
      
      Changelog:
      
      v1->v2:
       - changed name to prandom_u32_max in p1
       - changed name to trim in p2
       - reworked code in p3
      v2->v3:
       - p1 and p3 stays unchanged, only small update in commit
         message in p3
       - changed name to reciprocal_scale in p2
       - fixed kernel doc format
      v3->v4:
       - pseduo -> pseudo (thanks to Tilman Schmidt)
      v4->v5:
       - fix pseduo -> pseudo for real now, sorry for the noise
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      374d1125
    • Hannes Frederic Sowa's avatar
      reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa authored
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2], which he could also show with BPF code
      after divisions are transformed into reciprocal_value() for runtime
      invariance which can be passed to reciprocal_divide() later on;
      reverse in BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using Granlund and Montgomery method, so that also
      future use is safe and without any non-obvious side-effects.
      Known problems with the old implementation were that division by 1
      always returned 0 and some off-by-ones when the dividend and divisor
      were very large. This seemed to not be problematic with its
      current users, as far as we can tell. Eric Dumazet checked
      the slab usage; we cannot say so with certainty in the case of flex_array.
      Still, in order to fix that, we propose an extension from the
      original implementation from commit 6a2d7a95 resp. [3][4],
      by using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5], Torbjörn Granlund and Peter L.
      Montgomery, that is, pseudocode for q = n/d where q, n, d is in
      u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
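
      For readers who want to try the pseudocode above, here is a minimal
      user-space transcription in C. It is a sketch, not the kernel code:
      the names recip_sketch, recip_value_sketch() and recip_divide_sketch()
      are made up for this example, and ceil(log_2 d) is computed with a
      simple loop.

```c
#include <stdint.h>

/* Precomputed constants for one divisor (hypothetical layout). */
struct recip_sketch {
	uint32_t m;  /* m' from the pseudocode */
	uint8_t sh1;
	uint8_t sh2;
};

/* Step 1: initialization for a fixed divisor d (d must be non-zero). */
struct recip_sketch recip_value_sketch(uint32_t d)
{
	struct recip_sketch r;
	uint64_t m;
	int l = 0;

	/* l = ceil(log_2 d), i.e. the smallest l with 2^l >= d */
	while (l < 32 && (1ULL << l) < d)
		l++;

	/* (2^l - d) < d, so the 64-bit product below cannot overflow */
	m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;
	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l < 1 ? l : 1);         /* min(l, 1) */
	r.sh2 = (uint8_t)(l - 1 > 0 ? l - 1 : 0); /* max(l - 1, 0) */
	return r;
}

/* Step 2: q = n / d using a multiply and two shifts. */
uint32_t recip_divide_sketch(uint32_t n, struct recip_sketch r)
{
	uint32_t t = (uint32_t)(((uint64_t)n * r.m) >> 32);

	return (t + ((n - t) >> r.sh1)) >> r.sh2;
}
```

      For d = 1 the constants become m' = 1, sh_1 = 0, sh_2 = 0, so the
      result is n itself, which is the case the old implementation got wrong.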
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zip

      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      809fa972
    • Daniel Borkmann's avatar
      net: introduce reciprocal_scale helper and convert users · 89770b0a
      Daniel Borkmann authored
      As David Laight suggests, we shouldn't necessarily call this
      reciprocal_divide() when users didn't request a reciprocal_value();
      let's keep the basic idea and call it reciprocal_scale(). More
      background information on this topic can be found in [1].
      
      Joint work with Hannes Frederic Sowa.
      
        [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

      Suggested-by: David Laight <david.laight@aculab.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89770b0a
    • Daniel Borkmann's avatar
      random32: add prandom_u32_max and convert open coded users · f337db64
      Daniel Borkmann authored
      Many functions have open coded a function that returns a random
      number in range [0,N-1]. Under the assumption that we have a PRNG
      such as taus113 whose output is well distributed over the [0, ~0U] space,
      we can implement such a function as uword t = (n*m')>>32, where
      m' is a random number obtained from PRNG, n the right open interval
      border and t our resulting random number, with n,m',t in u32 universe.
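
      The formula t = (n*m')>>32 can be tried out with the small C sketch
      below. The function name scale_into_range_sketch() is made up for this
      example, and the random input r is passed in by the caller rather than
      drawn from a real PRNG.

```c
#include <stdint.h>

/*
 * Maps r, assumed to be uniform over [0, 2^32), into [0, n) by taking
 * the upper 32 bits of the 64-bit product r * n. No division is needed.
 */
uint32_t scale_into_range_sketch(uint32_t r, uint32_t n)
{
	return (uint32_t)(((uint64_t)r * n) >> 32);
}
```

      Since r is always below 2^32, the product is below n * 2^32, so the
      result never reaches n; that is why n is a right-open bound.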
      
      Let's go with Joe's suggestion and simply call it prandom_u32_max(),
      although technically the argument is a right-open interval endpoint; we
      have documented that. Other users can be migrated to the new
      prandom_u32_max() function later on; for now, we need to make sure
      to migrate reciprocal_divide() users for the reciprocal_divide()
      follow-up fixup since their function signatures are going to change.
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f337db64
    • Moni Shoua's avatar
      net/mlx4_core: Remove unnecessary validation for port number · 6cd28f04
      Moni Shoua authored
      This is a fix to a regression introduced by commit:
      "982290a7 net/mlx4_core: Check port number for validity
      before accessing data"
      
      IPoIB could not attach to multicast group and we get this in dmesg:
      [144214.145008] ib0: failed to attach to multicast group, ret = -22
      [144214.145016] ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
      [144214.145019] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
      
      The cause of the problem is that the port is extracted from gid[5],
      which is only valid for Ethernet.
      This patch removes the validation in mlx4_qp_attach_common(), which is
      reached from both Ethernet and IB flows.
      An error flow for a bad port value in Ethernet already exists in that
      function.
      Signed-off-by: Moni Shoua <monis@mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6cd28f04
    • Somnath Kotur's avatar
      be2net: Fix be_vlan_add/rem_vid() routines · a6b74e01
      Somnath Kotur authored
      The current logic to put the interface into VLAN Promiscuous mode is not correct.
      We should increment "adapter->vlans_added" before calling be_vid_config().
      Also remove some unwanted log messages.
      Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
      Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6b74e01
    • Ariel Elior's avatar
      bnx2x: Fix VF flr flow · 076d1329
      Ariel Elior authored
      When a VF originating from a given PF is flr-ed, that PF gets an interrupt
      from the chip management and takes part in the flr process.
      
      This patch fixes several corner cases in which the driver performs its part
      of the flr flow out-of-order, causing the FW to assert due to badly timed
      messages received from the driver.
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      076d1329
    • David S. Miller's avatar
    • Daniel Borkmann's avatar
      net: filter: let bpf_tell_extensions return SKF_AD_MAX · 37692299
      Daniel Borkmann authored
      Michal Sekletar added in commit ea02f941 ("net: introduce
      SO_BPF_EXTENSIONS") a facility where user space can enquire
      the BPF ancillary instruction set, which is imho a step in
      the right direction for letting high-level-to-BPF optimizers
      in user space make an informed decision about possibly using
      these extensions.
      
      The original rationale was to return through a getsockopt(2)
      a bitfield of which instructions are supported and which
      are not. As of right now, we just return 0 to indicate
      base support for SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET.
      Limitations of this approach are that this API, which we need
      to maintain for a long time, can only support a maximum of 32
      extensions, and needs to be additionally maintained/updated
      whenever a new extension comes in.
      
      I thought about this a bit more and what we can do here to
      overcome this is to just return SKF_AD_MAX. Since we never
      remove any extension since we cannot break user space and
      always linearly increase SKF_AD_MAX on each newly added
      extension, user space can make a decision on what extensions
      are supported in the whole set of extensions and which aren't,
      by just checking which of them from the whole set have an
      offset < SKF_AD_MAX of the underlying kernel.
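
      From the user space side, the check described above could look
      roughly like the C sketch below. Both function names are hypothetical.
      The sketch assumes the value is read with getsockopt(2) at SOL_SOCKET
      level using SO_BPF_EXTENSIONS and falls back to -1 when the headers do
      not define that option.

```c
#include <sys/socket.h>

/*
 * An extension at offset `off` is usable if it lies below the maximum
 * reported by the running kernel.
 */
int bpf_ext_supported_sketch(int reported_max, int off)
{
	return off >= 0 && off < reported_max;
}

/* Asks the kernel for its maximum; returns -1 if that is not possible. */
int bpf_ext_query_sketch(int fd)
{
#ifdef SO_BPF_EXTENSIONS
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_BPF_EXTENSIONS, &val, &len) < 0)
		return -1;
	return val;
#else
	(void)fd;
	return -1; /* headers predate SO_BPF_EXTENSIONS */
#endif
}
```

      An application would call bpf_ext_query_sketch() once on a socket and
      then test each SKF_AD_* offset it cares about with
      bpf_ext_supported_sketch(); a result of -1 makes every test fail.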
      
      Since SKF_AD_MAX must be updated each time we add new ones,
      we don't need to introduce an additional enum and get
      maintenance for free. At some point in time when
      SO_BPF_EXTENSIONS becomes ubiquitous for most kernels,
      an application can simply make use of this and easily be run
      on newer or older underlying kernels without needing to be
      recompiled, of course. Since that is for 3.14, it's not too
      late to do this change.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Michal Sekletar <msekleta@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37692299
    • David S. Miller's avatar
      net: Fix some fallout from the ether_addr_copy() changes. · 9be68c1a
      David S. Miller authored
      net/appletalk/aarp.c: In function ‘__aarp_send_query’:
      net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
       ...
      net/atm/lec.c: In function ‘send_to_lecd’:
      net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default]
      In file included from net/atm/lec.c:17:0:
      include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’
       ...
      net/caif/caif_usb.c: In function ‘cfusbl_create’:
      net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9be68c1a
    • David S. Miller's avatar
      Merge branch 'sctp' · 656edac6
      David S. Miller authored
      Wang Weidong says:
      
      ====================
      sctp: remove some macro locking wrappers
      
      In sctp.h we can find some macro locking wrappers. As Neil pointed out:
      
      "Its because in the origional implementation of the sctp protocol, there was a
      user space test harness which built the kernel module for userspace execution to
      cary our some unit testing on the code.  It did so by redefining some of those
      locking macros to user space friendly code.  IIRC we haven't use those unit
      tests in years, and so should be removing them, not adding them to other
      locations."
      
      So I remove them.
      ====================
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      656edac6
    • wangweidong's avatar
      sctp: remove macros sctp_bh_[un]lock_sock · 5bc1d1b4
      wangweidong authored
      Redefined bh_[un]lock_sock to sctp_bh_[un]lock_sock for user
      space friendly code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bc1d1b4
    • wangweidong's avatar
      sctp: remove macros sctp_{lock|release}_sock · 048ed4b6
      wangweidong authored
      Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
      code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      048ed4b6
    • wangweidong's avatar
      sctp: remove macros sctp_read_[un]lock · 1b0de194
      wangweidong authored
      Redefined read_[un]lock to sctp_read_[un]lock for user space
      friendly code which we haven't used in years, and the macros
      were never used, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b0de194
    • wangweidong's avatar
      sctp: remove macros sctp_write_[un]lock · 387602df
      wangweidong authored
      Redefined write_[un]lock to sctp_write_[un]lock for user space
      friendly code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      387602df
    • wangweidong's avatar
      sctp: remove macros sctp_spin_[un]lock · 3c8e43ba
      wangweidong authored
      Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
      code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c8e43ba
    • wangweidong's avatar
      sctp: remove macros sctp_local_bh_{disable|enable} · 79b91130
      wangweidong authored
      Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
      for user space friendly code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79b91130
    • wangweidong's avatar
      sctp: remove macros sctp_spin_[un]lock_irqrestore · 940287ee
      wangweidong authored
      Redefined spin_[un]lock_irqrestore to sctp_spin_[un]lock_irqrestore for user
      space friendly code which we haven't used in years, so remove them.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      940287ee
    • Joe Perches's avatar
      dsa: Use ether_addr_copy · d08f161a
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d08f161a
    • Joe Perches's avatar
      pktgen: Use ether_addr_copy · 9ea08b12
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ea08b12
    • Joe Perches's avatar
      netpoll: Use ether_addr_copy · c62326ab
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c62326ab
    • Joe Perches's avatar
      caif_usb: Use ether_addr_copy · 34b2cff4
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34b2cff4
    • Joe Perches's avatar
      atm: Use ether_addr_copy · 116e853f
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      116e853f
    • Joe Perches's avatar
      appletalk: Use ether_addr_copy · 90ccb6aa
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      
      Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN].
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90ccb6aa
    • Joe Perches's avatar
      8021q: Use ether_addr_copy · 07fc67be
      Joe Perches authored
      Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
      save some cycles on arm and powerpc.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07fc67be
    • David S. Miller's avatar
      Merge branch 'gro_udp_encap' · 70fccb7e
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      net: Add GRO support for UDP encapsulating protocols
      
      This series adds GRO handlers for protocols that do UDP encapsulation, with the
      intent of being able to coalesce packets which encapsulate packets belonging to
      the same TCP session.
      
      For GRO purposes, the destination UDP port takes the role of the ether type
      field in the ethernet header or the next protocol in the IP header.
      
      The UDP GRO handler will only attempt to coalesce packets whose destination
      port is registered to have a GRO handler.
      
      The patches are done against net-next 75e4364f "net: stmmac: fix NULL pointer
      dereference in stmmac_get_tx_hwtstamp"
      
      Or.
      
      v4 --> v5 changes:
        - followed Eric's directives to avoid using atomic get/put ops on the
          udp gro receive and complete callbacks and instead keep the rcu_read_lock
          when calling the next handler on the chain.
      
      v3 --> v4 changes:
      
        - applied feedback from Tom on some micro-optimizations that save
          branches and goto directives in the udp gro logic
      
       - applied feedback from Eric on correct RCU programming for the
         add/remove flow of the upper protocols udp gro handlers
      
      v2 --> v3 changes:
      
       - moved to use linked list to store the udp gro handlers, this solves the
         problem of consuming 512KB of memory for the handlers.
      
       - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
         on a packet, this solves the problem of udp encapsulated packets whose inner VM
         packet is udp and happen to carry a port which has registered offloads - and flush it.
      
       - invoke the udp offload protocol registration and de-registration from the vxlan driver
         in a sleepable context
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70fccb7e
    • Or Gerlitz's avatar
      net: Add GRO support for vxlan traffic · dc01e7d3
      Or Gerlitz authored
      Add GRO handlers for vxlan, by using the UDP GRO infrastructure.
      
      For a single TCP session that goes through vxlan tunneling I got a nice
      improvement from 6.8 Gb/s to 11.5 Gb/s.
      
      --> UDP/VXLAN GRO disabled
      $ netperf  -H 192.168.52.147 -c -C
      
      $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195
      
      --> UDP/VXLAN GRO enabled
      
      $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc01e7d3
    • Or Gerlitz's avatar
      net: Export gro_find_by_type helpers · e27a2f83
      Or Gerlitz authored
      Export the gro_find_receive/complete_by_type helpers so they can be invoked
      by the gro callbacks of encapsulation protocols such as vxlan.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e27a2f83
    • Or Gerlitz's avatar
      net: Add GRO support for UDP encapsulating protocols · b582ef09
      Or Gerlitz authored
      Add GRO handlers for protocols that do UDP encapsulation, with the intent of
      being able to coalesce packets which encapsulate packets belonging to
      the same TCP session.
      
      For GRO purposes, the destination UDP port takes the role of the ether type
      field in the ethernet header or the next protocol in the IP header.
      
      The UDP GRO handler will only attempt to coalesce packets whose destination
      port is registered to have a GRO handler.
      
      Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
      code twice on a packet. This solves the problem of udp encapsulated packets whose
      inner VM packet is udp and happen to carry a port which has registered offloads.
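
      The dispatch idea described above (look up the destination port among
      registered handlers, and refuse a second pass over the same packet)
      can be illustrated with the user-space C sketch below. All names in it
      are hypothetical, the handler list is a plain singly linked list, and
      the RCU protection a kernel implementation needs is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port handler entry, kept in a singly linked list. */
struct udp_gro_handler_sketch {
	uint16_t port;
	int (*receive)(void *pkt);
	struct udp_gro_handler_sketch *next;
};

/* Hypothetical per-packet GRO state carrying the "already seen" mark. */
struct gro_state_sketch {
	int udp_mark;
};

static struct udp_gro_handler_sketch *handler_list;

/* Adds a handler at the head of the list. */
void udp_gro_register_sketch(struct udp_gro_handler_sketch *h)
{
	h->next = handler_list;
	handler_list = h;
}

/* Example handler that pretends every packet was coalesced. */
int demo_receive_sketch(void *pkt)
{
	(void)pkt;
	return 0;
}

/* Returns the handler's result, or -1 meaning "flush, do not coalesce". */
int udp_gro_receive_sketch(struct gro_state_sketch *st, uint16_t dport,
			   void *pkt)
{
	struct udp_gro_handler_sketch *h;

	if (st->udp_mark)
		return -1; /* second pass over the same packet */
	st->udp_mark = 1;

	for (h = handler_list; h; h = h->next)
		if (h->port == dport)
			return h->receive(pkt);
	return -1; /* no handler registered for this port */
}
```

      The mark is what keeps an inner UDP packet, whose port happens to
      match a registered handler, from being run through the UDP GRO code a
      second time.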
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b582ef09
    • Vince Bridgers's avatar
      stmmac: Fix kernel crashes for jumbo frames · 2618abb7
      Vince Bridgers authored
      These changes correct the following issues with jumbo frames on the
      stmmac driver:
      
      1) The Synopsys EMAC can be configured to support different FIFO
      sizes at core configuration time. There's no way to query the
      controller and know the FIFO size, so the driver needs to get this
      information from the device tree in order to know how to correctly
      handle MTU changes and setting up dma buffers. The default
      max-frame-size is as currently used, which is the size of a jumbo
      frame.
      
      2) The driver was enabling Jumbo frames by default, but was not allocating
      dma buffers of sufficient size to handle the maximum possible packet
      size that could be received. This led to memory corruption since DMAs were
      occurring beyond the extent of the allocated receive buffers for certain types
      of network traffic.
      
      kernel BUG at net/core/skbuff.c:126!
      Internal error: Oops - BUG: 0 [#1] SMP ARM
      Modules linked in:
      CPU: 0 PID: 563 Comm: sockperf Not tainted 3.13.0-rc6-01523-gf7111b9 #31
      task: ef35e580 ti: ef252000 task.ti: ef252000
      PC is at skb_panic+0x60/0x64
      LR is at skb_panic+0x60/0x64
      pc : [<c03c7c3c>]    lr : [<c03c7c3c>]    psr: 60000113
      sp : ef253c18  ip : 60000113  fp : 00000000
      r10: ef3a5400  r9 : 00000ebc  r8 : ef3a546c
      r7 : ee59f000  r6 : ee59f084  r5 : ee59ff40  r4 : ee59f140
      r3 : 000003e2  r2 : 00000007  r1 : c0b9c420  r0 : 0000007d
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 10c5387d  Table: 2e8ac04a  DAC: 00000015
      Process sockperf (pid: 563, stack limit = 0xef252248)
      Stack: (0xef253c18 to 0xef254000)
      3c00:                                                       00000ebc ee59f000
      3c20: ee59f084 ee59ff40 ee59f140 c04a9cd8 ee8c50c0 00000ebc ee59ff40 00000000
      3c40: ee59f140 c02d0ef0 00000056 ef1eda80 ee8c50c0 00000ebc 22bbef29 c0318f8c
      3c60: 00000056 ef3a547c ffe2c716 c02c9c90 c0ba1298 ef3a5838 ef3a5838 ef3a5400
      3c80: 000020c0 ee573840 000055cb ef3f2050 c053f0e0 c0319214 22b9b085 22d92813
      3ca0: 00001c80 004b8e00 ef3a5400 ee573840 ef3f2064 22d92813 ef3f2064 000055cb
      3cc0: ef3f2050 c031a19c ef252000 00000000 00000000 c0561bc0 00000000 ff00ffff
      3ce0: c05621c0 ef3a5400 ef3f2064 ee573840 00000020 ef3f2064 000055cb ef3f2050
      3d00: c053f0e0 c031cad0 c053e740 00000e60 00000000 00000000 ee573840 ef3a5400
      3d20: ef0a6e00 00000000 ef3f2064 c032507c 00010000 00000020 c0561bc0 c0561bc0
      3d40: ee599850 c032799c 00000000 ee573840 c055a380 ef3a5400 00000000 ef3f2064
      3d60: ef3f2050 c032799c 0101c7c0 2b6755cb c059a280 c030e4d8 000055cb ffffffff
      3d80: ee574fc0 c055a380 ee574000 ee573840 00002b67 ee573840 c03fe9c4 c053fa68
      3da0: c055a380 00001f6f 00000000 ee573840 c053f0e0 c0304fdc ef0a6e01 ef3f2050
      3dc0: ee573858 ef031000 ee573840 c03055d8 c0ba0c40 ef000f40 00100100 c053f0dc
      3de0: c053ffdc c053f0f0 00000008 00000000 ef031000 c02da948 00001140 00000000
      3e00: c0563c78 ef253e5f 00000020 ee573840 00000020 c053f0f0 ef313400 ee573840
      3e20: c053f0e0 00000000 00000000 c05380c0 ef313400 00001000 00000015 c02df280
      3e40: ee574000 ef001e00 00000000 00001080 00000042 005cd980 ef031500 ef031500
      3e60: 00000000 c02df824 ef031500 c053e390 c0541084 f00b1e00 c05925e8 c02df864
      3e80: 00001f5c ef031440 c053e390 c0278524 00000002 00000000 c0b9eb48 c02df280
      3ea0: ee8c7180 00000100 c0542ca8 00000015 00000040 ef031500 ef031500 ef031500
      3ec0: c027803c ef252000 00000040 000000ec c05380c0 c0b9eb40 c0b9eb48 c02df940
      3ee0: ef060780 ffffa4dd c0564a9c c056343c 002e80a8 00000080 ef031500 00000001
      3f00: c053808c ef252000 fffec100 00000003 00000004 002e80a8 0000000c c00258f0
      3f20: 002e80a8 c005e704 00000005 00000100 c05634d0 c0538080 c05333e0 00000000
      3f40: 0000000a c0565580 c05380c0 ffffa4dc c05434f4 00400100 00000004 c0534cd4
      3f60: 00000098 00000000 fffec100 002e80a8 00000004 002e80a8 002a20e0 c0025da8
      3f80: c0534cd4 c000f020 fffec10c c053ea60 ef253fb0 c0008530 0000ffe2 b6ef67f4
      3fa0: 40000010 ffffffff 00000124 c0012f3c 0000ffe2 002e80f0 0000ffe2 00004000
      3fc0: becb6338 becb6334 00000004 00000124 002e80a8 00000004 002e80a8 002a20e0
      3fe0: becb6300 becb62f4 002773bb b6ef67f4 40000010 ffffffff 00000000 00000000
      [<c03c7c3c>] (skb_panic+0x60/0x64) from [<c02d0ef0>] (skb_put+0x4c/0x50)
      [<c02d0ef0>] (skb_put+0x4c/0x50) from [<c0318f8c>] (tcp_collapse+0x314/0x3ec)
      [<c0318f8c>] (tcp_collapse+0x314/0x3ec) from [<c0319214>]
      (tcp_try_rmem_schedule+0x1b0/0x3c4)
      [<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) from [<c031a19c>]
      (tcp_data_queue+0x480/0xe6c)
      [<c031a19c>] (tcp_data_queue+0x480/0xe6c) from [<c031cad0>]
      (tcp_rcv_established+0x180/0x62c)
      [<c031cad0>] (tcp_rcv_established+0x180/0x62c) from [<c032507c>]
      (tcp_v4_do_rcv+0x13c/0x31c)
      [<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) from [<c032799c>]
      (tcp_v4_rcv+0x718/0x73c)
      [<c032799c>] (tcp_v4_rcv+0x718/0x73c) from [<c0304fdc>]
      (ip_local_deliver+0x98/0x274)
      [<c0304fdc>] (ip_local_deliver+0x98/0x274) from [<c03055d8>]
      (ip_rcv+0x420/0x758)
      [<c03055d8>] (ip_rcv+0x420/0x758) from [<c02da948>]
      (__netif_receive_skb_core+0x44c/0x5bc)
      [<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) from [<c02df280>]
      (netif_receive_skb+0x48/0xb4)
      [<c02df280>] (netif_receive_skb+0x48/0xb4) from [<c02df824>]
      (napi_gro_flush+0x70/0x94)
      [<c02df824>] (napi_gro_flush+0x70/0x94) from [<c02df864>]
      (napi_complete+0x1c/0x34)
      [<c02df864>] (napi_complete+0x1c/0x34) from [<c0278524>]
      (stmmac_poll+0x4e8/0x5c8)
      [<c0278524>] (stmmac_poll+0x4e8/0x5c8) from [<c02df940>]
      (net_rx_action+0xc4/0x1e4)
      [<c02df940>] (net_rx_action+0xc4/0x1e4) from [<c00258f0>]
      (__do_softirq+0x12c/0x2e8)
      [<c00258f0>] (__do_softirq+0x12c/0x2e8) from [<c0025da8>] (irq_exit+0x78/0xac)
      [<c0025da8>] (irq_exit+0x78/0xac) from [<c000f020>] (handle_IRQ+0x44/0x90)
      [<c000f020>] (handle_IRQ+0x44/0x90) from [<c0008530>]
      (gic_handle_irq+0x2c/0x5c)
      [<c0008530>] (gic_handle_irq+0x2c/0x5c) from [<c0012f3c>]
      (__irq_usr+0x3c/0x60)
      
      3) The driver was setting the dma buffer size after allocating dma buffers,
      which caused a system panic when changing the MTU.
      
      BUG: Bad page state in process ifconfig  pfn:2e850
      page:c0b72a00 count:0 mapcount:0 mapping:  (null) index:0x0
      page flags: 0x200(arch_1)
      Modules linked in:
      CPU: 0 PID: 566 Comm: ifconfig Not tainted 3.13.0-rc6-01523-gf7111b9 #29
      [<c001547c>] (unwind_backtrace+0x0/0xf8) from [<c00122dc>]
      (show_stack+0x10/0x14)
      [<c00122dc>] (show_stack+0x10/0x14) from [<c03c793c>] (dump_stack+0x70/0x88)
      [<c03c793c>] (dump_stack+0x70/0x88) from [<c00b2620>] (bad_page+0xc8/0x118)
      [<c00b2620>] (bad_page+0xc8/0x118) from [<c00b302c>]
      (get_page_from_freelist+0x744/0x870)
      [<c00b302c>] (get_page_from_freelist+0x744/0x870) from [<c00b40f4>]
      (__alloc_pages_nodemask+0x118/0x86c)
      [<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) from [<c00b4858>]
      (__get_free_pages+0x10/0x54)
      [<c00b4858>] (__get_free_pages+0x10/0x54) from [<c00cba1c>]
      (kmalloc_order_trace+0x24/0xa0)
      [<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) from [<c02d199c>]
      (__kmalloc_reserve.isra.21+0x24/0x70)
      [<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) from [<c02d240c>]
      (__alloc_skb+0x68/0x13c)
      [<c02d240c>] (__alloc_skb+0x68/0x13c) from [<c02d3930>]
      (__netdev_alloc_skb+0x3c/0xe8)
      [<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) from [<c0279378>]
      (stmmac_open+0x63c/0x1024)
      [<c0279378>] (stmmac_open+0x63c/0x1024) from [<c02e18cc>]
      (__dev_open+0xa0/0xfc)
      [<c02e18cc>] (__dev_open+0xa0/0xfc) from [<c02e1b40>]
      (__dev_change_flags+0x94/0x158)
      [<c02e1b40>] (__dev_change_flags+0x94/0x158) from [<c02e1c24>]
      (dev_change_flags+0x18/0x48)
      [<c02e1c24>] (dev_change_flags+0x18/0x48) from [<c0337bc0>]
      (devinet_ioctl+0x638/0x700)
      [<c0337bc0>] (devinet_ioctl+0x638/0x700) from [<c02c7aec>]
      (sock_ioctl+0x64/0x290)
      [<c02c7aec>] (sock_ioctl+0x64/0x290) from [<c0100890>]
      (do_vfs_ioctl+0x78/0x5b8)
      [<c0100890>] (do_vfs_ioctl+0x78/0x5b8) from [<c0100e0c>] (SyS_ioctl+0x3c/0x5c)
      [<c0100e0c>] (SyS_ioctl+0x3c/0x5c) from [<c000e760>]
      
      The fixes have been verified using reproducible, automated testing.
      Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2618abb7
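      Points 2 and 3 of the commit message above come down to two rules: size each
      receive buffer for the largest frame the current MTU allows, and fix that size
      before any buffer is allocated. The following is a minimal userspace sketch of
      those rules, not the stmmac code. The struct, the function names, the overhead
      constants and the power-of-two rounding are all assumptions made for
      illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-frame overhead; these are not the driver's constants. */
#define SKETCH_ETH_HDR_LEN  14  /* destination MAC + source MAC + EtherType */
#define SKETCH_VLAN_TAG_LEN  4
#define SKETCH_FCS_LEN       4

/* Hypothetical receive ring; not a stmmac structure. */
struct rx_ring {
    size_t buf_size;        /* size of every buffer in the ring */
    size_t count;           /* number of buffers */
    unsigned char **bufs;
};

/* Smallest power-of-two size (at least 2048) that holds a full frame. */
static size_t rx_buf_size_for_mtu(size_t mtu)
{
    size_t need = mtu + SKETCH_ETH_HDR_LEN + SKETCH_VLAN_TAG_LEN +
                  SKETCH_FCS_LEN;
    size_t size = 2048;

    while (size < need)
        size *= 2;
    return size;
}

static void rx_ring_free(struct rx_ring *ring)
{
    size_t i;

    if (!ring->bufs)
        return;
    for (i = 0; i < ring->count; i++)
        free(ring->bufs[i]);
    free(ring->bufs);
    ring->bufs = NULL;
    ring->count = 0;
}

/* Returns 0 on success, -1 on allocation failure. */
static int rx_ring_alloc(struct rx_ring *ring, size_t count, size_t mtu)
{
    size_t i;

    /* Decide the buffer size first, so every allocation below uses it. */
    ring->buf_size = rx_buf_size_for_mtu(mtu);
    ring->count = count;
    ring->bufs = calloc(count, sizeof(*ring->bufs));
    if (!ring->bufs)
        return -1;

    for (i = 0; i < count; i++) {
        ring->bufs[i] = malloc(ring->buf_size);
        if (!ring->bufs[i]) {
            rx_ring_free(ring);
            return -1;
        }
    }
    return 0;
}
```

      In this sketch an MTU change would free the ring and call rx_ring_alloc()
      again with the new MTU, so the recorded size and the allocated buffers cannot
      disagree.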
    • Vince Bridgers's avatar
      dts: Add a binding for Synopsys emac max-frame-size · 369ea818
      Vince Bridgers authored
      This change adds a parameter for the Synopsys 10/100/1000
      stmmac Ethernet driver to configure the maximum frame
      size supported by the EMAC driver. Synopsys allows the FIFO
      sizes to be configured when the cores are built for a particular
      device, but does not provide a way for the driver to read
      information from the device about the maximum MTU size
      supported as limited by the device's FIFO size.
      Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      369ea818
    • Dan Carpenter's avatar
      rxrpc: out of bound read in debug code · 08d4d217
      Dan Carpenter authored
      Smatch complains because we are using an untrusted index into the
      rxrpc_acks[] array.  It's just a read and it's only in the debug code,
      but it's simple enough to add a check and fix it.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08d4d217
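      The kind of check described in the rxrpc commit above can be sketched as
      follows: validate an untrusted index against the array length before using it
      to look up a name. This is an illustration only, not the actual patch. The
      table contents, the ARRAY_LEN macro and the ack_name() helper are hypothetical.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical name table standing in for a debug-only lookup array. */
static const char *const ack_names[] = {
    "NONE", "REQUESTED", "DUPLICATE", "OUT_OF_SEQ",
};

/* Return a printable name, falling back to "?" for out-of-range values. */
static const char *ack_name(unsigned int reason)
{
    if (reason >= ARRAY_LEN(ack_names))
        return "?";
    return ack_names[reason];
}
```

      Because the index is unsigned, the single upper-bound comparison is enough to
      keep the read inside the table.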
    • Yegor Yefremov's avatar
      8021q: update description · 2fa053a0
      Yegor Yefremov authored
      Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add
      some beautifications like replacing 'ethernet' with 'Ethernet' and
      removing unneeded spaces.
      Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fa053a0
    • Hannes Frederic Sowa's avatar
      ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts · 82b276cd
      Hannes Frederic Sowa authored
      Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
      connecting and binding to them. sendmsg logic does already check msg->name
      for this but must trust already connected sockets which could be set up
      for connection to ipv4 address family.
      
      The per-socket flag ipv6only is of no use here, as it is under the user's
      control via setsockopt.
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82b276cd
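      The idea in the ipv6 commit above, refusing IPv4 addresses at connect/bind
      time for a protocol that only handles IPv6, can be illustrated with a userspace
      sketch. This is not the kernel change itself. The helper name
      check_v6_only_addr() and the choice of error codes are assumptions; only the
      standard socket headers and the IN6_IS_ADDR_V4MAPPED() macro are relied on.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical validation helper: returns 0 if the address is a plain IPv6
 * address, or a negative errno-style value if it is too short, belongs to
 * another address family, or is an IPv4-mapped IPv6 address.
 */
static int check_v6_only_addr(const struct sockaddr *sa, socklen_t len)
{
    const struct sockaddr_in6 *sin6;

    if (len < sizeof(*sin6))
        return -EINVAL;
    if (sa->sa_family != AF_INET6)
        return -EAFNOSUPPORT;

    sin6 = (const struct sockaddr_in6 *)sa;
    if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
        return -EADDRNOTAVAIL;

    return 0;
}
```

      Doing the check when the socket is connected or bound means later sends on
      that socket do not need to re-validate the stored peer address.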
    • FX Le Bail's avatar
      446fab59
    • Peter Pan(潘卫平)'s avatar
      tcp: delete redundant calls of tcp_mtup_init() · 4d83e177
      Peter Pan(潘卫平) authored
      As tcp_rcv_state_process() already calls tcp_mtup_init() for non-fastopen
      sock, we can delete the redundant calls of tcp_mtup_init() in
      tcp_{v4,v6}_syn_recv_sock().
      Signed-off-by: Weiping Pan <panweiping3@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d83e177
    • Daniel Borkmann's avatar
      packet: fix a couple of cppcheck warnings · f0d4eb29
      Daniel Borkmann authored
      Doesn't bring much, but also doesn't hurt us to fix 'em:
      
      1) In tpacket_rcv() flush dcache page we can restrict the scope
         for start and end and remove one layer of indent.
      
      2) In tpacket_destruct_skb() we can restrict the scope for ph.
      
      3) In alloc_one_pg_vec_page() we can remove the NULL assignment
         and change spacing a bit.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0d4eb29
    • Duan Jiong's avatar
      ipv4: remove the useless argument from ip_tunnel_hash() · 967680e0
      Duan Jiong authored
      Since commit c5441932 ("GRE: Refactor GRE tunneling code")
      introduced function ip_tunnel_hash(), the argument itn is no
      longer in use, so remove it.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      967680e0
    • stephen hemminger's avatar