1. 09 Jan, 2017 24 commits
    • mlxsw: spectrum: Make the add_matchall_tc_entry symmetric · 65acb5d0
      Yotam Gigi authored
      Currently, the mlxsw spectrum driver only supports offloading the matchall
      classifier together with the mirred action. To allow more matchall tc
      offloads, make the code symmetric so that it can be easily extended later
      on for other actions.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: cmd: Fix API name comments for event-queues · 16985271
      Elad Raz authored
      A copy-paste error from "int_msix" probably caused the "int_" prefix to
      appear in the comments for all "eq_" APIs.
      Signed-off-by: Elad Raz <eladr@mellanox.com>
      Acked-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Fix mlxsw_i2c_write return value · 36ca68bf
      Elad Raz authored
      The "err" variable has already been checked, so always return 0.
      Signed-off-by: Elad Raz <eladr@mellanox.com>
      Acked-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Vadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: cpsw: extend limits for cpsw_get/set_ringparam · f89d21b9
      Ivan Khoronzhuk authored
      Allow the number of descriptors to be set closer to the possible limits.
      The minimum limit equals the number of channels, so that at least one
      descriptor can be set per channel. The maximum limit leaves enough
      descriptors for the tx channels.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
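The range check described in this commit message can be sketched in userspace C. This is not the driver's code: the function name `rx_descs_in_range`, its parameters, and the assumption that exactly one descriptor is reserved per tx channel are all hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical range check (names and the one-descriptor-per-tx-channel
 * reserve are assumptions, not taken from the cpsw driver).
 *
 * Lower bound: one descriptor per rx channel.
 * Upper bound: the whole pool minus what the tx channels need.
 */
static bool rx_descs_in_range(unsigned int requested, unsigned int total_descs,
                              unsigned int rx_channels, unsigned int tx_channels)
{
    unsigned int min_descs, max_descs;

    /* The pool must at least cover one descriptor per channel. */
    if (total_descs < rx_channels + tx_channels)
        return false;

    min_descs = rx_channels;
    max_descs = total_descs - tx_channels;

    return requested >= min_descs && requested <= max_descs;
}
```

With a pool of 256 descriptors and 8 channels in each direction, this sketch would accept any request from 8 to 248.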
    • cls_u32: don't bother explicitly initializing ->divisor to zero · 58fa118f
      Alexandru Moise authored
      This struct member is already initialized to zero upon root_ht's
      allocation via kzalloc().
      Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
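The reasoning behind this cleanup can be shown with a userspace analogue, where `calloc()` stands in for `kzalloc()`. The struct `ht_sketch` and the helper `ht_alloc` are hypothetical; only the member name `divisor` comes from the commit message.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the hash-table header; only the member
 * name "divisor" is taken from the commit message. */
struct ht_sketch {
    unsigned int divisor;
    unsigned int refcnt;
};

/* calloc() plays the role of kzalloc(): the memory comes back zero-filled,
 * so integer members start at 0 without an explicit assignment. */
static struct ht_sketch *ht_alloc(void)
{
    struct ht_sketch *ht = calloc(1, sizeof(*ht));

    if (!ht)
        return NULL;

    ht->refcnt = 1;   /* only non-zero members need explicit setup */
    /* no "ht->divisor = 0;" needed here */
    return ht;
}
```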
    • Merge branch 'siphash' · 17650f22
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      Introduce The SipHash PRF
      
      This patch series introduces SipHash into the kernel. SipHash is a
      cryptographically secure PRF, which serves a variety of functions, and is
      introduced in patch #1. The following patch #2 introduces HalfSipHash,
      an optimization suitable for hash tables only. Finally, the last two patches
      in this series show two usages of the introduced siphash function family.
      It is expected that after this initial introduction, other usages will follow.
      
      Please read the extensive descriptions in patch #1 and patch #2 of what these
      functions do and the various levels of assurances. They're products of intense
      cryptographic research, and I believe they're suitable for the uses outlined
      herein.
      
      The use of SipHash is not limited to the networking subsystem -- indeed I
      would like to use it in other places too in the kernel. But after discussing
      with a few on this list and at Linus' suggestion, the initial import of these
      functions is coming through the networking tree. After these are merged, it
      will then be easier to expand use elsewhere.
      
      Changes v2->v3:
        - hsiphash keys now simply use an unsigned long, in order to avoid
          a cluttered ifdef and make it a bit more clear what's happening.
        - A typo in the documentation has been fixed.
        - The documentation has been augmented with an example relating to struct
          packing and passing.
        - The net_secret variable is now __read_mostly.
      
      Hopefully this is the last of the required revisions, and v3 can be merged
      into net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • syncookies: use SipHash in place of SHA1 · fe62d05b
      Jason A. Donenfeld authored
      SHA1 is slower and less secure than SipHash, and so replacing syncookie
      generation with SipHash makes natural sense. Some BSDs have been doing
      this for several years in fact.
      
      The speedup should be similar to, or even more impressive than, the
      speedup from the sequence number fix in this series.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • secure_seq: use SipHash in place of MD5 · 7cd23e53
      Jason A. Donenfeld authored
      This gives a clear speed and security improvement. SipHash is both
      faster and more solid crypto than the aging MD5.
      
      Rather than manually filling MD5 buffers, for IPv6, we simply create
      a layout by a simple anonymous struct, for which gcc generates
      rather efficient code. For IPv4, we pass the values directly to the
      short input convenience functions.
      
      64-bit x86_64:
      [    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
      [    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
      [    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
      [    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526
      
      32-bit x86:
      [    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
      [    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
      [    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
      [    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
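The "layout by a simple anonymous struct" idea for IPv6 can be sketched in userspace C: gather the inputs into one struct and hash the struct as a single buffer. Everything named here is hypothetical. In particular, `toy_keyed_hash` is an FNV-1a stand-in used only to keep the sketch short; it is not SipHash and is not secure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in keyed hash (FNV-1a seeded with the key).
 * NOT SipHash and NOT secure; it only makes the sketch self-contained. */
static uint64_t toy_keyed_hash(const void *buf, size_t len, uint64_t key)
{
    const uint8_t *p = buf;
    uint64_t h = 0xcbf29ce484222325ULL ^ key;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical helper: pack the connection tuple into an anonymous
 * struct and hash it in one call instead of filling a buffer by hand. */
static uint64_t seq_hash_v6(const uint8_t saddr[16], const uint8_t daddr[16],
                            uint16_t sport, uint16_t dport, uint64_t key)
{
    struct {
        uint8_t saddr[16];
        uint8_t daddr[16];
        uint16_t sport;
        uint16_t dport;
    } combined;

    /* Zero first so any padding bytes never leak stack garbage into the hash. */
    memset(&combined, 0, sizeof(combined));
    memcpy(combined.saddr, saddr, 16);
    memcpy(combined.daddr, daddr, 16);
    combined.sport = sport;
    combined.dport = dport;

    return toy_keyed_hash(&combined, sizeof(combined), key);
}
```

The design point is that the compiler handles the layout, so the caller only has to make sure the struct is fully initialized before it is hashed.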
    • siphash: implement HalfSipHash1-3 for hash tables · 1ae2324f
      Jason A. Donenfeld authored
      HalfSipHash, or hsiphash, is a shortened version of SipHash, which
      generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
      security margins, and shouldn't be used for anything too sensitive, but
      it could be used as a hashtable key function replacement, if the output
      is never exposed, and if the security requirement is not too high.
      
      The goal is to make this something that performance-critical jhash users
      would be willing to use.
      
      On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so
      hsiphash() is aliased to SipHash1-3 on those systems.
      
      64-bit x86_64:
      [    0.509409] test_siphash:     SipHash2-4 cycles: 4049181
      [    0.510650] test_siphash:     SipHash1-3 cycles: 2512884
      [    0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
      [    0.512904] test_siphash:    JenkinsHash cycles:  978267
      So, we map hsiphash() -> SipHash1-3
      
      32-bit x86:
      [    0.509868] test_siphash:     SipHash2-4 cycles: 14812892
      [    0.513601] test_siphash:     SipHash1-3 cycles:  9510710
      [    0.515263] test_siphash: HalfSipHash1-3 cycles:  3856157
      [    0.515952] test_siphash:    JenkinsHash cycles:  1148567
      So, we map hsiphash() -> HalfSipHash1-3
      
      hsiphash() is roughly 3 times slower than jhash(), but comes with a
      considerable security improvement.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
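The "roughly 3 times slower" figure can be checked against the benchmark output quoted in this commit message. The small C sketch below only does that arithmetic; the function names are hypothetical. It uses the SipHash1-3 count on 64-bit and the HalfSipHash1-3 count on 32-bit, because that is what hsiphash() maps to on each platform.

```c
/* Cycle counts copied from the benchmark output in the commit message. */
#define SIPHASH13_CYCLES_64      2512884.0
#define JHASH_CYCLES_64           978267.0
#define HALFSIPHASH13_CYCLES_32  3856157.0
#define JHASH_CYCLES_32          1148567.0

/* How many times slower hsiphash() is than jhash() on 64-bit x86_64,
 * where hsiphash() maps to SipHash1-3. */
static double hsiphash_slowdown_64(void)
{
    return SIPHASH13_CYCLES_64 / JHASH_CYCLES_64;
}

/* The same ratio on 32-bit x86, where hsiphash() maps to HalfSipHash1-3. */
static double hsiphash_slowdown_32(void)
{
    return HALFSIPHASH13_CYCLES_32 / JHASH_CYCLES_32;
}
```

Both ratios land between 2.5 and 3.4, which is consistent with the commit's rough estimate.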
    • siphash: add cryptographically secure PRF · 2c956a60
      Jason A. Donenfeld authored
      SipHash is a 64-bit keyed hash function that is actually a
      cryptographically secure PRF, like HMAC. Except SipHash is super fast,
      and is meant to be used as a hashtable keyed lookup function, or as a
      general PRF for short input use cases, such as sequence numbers or RNG
      chaining.
      
      For the first usage:
      
      There are a variety of attacks known as "hashtable poisoning" in which an
      attacker forms some data such that the hash of that data will be the
      same, and then proceeds to fill up all entries of a hashbucket. This is
      a realistic and well-known denial-of-service vector. Currently
      hashtables use jhash, which is fast but not secure, and some kind of
      rotating key scheme (or none at all, which isn't good). SipHash is meant
      as a replacement for jhash in these cases.
      
      There are a modicum of places in the kernel that are vulnerable to
      hashtable poisoning attacks, either via userspace vectors or network
      vectors, and there's not a reliable mechanism inside the kernel at the
      moment to fix it. The first step toward fixing these issues is actually
      getting a secure primitive into the kernel for developers to use. Then
      we can, bit by bit, port things over to it as deemed appropriate.
      
      While SipHash is extremely fast for a cryptographically secure function,
      it is likely a bit slower than the insecure jhash, and so replacements
      will be evaluated on a case-by-case basis based on whether or not the
      difference in speed is negligible and whether or not the current jhash usage
      poses a real security risk.
      
      For the second usage:
      
      A few places in the kernel are using MD5 or SHA1 for creating secure
      sequence numbers, syn cookies, port numbers, or fast random numbers.
      SipHash is a faster, more fitting, and more secure replacement for MD5
      in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
      obvious and straightforward, and so is submitted along with this patch
      series. There shouldn't be much of a debate over its efficacy.
      
      Dozens of languages are already using this internally for their hash
      tables and PRFs. Some of the BSDs already use this in their kernels.
      SipHash is a widely known high-speed solution to a widely known set of
      problems, and it's time we catch up.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
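To make the "hashtable keyed lookup function" use concrete, here is a minimal userspace sketch of SipHash-2-4 following the published algorithm, plus a bucket-selection helper. It is not the kernel's implementation from this patch: the names `siphash24` and `bucket_for` are hypothetical, and the helper assumes the bucket count is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound over the four state words v0..v3. */
#define SIPROUND do {                                                 \
    v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);     \
    v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                          \
    v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                          \
    v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);     \
} while (0)

/* Read 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* SipHash-2-4: 2 compression rounds per word, 4 finalization rounds.
 * The 128-bit key is passed as two 64-bit halves k0 and k1. */
static uint64_t siphash24(const uint8_t *data, size_t len,
                          uint64_t k0, uint64_t k1)
{
    uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    uint64_t v3 = 0x7465646279746573ULL ^ k1;
    uint64_t b = (uint64_t)len << 56;   /* last word carries the length */
    size_t blocks = len / 8;
    size_t i;

    for (i = 0; i < blocks; i++) {
        uint64_t m = load_le64(data + 8 * i);

        v3 ^= m;
        SIPROUND; SIPROUND;
        v0 ^= m;
    }

    /* Fold the remaining 0..7 tail bytes into the final word. */
    for (i = 0; i < (len & 7); i++)
        b |= (uint64_t)data[8 * blocks + i] << (8 * i);

    v3 ^= b;
    SIPROUND; SIPROUND;
    v0 ^= b;

    v2 ^= 0xff;
    SIPROUND; SIPROUND; SIPROUND; SIPROUND;

    return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical helper: pick a bucket with a secret per-table key, so an
 * attacker who does not know the key cannot precompute colliding inputs.
 * Assumes nbuckets is a power of two. */
static size_t bucket_for(const void *key, size_t len,
                         uint64_t k0, uint64_t k1, size_t nbuckets)
{
    return (size_t)(siphash24(key, len, k0, k1) & (nbuckets - 1));
}
```

In a real table the key halves would come from a random source at table creation and would never be exposed, which is what separates this from an unkeyed hash such as jhash.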
    • net: ipv4: remove disable of bottom half in inet_rtm_getroute · eafea739
      David Ahern authored
      Nothing about the route lookup requires bottom half to be disabled.
      Remove the local_bh_disable ... local_bh_enable around ip_route_input.
      This appears to be a vestige of days gone by as it has been there
      since the beginning of git time.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: intel: e100: use new api ethtool_{get|set}_link_ksettings · 6b0c06e0
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ibmvnic: use new api ethtool_{get|set}_link_ksettings · 8a43379f
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ibmveth: use new api ethtool_{get|set}_link_ksettings · 9ce8c2df
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: emac: use new api ethtool_{get|set}_link_ksettings · e4ccf764
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ibm: ehea: use new api ethtool_{get|set}_link_ksettings · cecf62d6
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      Move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: change init_inodecache() return void · 1e911632
      yuan linyu authored
      sock_init() calls it but does not check its return value,
      so change it to return void and add an internal BUG_ON() check.
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tc-skb-diet' · 4289e60c
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      convert tc_verd to integer bitfields
      
      The skb tc_verd field takes up two bytes but uses far fewer bits.
      Convert the remaining use cases to bitfields that fit in existing
      holes (depending on config options) and potentially save the two
      bytes in struct sk_buff.
      
      This patchset is based on an earlier set by Florian Westphal and its
      discussion (http://www.spinics.net/lists/netdev/msg329181.html).
      
      Patches 1 and 2 are low hanging fruit: removing the last traces of
        data that are no longer stored in tc_verd.
      
      Patches 3 and 4 convert tc_verd to individual bitfields (5 bits).
      
      Patch 5 reduces TC_AT to a single bitfield,
        as AT_STACK is not valid here (unlike in the case of TC_FROM).
      
      Patch 6 changes TC_FROM to two bitfields with clearly defined purpose.
      
      It may be possible to reduce storage further after this initial round.
      If tc_skip_classify is set only by IFB, testing skb_iif may suffice.
      The L2 header pushing/popping logic can perhaps be shared with
      AF_PACKET, which currently uses pkt_type for the same purpose.
      
      Changes:
        RFC -> v1
          - (patch 3): remove no longer needed label in tcf_action_exec
          - (patch 5): set tc_at_ingress at the same points as existing
                       SET_TC_AT calls
      
      Tested ingress mirred + netem + ifb:
      
        ip link set dev ifb0 up
        tc qdisc add dev eth0 ingress
        tc filter add dev eth0 parent ffff: \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev ifb0
        tc qdisc add dev ifb0 root netem delay 1000ms
        nc -u -l 8000 &
        ssh $otherhost nc -u $host 8000
      
      Tested egress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link set dev veth1 up
        ip link set dev veth2 up
        tcpdump -n -i veth2 udp and dst port 8000 &
      
        tc qdisc add dev eth0 root handle 1: prio
        tc filter add dev eth0 parent 1:0 \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev veth1
        tc qdisc add dev veth1 root netem delay 1000ms
        nc -u $otherhost 8000
      
      Tested ingress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link add veth3 type veth peer name veth4
      
        ip netns add ns0
        ip netns add ns1
      
        for i in 1 2 3 4; do \
          NS=ns$((${i}%2)); \
          ip link set dev veth${i} netns ${NS}; \
          ip netns exec ${NS} \
            ip addr add dev veth${i} 192.168.1.${i}/24; \
          ip netns exec ${NS} \
            ip link set dev veth${i} up; \
        done
      
        ip netns exec ns0 tc qdisc add dev veth2 ingress
        ip netns exec ns0 \
          tc filter add dev veth2 parent ffff: \
            u32 match ip dport 8000 0xffff \
            action mirred ingress redirect dev veth4
      
        ip netns exec ns0 \
          tcpdump -n -i veth4 udp and dst port 8000 &
        ip netns exec ns1 \
          nc -u 192.168.1.2 8000
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Willem de Bruijn authored
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
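The split into two single-bit fields, and the rule that IFB drops packets that were not redirected, can be modeled with a small userspace sketch. The struct `pkt_sketch`, the enum, and both functions are hypothetical stand-ins; only the three field names come from the commit message.

```c
/* Hypothetical stand-in for the relevant sk_buff bits. */
struct pkt_sketch {
    unsigned int tc_at_ingress:1;    /* may be overwritten along the way */
    unsigned int tc_redirected:1;    /* set only when act_mirred redirects */
    unsigned int tc_from_ingress:1;  /* cached copy of the original direction */
};

enum ifb_verdict { IFB_DROP, IFB_REINJECT_INGRESS, IFB_REINJECT_EGRESS };

/* Models the redirect step: mark the packet and cache where it came from. */
static void mirred_redirect(struct pkt_sketch *p)
{
    p->tc_redirected = 1;
    p->tc_from_ingress = p->tc_at_ingress;
}

/* Models the decision at ifb_xmit: packets that never went through the
 * redirect step are dropped; the rest return to their original hook. */
static enum ifb_verdict ifb_decide(const struct pkt_sketch *p)
{
    if (!p->tc_redirected)
        return IFB_DROP;

    return p->tc_from_ingress ? IFB_REINJECT_INGRESS : IFB_REINJECT_EGRESS;
}
```

Because the direction is cached at redirect time, later changes to `tc_at_ingress` on the IFB egress path do not affect where the packet is reinjected.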
    • net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Willem de Bruijn authored
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: convert tc_verd to integer bitfields · a5135bcf
      Willem de Bruijn authored
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
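The difference between packing fields into a 16-bit word and using integer bitfields can be illustrated in userspace C. The bit positions, macro names, struct `skb_sketch`, and `skb_reset_tc_sketch` below are hypothetical. The commit message does not say which fields the real `skb_reset_tc` clears, so this sketch clears only the two converted fields.

```c
#include <stdint.h>

/* Old style: fields live inside one 16-bit word and need mask-and-shift
 * accessors. The bit positions here are made up for illustration. */
#define OLD_AT_SHIFT    0
#define OLD_AT_MASK     (0x3u << OLD_AT_SHIFT)
#define OLD_FROM_SHIFT  2
#define OLD_FROM_MASK   (0x3u << OLD_FROM_SHIFT)

static uint16_t old_set_at(uint16_t verd, unsigned int at)
{
    return (uint16_t)((verd & ~OLD_AT_MASK) |
                      ((at << OLD_AT_SHIFT) & OLD_AT_MASK));
}

static unsigned int old_get_at(uint16_t verd)
{
    return (verd & OLD_AT_MASK) >> OLD_AT_SHIFT;
}

/* New style: two-bit integer bitfields that the compiler can place in an
 * existing hole next to other flag bits. */
struct skb_sketch {
    unsigned int tc_skip_classify:1;
    unsigned int tc_at:2;
    unsigned int tc_from:2;
};

/* Sketch of a reset helper that clears the converted fields in one place. */
static void skb_reset_tc_sketch(struct skb_sketch *skb)
{
    skb->tc_at = 0;
    skb->tc_from = 0;
}
```

With bitfields, readers and writers use plain member access, and the accessor macros disappear along with the dedicated 16-bit word.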
    • net-tc: extract skip classify bit from tc_verd · e7246e12
      Willem de Bruijn authored
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: make MAX_RECLASSIFY_LOOP local · d6264071
      Willem de Bruijn authored
      This field is no longer kept in tc_verd. Remove it from the global
      definition of that struct.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: remove unused tc_verd fields · aec745e2
      Willem de Bruijn authored
      Remove the last reference to tc_verd's munge and redirect ttl bits.
      These fields are no longer used.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 08 Jan, 2017 16 commits