1. 22 Oct, 2015 18 commits
  2. 03 Oct, 2015 22 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.1.10 · 27f1b7fe
      Greg Kroah-Hartman authored
      27f1b7fe
    • Kyle Evans's avatar
      hp-wmi: limit hotkey enable · 85695fdc
      Kyle Evans authored
      commit 8a1513b4 upstream.
      
      Do not write initialize magic on systems that do not have
      feature query 0xb. Fixes Bug #82451.
      
      Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
      for code clearity.
      
      Add a new test function, hp_wmi_bios_2008_later() & simplify
      hp_wmi_bios_2009_later(), which fixes a bug in cases where
      an improper value is returned. Probably also fixes Bug #69131.
      
      Add missing __init tag.
      Signed-off-by: default avatarKyle Evans <kvans32@gmail.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85695fdc
    • Luis Henriques's avatar
      zram: fix possible use after free in zcomp_create() · 6e3105d5
      Luis Henriques authored
      commit 3aaf14da upstream.
      
      zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
      through comp->stream, which can potentially be pointing to memory that
      was freed if these functions returned an error.
      
      While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
      'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
      could return other error codes.  Function documentation updated
      accordingly.
      
      Fixes: beca3ec7 ("zram: add multi stream functionality")
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Acked-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e3105d5
    • Herbert Xu's avatar
      netlink: Replace rhash_portid with bound · d4862367
      Herbert Xu authored
      [ Upstream commit da314c99 ]
      
      On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
      >
      > store_release and load_acquire are different from the usual memory
      > barriers and can't be paired this way.  You have to pair store_release
      > and load_acquire.  Besides, it isn't a particularly good idea to
      
      OK I've decided to drop the acquire/release helpers as they don't
      help us at all and simply pessimises the code by using full memory
      barriers (on some architectures) where only a write or read barrier
      is needed.
      
      > depend on memory barriers embedded in other data structures like the
      > above.  Here, especially, rhashtable_insert() would have write barrier
      > *before* the entry is hashed not necessarily *after*, which means that
      > in the above case, a socket which appears to have set bound to a
      > reader might not visible when the reader tries to look up the socket
      > on the hashtable.
      
      But you are right we do need an explicit write barrier here to
      ensure that the hashing is visible.
      
      > There's no reason to be overly smart here.  This isn't a crazy hot
      > path, write barriers tend to be very cheap, store_release more so.
      > Please just do smp_store_release() and note what it's paired with.
      
      It's not about being overly smart.  It's about actually understanding
      what's going on with the code.  I've seen too many instances of
      people simply sprinkling synchronisation primitives around without
      any knowledge of what is happening underneath, which is just a recipe
      for creating hard-to-debug races.
      
      > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
      > >  		}
      > >  	}
      > >
      > > -	if (!nlk->portid) {
      > > +	if (!nlk->bound) {
      >
      > I don't think you can skip load_acquire here just because this is the
      > second deref of the variable.  That doesn't change anything.  Race
      > condition could still happen between the first and second tests and
      > skipping the second would lead to the same kind of bug.
      
      The reason this one is OK is because we do not use nlk->portid or
      try to get nlk from the hash table before we return to user-space.
      
      However, there is a real bug here that none of these acquire/release
      helpers discovered.  The two bound tests here used to be a single
      one.  Now that they are separate it is entirely possible for another
      thread to come in the middle and bind the socket.  So we need to
      repeat the portid check in order to maintain consistency.
      
      > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
      > >  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
      > >  		return -EPERM;
      > >
      > > -	if (!nlk->portid)
      > > +	if (!nlk->bound)
      >
      > Don't we need load_acquire here too?  Is this path holding a lock
      > which makes that unnecessary?
      
      Ditto.
      
      ---8<---
      The commit 1f770c0a ("netlink:
      Fix autobind race condition that leads to zero port ID") created
      some new races that can occur due to inconcsistencies between the
      two port IDs.
      
      Tejun is right that a barrier is unavoidable.  Therefore I am
      reverting to the original patch that used a boolean to indicate
      that a user netlink socket has been bound.
      
      Barriers have been added where necessary to ensure that a valid
      portid and the hashed socket is visible.
      
      I have also changed netlink_insert to only return EBUSY if the
      socket is bound to a portid different to the requested one.  This
      combined with only reading nlk->bound once in netlink_bind fixes
      a race where two threads that bind the socket at the same time
      with different port IDs may both succeed.
      
      Fixes: 1f770c0a ("netlink: Fix autobind race condition that leads to zero port ID")
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Nacked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4862367
    • Herbert Xu's avatar
      netlink: Fix autobind race condition that leads to zero port ID · 4e277624
      Herbert Xu authored
      [ Upstream commit 1f770c0a ]
      
      The commit c0bb07df ("netlink:
      Reset portid after netlink_insert failure") introduced a race
      condition where if two threads try to autobind the same socket
      one of them may end up with a zero port ID.  This led to kernel
      deadlocks that were observed by multiple people.
      
      This patch reverts that commit and instead fixes it by introducing
      a separte rhash_portid variable so that the real portid is only set
      after the socket has been successfully hashed.
      
      Fixes: c0bb07df ("netlink: Reset portid after netlink_insert failure")
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e277624
    • Stas Sergeev's avatar
      mvneta: use inband status only when explicitly enabled · d6001764
      Stas Sergeev authored
      [ Upstream commit f8af8e6e in net-next tree,
        will be pushed to Linus very soon. ]
      
      The commit 898b2970 ("mvneta: implement SGMII-based in-band link state
      signaling") implemented the link parameters auto-negotiation unconditionally.
      Unfortunately it appears that some HW that implements SGMII protocol,
      doesn't generate the inband status, so it is not possible to auto-negotiate
      anything with such HW.
      
      This patch enables the auto-negotiation only if explicitly requested with
      the 'managed' DT property.
      
      This patch fixes the following regression:
      https://lkml.org/lkml/2015/7/8/865Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      
      CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6001764
    • Stas Sergeev's avatar
      of_mdio: add new DT property 'managed' to specify the PHY management type · ebfd3e10
      Stas Sergeev authored
      [ Upstream commit 4cba5c21 in net-next tree,
        will be pushed to Linus very soon. ]
      
      Currently the PHY management type is selected by the MAC driver arbitrary.
      The decision is based on the presence of the "fixed-link" node and on a
      will of the driver's authors.
      This caused a regression recently, when mvneta driver suddenly started
      to use the in-band status for auto-negotiation on fixed links.
      It appears the auto-negotiation may not work when expected by the MAC driver.
      Sebastien Rannou explains:
      << Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
      a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
      inband status) is connected to the switch through QSGMII, and in this context
      we are on the media side of the PHY. >>
      https://lkml.org/lkml/2015/7/10/206
      
      This patch introduces the new string property 'managed' that allows
      the user to set the management type explicitly.
      The supported values are:
      "auto" - default. Uses either MDIO or nothing, depending on the presence
      of the fixed-link node
      "in-band-status" - use in-band status
      Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Pawel Moll <pawel.moll@arm.com>
      CC: Mark Rutland <mark.rutland@arm.com>
      CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
      CC: Kumar Gala <galak@codeaurora.org>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      CC: Grant Likely <grant.likely@linaro.org>
      CC: devicetree@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebfd3e10
    • Stas Sergeev's avatar
      net: phy: fixed_phy: handle link-down case · 282117ac
      Stas Sergeev authored
      [ Upstream 868a4215 in net-next tree,
        will be pushed to Linus very soon. ]
      
      fixed_phy_register() currently hardcodes the fixed PHY link to 1, and
      expects to find a "speed" parameter to provide correct information
      towards the fixed PHY consumer.
      
      In a subsequent change, where we allow "managed" (e.g: (RS)GMII in-band
      status auto-negotiation) fixed PHYs, none of these parameters can be
      provided since they will be auto-negotiated, hence, we just provide a
      zero-initialized fixed_phy_status to fixed_phy_register() which makes it
      fail when we call fixed_phy_update_regs() since status.speed = 0 which
      makes us hit the "default" label and error out.
      
      Without this change, we would also see potentially inconsistent
      speed/duplex parameters for fixed PHYs when the link is DOWN.
      
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      [florian: add more background to why this is correct and desirable]
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      282117ac
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Do not override speed settings · 90eb52c9
      Florian Fainelli authored
      [ Upstream d2eac98f in net-next tree,
        will be pushed to Linus very soon. ]
      
      The SF2 driver currently overrides speed settings for its port
      configured using a fixed PHY, this is both unnecessary and incorrect,
      because we keep feedback to the hardware parameters that we read from
      the PHY device, which in the case of a fixed PHY cannot possibly change
      speed.
      
      This is a required change to allow the fixed PHY code to allow
      registering a PHY with a link configured as DOWN by default and avoid
      some sort of circular dependency where we require the link_update
      callback to run to program the hardware, and we then utilize the fixed
      PHY parameters to program the hardware with the same settings.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90eb52c9
    • Wilson Kok's avatar
      fib_rules: fix fib rule dumps across multiple skbs · fce13464
      Wilson Kok authored
      [ Upstream commit 41fc0143 ]
      
      dump_rules returns skb length and not error.
      But when family == AF_UNSPEC, the caller of dump_rules
      assumes that it returns an error. Hence, when family == AF_UNSPEC,
      we continue trying to dump on -EMSGSIZE errors resulting in
      incorrect dump idx carried between skbs belonging to the same dump.
      This results in fib rule dump always only dumping rules that fit
      into the first skb.
      
      This patch fixes dump_rules to return error so that we exit correctly
      and idx is correctly maintained between skbs that are part of the
      same dump.
      Signed-off-by: default avatarWilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fce13464
    • WANG Cong's avatar
      net: revert "net_sched: move tp->root allocation into fw_init()" · 74bff4a0
      WANG Cong authored
      [ Upstream commit d8aecb10 ]
      
      fw filter uses tp->root==NULL to check if it is the old method,
      so it doesn't need allocation at all in this case. This patch
      reverts the offending commit and adds some comments for old
      method to make it obvious.
      
      Fixes: 33f8b9ec ("net_sched: move tp->root allocation into fw_init()")
      Reported-by: default avatarAkshat Kakkar <akshat.1984@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74bff4a0
    • Eric Dumazet's avatar
      tcp: add proper TS val into RST packets · c3647e60
      Eric Dumazet authored
      [ Upstream commit 675ee231 ]
      
      RST packets sent on behalf of TCP connections with TS option (RFC 7323
      TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
      
      A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
      ecr 0], length 0
      B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
      1460,nop,nop,TS val 7264344 ecr 100], length 0
      A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
      7264344], length 0
      
      B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
      ecr 110], length 0
      
      We need to call skb_mstamp_get() to get proper TS val,
      derived from skb->skb_mstamp
      
      Note that RFC 1323 was advocating to not send TS option in RST segment,
      but RFC 7323 recommends the opposite :
      
        Once TSopt has been successfully negotiated, that is both <SYN> and
        <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
        segment for the duration of the connection, and SHOULD be sent in an
        <RST> segment (see Section 5.2 for details)
      
      Note this RFC recommends to send TS val = 0, but we believe it is
      premature : We do not know if all TCP stacks are properly
      handling the receive side :
      
         When an <RST> segment is
         received, it MUST NOT be subjected to the PAWS check by verifying an
         acceptable value in SEG.TSval, and information from the Timestamps
         option MUST NOT be used to update connection state information.
         SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
      
      In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider
      to decide to send TS val = 0, if it buys something.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3647e60
    • Jesse Gross's avatar
      openvswitch: Zero flows on allocation. · 6d80e350
      Jesse Gross authored
      [ Upstream commit ae5f2fb1 ]
      
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case as the mask optimizations were
      really targetting per-packet flow operations.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d80e350
    • Michael S. Tsirkin's avatar
      macvtap: fix TUNSETSNDBUF values > 64k · cf9cf6bc
      Michael S. Tsirkin authored
      [ Upstream commit 3ea79249 ]
      
      Upon TUNSETSNDBUF,  macvtap reads the requested sndbuf size into
      a local variable u.
      commit 39ec7de7 ("macvtap: fix uninitialized access on
      TUNSETIFF") changed its type to u16 (which is the right thing to
      do for all other macvtap ioctls), breaking all values > 64k.
      
      The value of TUNSETSNDBUF is actually a signed 32 bit integer, so
      the right thing to do is to read it into an int.
      
      Cc: David S. Miller <davem@davemloft.net>
      Fixes: 39ec7de7 ("macvtap: fix uninitialized access on TUNSETIFF")
      Reported-by: Mark A. Peloquin
      Bisected-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Reported-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf9cf6bc
    • Eric Dumazet's avatar
      net/mlx4_en: really allow to change RSS key · fd0a1a9d
      Eric Dumazet authored
      [ Upsteam commit 4671fc6d ]
      
      When changing rss key, we do not want to overwrite user provided key
      by the one provided by netdev_rss_key_fill(), which is the host random
      key generated at boot time.
      
      Fixes: 947cbb0a ("net/mlx4_en: Support for configurable RSS hash function")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Eyal Perry <eyalpe@mellanox.com>
      CC: Amir Vadai <amirv@mellanox.com>
      Acked-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd0a1a9d
    • Linus Lüssing's avatar
      bridge: fix igmpv3 / mldv2 report parsing · 022ce825
      Linus Lüssing authored
      [ Upstream commit c2d4fbd2 ]
      
      With the newly introduced helper functions the skb pulling is hidden in
      the checksumming function - and undone before returning to the caller.
      
      The IGMPv3 and MLDv2 report parsing functions in the bridge still
      assumed that the skb is pointing to the beginning of the IGMP/MLD
      message while it is now kept at the beginning of the IPv4/6 header,
      breaking the message parsing and creating packet loss.
      
      Fixing this by taking the offset between IP and IGMP/MLD header into
      account, too.
      
      Fixes: 9afd85c9 ("net: Export IGMP/MLD message validation code")
      Reported-by: default avatarTobias Powalowski <tobias.powalowski@googlemail.com>
      Tested-by: default avatarTobias Powalowski <tobias.powalowski@googlemail.com>
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      022ce825
    • Marcelo Ricardo Leitner's avatar
      sctp: fix race on protocol/netns initialization · 5cadd6ba
      Marcelo Ricardo Leitner authored
      [ Upstream commit 8e2d61e0 ]
      
      Consider sctp module is unloaded and is being requested because an user
      is creating a sctp socket.
      
      During initialization, sctp will add the new protocol type and then
      initialize pernet subsys:
      
              status = sctp_v4_protosw_init();
              if (status)
                      goto err_protosw_init;
      
              status = sctp_v6_protosw_init();
              if (status)
                      goto err_v6_protosw_init;
      
              status = register_pernet_subsys(&sctp_net_ops);
      
      The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
      is possible for userspace to create SCTP sockets like if the module is
      already fully loaded. If that happens, one of the possible effects is
      that we will have readers for net->sctp.local_addr_list list earlier
      than expected and sctp_net_init() does not take precautions while
      dealing with that list, leading to a potential panic but not limited to
      that, as sctp_sock_init() will copy a bunch of blank/partially
      initialized values from net->sctp.
      
      The race happens like this:
      
           CPU 0                           |  CPU 1
        socket()                           |
         __sock_create                     | socket()
          inet_create                      |  __sock_create
           list_for_each_entry_rcu(        |
              answer, &inetsw[sock->type], |
              list) {                      |   inet_create
            /* no hits */                  |
           if (unlikely(err)) {            |
            ...                            |
            request_module()               |
            /* socket creation is blocked  |
             * the module is fully loaded  |
             */                            |
             sctp_init                     |
              sctp_v4_protosw_init         |
               inet_register_protosw       |
                list_add_rcu(&p->list,     |
                             last_perm);   |
                                           |  list_for_each_entry_rcu(
                                           |     answer, &inetsw[sock->type],
              sctp_v6_protosw_init         |     list) {
                                           |     /* hit, so assumes protocol
                                           |      * is already loaded
                                           |      */
                                           |  /* socket creation continues
                                           |   * before netns is initialized
                                           |   */
              register_pernet_subsys       |
      
      Simply inverting the initialization order between
      register_pernet_subsys() and sctp_v4_protosw_init() is not possible
      because register_pernet_subsys() will create a control sctp socket, so
      the protocol must be already visible by then. Deferring the socket
      creation to a work-queue is not good specially because we loose the
      ability to handle its errors.
      
      So, as suggested by Vlad, the fix is to split netns initialization in
      two moments: defaults and control socket, so that the defaults are
      already loaded by when we register the protocol, while control socket
      initialization is kept at the same moment it is today.
      
      Fixes: 4db67e80 ("sctp: Make the address lists per network namespace")
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cadd6ba
    • Daniel Borkmann's avatar
      netlink, mmap: transform mmap skb into full skb on taps · 65d48c63
      Daniel Borkmann authored
      [ Upstream commit 1853c949 ]
      
      Ken-ichirou reported that running netlink in mmap mode for receive in
      combination with nlmon will throw a NULL pointer dereference in
      __kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable
      to handle kernel paging request". The problem is the skb_clone() in
      __netlink_deliver_tap_skb() for skbs that are mmaped.
      
      I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink
      skb has it pointed to netlink_skb_destructor(), set in the handler
      netlink_ring_setup_skb(). There, skb->head is being set to NULL, so
      that in such cases, __kfree_skb() doesn't perform a skb_release_data()
      via skb_release_all(), where skb->head is possibly being freed through
      kfree(head) into slab allocator, although netlink mmap skb->head points
      to the mmap buffer. Similarly, the same has to be done also for large
      netlink skbs where the data area is vmalloced. Therefore, as discussed,
      make a copy for these rather rare cases for now. This fixes the issue
      on my and Ken-ichirou's test-cases.
      
      Reference: http://thread.gmane.org/gmane.linux.network/371129
      Fixes: bcbde0d4 ("net: netlink: virtual tap device management")
      Reported-by: default avatarKen-ichirou MATSUZAWA <chamaken@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarKen-ichirou MATSUZAWA <chamaken@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65d48c63
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Fix 64-bits register writes · 85b72208
      Florian Fainelli authored
      [ Upstream commit 03679a14 ]
      
      The macro to write 64-bits quantities to the 32-bits register swapped
      the value and offsets arguments, we want to preserve the ordering of the
      arguments with respect to how writel() is implemented for instance:
      value first, offset/base second.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85b72208
    • Roopa Prabhu's avatar
      ipv6: fix multipath route replace error recovery · c59231c8
      Roopa Prabhu authored
      [ Upstream commit 6b9ea5a6 ]
      
      Problem:
      The ecmp route replace support for ipv6 in the kernel, deletes the
      existing ecmp route too early, ie when it installs the first nexthop.
      If there is an error in installing the subsequent nexthops, its too late
      to recover the already deleted existing route leaving the fib
      in an inconsistent state.
      
      This patch reduces the possibility of this by doing the following:
      a) Changes the existing multipath route add code to a two stage process:
        build rt6_infos + insert them
      	ip6_route_add rt6_info creation code is moved into
      	ip6_route_info_create.
      b) This ensures that most errors are caught during building rt6_infos
        and we fail early
      c) Separates multipath add and del code. Because add needs the special
        two stage mode in a) and delete essentially does not care.
      d) In any event if the code fails during inserting a route again, a
        warning is printed (This should be unlikely)
      
      Before the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
       * kernel */
      
      After the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c59231c8
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Fix ageing conditions and operation · 0f5262e8
      Florian Fainelli authored
      [ Upstream commit 39797a27 ]
      
      The comparison check between cur_hw_state and hw_state is currently
      invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
      hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
      going to cause an additional aging to occur. Fix this by not shifting
      cur_hw_state while reading it, but instead, mask the value with the
      appropriately shitfted bitmask.
      
      The other problem with the fast-ageing process is that we did not set
      the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
      learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
      register to avoid leaving spurious bits sets from one operation to the
      other.
      
      Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f5262e8
    • Richard Laing's avatar
      net/ipv6: Correct PIM6 mrt_lock handling · 5008d77e
      Richard Laing authored
      [ Upstream commit 25b4a44c ]
      
      In the IPv6 multicast routing code the mrt_lock was not being released
      correctly in the MFC iterator, as a result adding or deleting a MIF would
      cause a hang because the mrt_lock could not be acquired.
      
      This fix is a copy of the code for the IPv4 case and ensures that the lock
      is released correctly.
      Signed-off-by: default avatarRichard Laing <richard.laing@alliedtelesis.co.nz>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5008d77e