1. 22 Oct, 2015 21 commits
  2. 03 Oct, 2015 19 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.2.3 · fcba09f2
      Greg Kroah-Hartman authored
      fcba09f2
    • Kyle Evans's avatar
      hp-wmi: limit hotkey enable · b2b2c7be
      Kyle Evans authored
      commit 8a1513b4 upstream.
      
      Do not write initialize magic on systems that do not have
      feature query 0xb. Fixes Bug #82451.
      
      Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
      for code clearity.
      
      Add a new test function, hp_wmi_bios_2008_later() & simplify
      hp_wmi_bios_2009_later(), which fixes a bug in cases where
      an improper value is returned. Probably also fixes Bug #69131.
      
      Add missing __init tag.
      Signed-off-by: default avatarKyle Evans <kvans32@gmail.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2b2c7be
    • Luis Henriques's avatar
      zram: fix possible use after free in zcomp_create() · 6abf903c
      Luis Henriques authored
      commit 3aaf14da upstream.
      
      zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
      through comp->stream, which can potentially be pointing to memory that
      was freed if these functions returned an error.
      
      While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
      'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
      could return other error codes.  Function documentation updated
      accordingly.
      
      Fixes: beca3ec7 ("zram: add multi stream functionality")
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Acked-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6abf903c
    • Carol L Soto's avatar
      net/mlx4_core: Capping number of requested MSIXs to MAX_MSIX · 92b52680
      Carol L Soto authored
      [ Upstream commit 9293267a ]
      
      We currently manage IRQs in pool_bm which is a bit field
      of MAX_MSIX bits. Thus, allocating more than MAX_MSIX
      interrupts can't be managed in pool_bm.
      Fixing this by capping number of requested MSIXs to
      MAX_MSIX.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92b52680
    • Stas Sergeev's avatar
      mvneta: use inband status only when explicitly enabled · 0d106a6a
      Stas Sergeev authored
      [ Upstream commit f8af8e6e in net-next tree,
        will be pushed to Linus very soon. ]
      
      The commit 898b2970 ("mvneta: implement SGMII-based in-band link state
      signaling") implemented the link parameters auto-negotiation unconditionally.
      Unfortunately it appears that some HW that implements SGMII protocol,
      doesn't generate the inband status, so it is not possible to auto-negotiate
      anything with such HW.
      
      This patch enables the auto-negotiation only if explicitly requested with
      the 'managed' DT property.
      
      This patch fixes the following regression:
      https://lkml.org/lkml/2015/7/8/865Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      
      CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d106a6a
    • Stas Sergeev's avatar
      of_mdio: add new DT property 'managed' to specify the PHY management type · 40448fc0
      Stas Sergeev authored
      [ Upstream commit 4cba5c21 in net-next tree,
        will be pushed to Linus very soon. ]
      
      Currently the PHY management type is selected by the MAC driver arbitrary.
      The decision is based on the presence of the "fixed-link" node and on a
      will of the driver's authors.
      This caused a regression recently, when mvneta driver suddenly started
      to use the in-band status for auto-negotiation on fixed links.
      It appears the auto-negotiation may not work when expected by the MAC driver.
      Sebastien Rannou explains:
      << Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
      a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
      inband status) is connected to the switch through QSGMII, and in this context
      we are on the media side of the PHY. >>
      https://lkml.org/lkml/2015/7/10/206
      
      This patch introduces the new string property 'managed' that allows
      the user to set the management type explicitly.
      The supported values are:
      "auto" - default. Uses either MDIO or nothing, depending on the presence
      of the fixed-link node
      "in-band-status" - use in-band status
      Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Pawel Moll <pawel.moll@arm.com>
      CC: Mark Rutland <mark.rutland@arm.com>
      CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
      CC: Kumar Gala <galak@codeaurora.org>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      CC: Grant Likely <grant.likely@linaro.org>
      CC: devicetree@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40448fc0
    • Stas Sergeev's avatar
      net: phy: fixed_phy: handle link-down case · bfba942d
      Stas Sergeev authored
      [ Upstream 868a4215 in net-next tree,
        will be pushed to Linus very soon. ]
      
      fixed_phy_register() currently hardcodes the fixed PHY link to 1, and
      expects to find a "speed" parameter to provide correct information
      towards the fixed PHY consumer.
      
      In a subsequent change, where we allow "managed" (e.g: (RS)GMII in-band
      status auto-negotiation) fixed PHYs, none of these parameters can be
      provided since they will be auto-negotiated, hence, we just provide a
      zero-initialized fixed_phy_status to fixed_phy_register() which makes it
      fail when we call fixed_phy_update_regs() since status.speed = 0 which
      makes us hit the "default" label and error out.
      
      Without this change, we would also see potentially inconsistent
      speed/duplex parameters for fixed PHYs when the link is DOWN.
      
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarStas Sergeev <stsp@users.sourceforge.net>
      [florian: add more background to why this is correct and desirable]
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfba942d
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Do not override speed settings · b11c94db
      Florian Fainelli authored
      [ Upstream d2eac98f in net-next tree,
        will be pushed to Linus very soon. ]
      
      The SF2 driver currently overrides speed settings for its port
      configured using a fixed PHY, this is both unnecessary and incorrect,
      because we keep feedback to the hardware parameters that we read from
      the PHY device, which in the case of a fixed PHY cannot possibly change
      speed.
      
      This is a required change to allow the fixed PHY code to allow
      registering a PHY with a link configured as DOWN by default and avoid
      some sort of circular dependency where we require the link_update
      callback to run to program the hardware, and we then utilize the fixed
      PHY parameters to program the hardware with the same settings.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b11c94db
    • Guillaume Nault's avatar
      ppp: fix lockdep splat in ppp_dev_uninit() · 4c8f9d6c
      Guillaume Nault authored
      [ Upstream commit 58a89eca ]
      
      ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
      ppp_create_interface() must then lock these mutexes in that same order
      to avoid possible deadlock.
      
      [  120.880011] ======================================================
      [  120.880011] [ INFO: possible circular locking dependency detected ]
      [  120.880011] 4.2.0 #1 Not tainted
      [  120.880011] -------------------------------------------------------
      [  120.880011] ppp-apitest/15827 is trying to acquire lock:
      [  120.880011]  (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
      [  120.880011]
      [  120.880011] but task is already holding lock:
      [  120.880011]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
      [  120.880011]
      [  120.880011] which lock already depends on the new lock.
      [  120.880011]
      [  120.880011]
      [  120.880011] the existing dependency chain (in reverse order) is:
      [  120.880011]
      [  120.880011] -> #1 (rtnl_mutex){+.+.+.}:
      [  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
      [  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
      [  120.880011]        [<ffffffff812e4255>] rtnl_lock+0x12/0x14
      [  120.880011]        [<ffffffff812d9d94>] register_netdev+0x11/0x27
      [  120.880011]        [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
      [  120.880011]        [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
      [  120.880011]        [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
      [  120.880011]        [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [  120.880011]
      [  120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
      [  120.880011]        [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
      [  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
      [  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
      [  120.880011]        [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
      [  120.880011]        [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
      [  120.880011]        [<ffffffff812d5381>] rollback_registered+0x29/0x38
      [  120.880011]        [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
      [  120.880011]        [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
      [  120.880011]        [<ffffffff8112d9f6>] __fput+0xec/0x192
      [  120.880011]        [<ffffffff8112dacc>] ____fput+0x9/0xb
      [  120.880011]        [<ffffffff8105447a>] task_work_run+0x66/0x80
      [  120.880011]        [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
      [  120.880011]        [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
      [  120.880011]        [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
      [  120.880011]
      [  120.880011] other info that might help us debug this:
      [  120.880011]
      [  120.880011]  Possible unsafe locking scenario:
      [  120.880011]
      [  120.880011]        CPU0                    CPU1
      [  120.880011]        ----                    ----
      [  120.880011]   lock(rtnl_mutex);
      [  120.880011]                                lock(&pn->all_ppp_mutex);
      [  120.880011]                                lock(rtnl_mutex);
      [  120.880011]   lock(&pn->all_ppp_mutex);
      [  120.880011]
      [  120.880011]  *** DEADLOCK ***
      
      Fixes: 8cb775bc ("ppp: fix device unregistration upon netns deletion")
      Reported-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c8f9d6c
    • Wilson Kok's avatar
      fib_rules: fix fib rule dumps across multiple skbs · 45c191bb
      Wilson Kok authored
      [ Upstream commit 41fc0143 ]
      
      dump_rules returns skb length and not error.
      But when family == AF_UNSPEC, the caller of dump_rules
      assumes that it returns an error. Hence, when family == AF_UNSPEC,
      we continue trying to dump on -EMSGSIZE errors resulting in
      incorrect dump idx carried between skbs belonging to the same dump.
      This results in fib rule dump always only dumping rules that fit
      into the first skb.
      
      This patch fixes dump_rules to return error so that we exit correctly
      and idx is correctly maintained between skbs that are part of the
      same dump.
      Signed-off-by: default avatarWilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45c191bb
    • WANG Cong's avatar
      net: revert "net_sched: move tp->root allocation into fw_init()" · d8abd058
      WANG Cong authored
      [ Upstream commit d8aecb10 ]
      
      fw filter uses tp->root==NULL to check if it is the old method,
      so it doesn't need allocation at all in this case. This patch
      reverts the offending commit and adds some comments for old
      method to make it obvious.
      
      Fixes: 33f8b9ec ("net_sched: move tp->root allocation into fw_init()")
      Reported-by: default avatarAkshat Kakkar <akshat.1984@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8abd058
    • David Woodhouse's avatar
      Fix AF_PACKET ABI breakage in 4.2 · dd9eb1b1
      David Woodhouse authored
      [ Upstream commit d3869efe ]
      
      Commit 7d824109 ("virtio: add explicit big-endian support to memory
      accessors") accidentally changed the virtio_net header used by
      AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
      
      Since virtio_legacy_is_little_endian() is a very long identifier,
      define a vio_le macro and use that throughout the code instead of the
      hard-coded 'false' for little-endian.
      
      This restores the ABI to match 4.1 and earlier kernels, and makes my
      test program work again.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd9eb1b1
    • Eric Dumazet's avatar
      tcp: add proper TS val into RST packets · 9d0af4ef
      Eric Dumazet authored
      [ Upstream commit 675ee231 ]
      
      RST packets sent on behalf of TCP connections with TS option (RFC 7323
      TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
      
      A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
      ecr 0], length 0
      B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
      1460,nop,nop,TS val 7264344 ecr 100], length 0
      A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
      7264344], length 0
      
      B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
      ecr 110], length 0
      
      We need to call skb_mstamp_get() to get proper TS val,
      derived from skb->skb_mstamp
      
      Note that RFC 1323 was advocating to not send TS option in RST segment,
      but RFC 7323 recommends the opposite :
      
        Once TSopt has been successfully negotiated, that is both <SYN> and
        <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
        segment for the duration of the connection, and SHOULD be sent in an
        <RST> segment (see Section 5.2 for details)
      
      Note this RFC recommends to send TS val = 0, but we believe it is
      premature : We do not know if all TCP stacks are properly
      handling the receive side :
      
         When an <RST> segment is
         received, it MUST NOT be subjected to the PAWS check by verifying an
         acceptable value in SEG.TSval, and information from the Timestamps
         option MUST NOT be used to update connection state information.
         SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
      
      In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider
      to decide to send TS val = 0, if it buys something.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d0af4ef
    • Jesse Gross's avatar
      openvswitch: Zero flows on allocation. · 1e0c9d37
      Jesse Gross authored
      [ Upstream commit ae5f2fb1 ]
      
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case as the mask optimizations were
      really targetting per-packet flow operations.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e0c9d37
    • Russell King's avatar
      net: dsa: actually force the speed on the CPU port · ccbe6aba
      Russell King authored
      [ Upstream commit 53adc9e8 ]
      
      Commit 54d792f2 ("net: dsa: Centralise global and port setup
      code into mv88e6xxx.") merged in the 4.2 merge window broke the link
      speed forcing for the CPU port of Marvell DSA switches.  The original
      code was:
      
              /* MAC Forcing register: don't force link, speed, duplex
               * or flow control state to any particular values on physical
               * ports, but force the CPU port and all DSA ports to 1000 Mb/s
               * full duplex.
               */
              if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
                      REG_WRITE(addr, 0x01, 0x003e);
              else
                      REG_WRITE(addr, 0x01, 0x0003);
      
      but the new code does a read-modify-write:
      
                      reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
                      if (dsa_is_cpu_port(ds, port) ||
                          ds->dsa_port_mask & (1 << port)) {
                              reg |= PORT_PCS_CTRL_FORCE_LINK |
                                      PORT_PCS_CTRL_LINK_UP |
                                      PORT_PCS_CTRL_DUPLEX_FULL |
                                      PORT_PCS_CTRL_FORCE_DUPLEX;
                              if (mv88e6xxx_6065_family(ds))
                                      reg |= PORT_PCS_CTRL_100;
                              else
                                      reg |= PORT_PCS_CTRL_1000;
      
      The link speed in the PCS control register is a two bit field.  Forcing
      the link speed in this way doesn't ensure that the bit field is set to
      the correct value - on the hardware I have here, the speed bitfield
      remains set to 0x03, resulting in the speed not being forced to gigabit.
      
      We must clear both bits before forcing the link speed.
      
      Fixes: 54d792f2 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccbe6aba
    • Herbert Xu's avatar
      netlink: Replace rhash_portid with bound · e2a3131d
      Herbert Xu authored
      [ Upstream commit da314c99 ]
      
      On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
      >
      > store_release and load_acquire are different from the usual memory
      > barriers and can't be paired this way.  You have to pair store_release
      > and load_acquire.  Besides, it isn't a particularly good idea to
      
      OK I've decided to drop the acquire/release helpers as they don't
      help us at all and simply pessimises the code by using full memory
      barriers (on some architectures) where only a write or read barrier
      is needed.
      
      > depend on memory barriers embedded in other data structures like the
      > above.  Here, especially, rhashtable_insert() would have write barrier
      > *before* the entry is hashed not necessarily *after*, which means that
      > in the above case, a socket which appears to have set bound to a
      > reader might not visible when the reader tries to look up the socket
      > on the hashtable.
      
      But you are right we do need an explicit write barrier here to
      ensure that the hashing is visible.
      
      > There's no reason to be overly smart here.  This isn't a crazy hot
      > path, write barriers tend to be very cheap, store_release more so.
      > Please just do smp_store_release() and note what it's paired with.
      
      It's not about being overly smart.  It's about actually understanding
      what's going on with the code.  I've seen too many instances of
      people simply sprinkling synchronisation primitives around without
      any knowledge of what is happening underneath, which is just a recipe
      for creating hard-to-debug races.
      
      > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
      > >  		}
      > >  	}
      > >
      > > -	if (!nlk->portid) {
      > > +	if (!nlk->bound) {
      >
      > I don't think you can skip load_acquire here just because this is the
      > second deref of the variable.  That doesn't change anything.  Race
      > condition could still happen between the first and second tests and
      > skipping the second would lead to the same kind of bug.
      
      The reason this one is OK is because we do not use nlk->portid or
      try to get nlk from the hash table before we return to user-space.
      
      However, there is a real bug here that none of these acquire/release
      helpers discovered.  The two bound tests here used to be a single
      one.  Now that they are separate it is entirely possible for another
      thread to come in the middle and bind the socket.  So we need to
      repeat the portid check in order to maintain consistency.
      
      > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
      > >  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
      > >  		return -EPERM;
      > >
      > > -	if (!nlk->portid)
      > > +	if (!nlk->bound)
      >
      > Don't we need load_acquire here too?  Is this path holding a lock
      > which makes that unnecessary?
      
      Ditto.
      
      ---8<---
      The commit 1f770c0a ("netlink:
      Fix autobind race condition that leads to zero port ID") created
      some new races that can occur due to inconcsistencies between the
      two port IDs.
      
      Tejun is right that a barrier is unavoidable.  Therefore I am
      reverting to the original patch that used a boolean to indicate
      that a user netlink socket has been bound.
      
      Barriers have been added where necessary to ensure that a valid
      portid and the hashed socket is visible.
      
      I have also changed netlink_insert to only return EBUSY if the
      socket is bound to a portid different to the requested one.  This
      combined with only reading nlk->bound once in netlink_bind fixes
      a race where two threads that bind the socket at the same time
      with different port IDs may both succeed.
      
      Fixes: 1f770c0a ("netlink: Fix autobind race condition that leads to zero port ID")
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Nacked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2a3131d
    • Herbert Xu's avatar
      netlink: Fix autobind race condition that leads to zero port ID · 6e32e731
      Herbert Xu authored
      [ Upstream commit 1f770c0a ]
      
      The commit c0bb07df ("netlink:
      Reset portid after netlink_insert failure") introduced a race
      condition where if two threads try to autobind the same socket
      one of them may end up with a zero port ID.  This led to kernel
      deadlocks that were observed by multiple people.
      
      This patch reverts that commit and instead fixes it by introducing
      a separte rhash_portid variable so that the real portid is only set
      after the socket has been successfully hashed.
      
      Fixes: c0bb07df ("netlink: Reset portid after netlink_insert failure")
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e32e731
    • Michael S. Tsirkin's avatar
      macvtap: fix TUNSETSNDBUF values > 64k · 3463bb42
      Michael S. Tsirkin authored
      [ Upstream commit 3ea79249 ]
      
      Upon TUNSETSNDBUF,  macvtap reads the requested sndbuf size into
      a local variable u.
      commit 39ec7de7 ("macvtap: fix uninitialized access on
      TUNSETIFF") changed its type to u16 (which is the right thing to
      do for all other macvtap ioctls), breaking all values > 64k.
      
      The value of TUNSETSNDBUF is actually a signed 32 bit integer, so
      the right thing to do is to read it into an int.
      
      Cc: David S. Miller <davem@davemloft.net>
      Fixes: 39ec7de7 ("macvtap: fix uninitialized access on TUNSETIFF")
      Reported-by: Mark A. Peloquin
      Bisected-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Reported-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3463bb42
    • Eric Dumazet's avatar
      net/mlx4_en: really allow to change RSS key · d1925961
      Eric Dumazet authored
      [ Upsteam commit 4671fc6d ]
      
      When changing rss key, we do not want to overwrite user provided key
      by the one provided by netdev_rss_key_fill(), which is the host random
      key generated at boot time.
      
      Fixes: 947cbb0a ("net/mlx4_en: Support for configurable RSS hash function")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Eyal Perry <eyalpe@mellanox.com>
      CC: Amir Vadai <amirv@mellanox.com>
      Acked-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1925961