1. 20 Sep, 2017 40 commits
    • Andy Lutomirski's avatar
      x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps · 831621ad
      Andy Lutomirski authored
      commit 9584d98b upstream.
      
      In ELF_COPY_CORE_REGS, we're copying from the current task, so
      accessing thread.fsbase and thread.gsbase makes no sense.  Just read
      the values from the CPU registers.
      
      In practice, the old code would have been correct most of the time
      simply because thread.fsbase and thread.gsbase usually matched the
      CPU registers.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chang Seok <chang.seok.bae@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      831621ad
    • Andy Lutomirski's avatar
      x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common · 90ecd1c5
      Andy Lutomirski authored
      commit 767d035d upstream.
      
      execve used to leak FSBASE and GSBASE on AMD CPUs.  Fix it.
      
      The security impact of this bug is small but not quite zero -- it
      could weaken ASLR when a privileged task execs a less privileged
      program, but only if program changed bitness across the exec, or the
      child binary was highly unusual or actively malicious.  A child
      program that was compromised after the exec would not have access to
      the leaked base.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chang Seok <chang.seok.bae@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90ecd1c5
    • Jaegeuk Kim's avatar
      f2fs: check hot_data for roll-forward recovery · cb14d4ce
      Jaegeuk Kim authored
      commit 125c9fb1 upstream.
      
      We need to check HOT_DATA to truncate any previous data block when doing
      roll-forward recovery.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb14d4ce
    • Jaegeuk Kim's avatar
      f2fs: let fill_super handle roll-forward errors · 96a069a6
      Jaegeuk Kim authored
      commit afd2b4da upstream.
      
      If we set CP_ERROR_FLAG in roll-forward error, f2fs is no longer to proceed
      any IOs due to f2fs_cp_error(). But, for example, if some stale data is involved
      on roll-forward process, we're able to get -ENOENT, getting fs stuck.
      If we get any error, let fill_super set SBI_NEED_FSCK and try to recover back
      to stable point.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96a069a6
    • Marcelo Ricardo Leitner's avatar
      sctp: fix missing wake ups in some situations · 442df042
      Marcelo Ricardo Leitner authored
      
      [ Upstream commit 7906b00f ]
      
      Commit fb586f25 ("sctp: delay calls to sk_data_ready() as much as
      possible") minimized the number of wake ups that are triggered in case
      the association receives a packet with multiple data chunks on it and/or
      when io_events are enabled and then commit 0970f5b3 ("sctp: signal
      sk_data_ready earlier on data chunks reception") moved the wake up to as
      soon as possible. It thus relies on the state machine running later to
      clean the flag that the event was already generated.
      
      The issue is that there are 2 call paths that calls
      sctp_ulpq_tail_event() outside of the state machine, causing the flag to
      linger and possibly omitting a needed wake up in the sequence.
      
      One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
      setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
      partial reliability triggers removal of chunks from the send queue when
      the application calls sendmsg().
      
      This commit fixes it by not setting the flag in case the socket is not
      owned by the user, as it won't be cleaned later. This works for
      user-initiated calls and also for rx path processing.
      
      Fixes: fb586f25 ("sctp: delay calls to sk_data_ready() as much as possible")
      Reported-by: default avatarHarald Welte <laforge@gnumonks.org>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      442df042
    • Eric Dumazet's avatar
      ipv6: fix typo in fib6_net_exit() · aa02286a
      Eric Dumazet authored
      
      [ Upstream commit 32a805ba ]
      
      IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
      
      Fixes: ba1cc08d ("ipv6: fix memory leak with multiple tables during netns destruction")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa02286a
    • Sabrina Dubroca's avatar
      ipv6: fix memory leak with multiple tables during netns destruction · 18c6d4c4
      Sabrina Dubroca authored
      
      [ Upstream commit ba1cc08d ]
      
      fib6_net_exit only frees the main and local tables. If another table was
      created with fib6_alloc_table, we leak it when the netns is destroyed.
      
      Fix this in the same way ip_fib_net_exit cleans up tables, by walking
      through the whole hashtable of fib6_table's. We can get rid of the
      special cases for local and main, since they're also part of the
      hashtable.
      
      Reproducer:
          ip netns add x
          ip -net x -6 rule add from 6003:1::/64 table 100
          ip netns del x
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Fixes: 58f09b78 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18c6d4c4
    • Xin Long's avatar
      ip6_gre: update mtu properly in ip6gre_err · 888b7a94
      Xin Long authored
      
      [ Upstream commit 5c25f30c ]
      
      Now when probessing ICMPV6_PKT_TOOBIG, ip6gre_err only subtracts the
      offset of gre header from mtu info. The expected mtu of gre device
      should also subtract gre header. Otherwise, the next packets still
      can't be sent out.
      
      Jianlin found this issue when using the topo:
        client(ip6gre)<---->(nic1)route(nic2)<----->(ip6gre)server
      
      and reducing nic2's mtu, then both tcp and sctp's performance with
      big size data became 0.
      
      This patch is to fix it by also subtracting grehdr (tun->tun_hlen)
      from mtu info when updating gre device's mtu in ip6gre_err(). It
      also needs to subtract ETH_HLEN if gre dev'type is ARPHRD_ETHER.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      888b7a94
    • Jason Wang's avatar
      vhost_net: correctly check tx avail during rx busy polling · 88f6c6f2
      Jason Wang authored
      
      [ Upstream commit 8b949bef ]
      
      We check tx avail through vhost_enable_notify() in the past which is
      wrong since it only checks whether or not guest has filled more
      available buffer since last avail idx synchronization which was just
      done by vhost_vq_avail_empty() before. What we really want is checking
      pending buffers in the avail ring. Fix this by calling
      vhost_vq_avail_empty() instead.
      
      This issue could be noticed by doing netperf TCP_RR benchmark as
      client from guest (but not host). With this fix, TCP_RR from guest to
      localhost restores from 1375.91 trans per sec to 55235.28 trans per
      sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
      
      Fixes: 03088137 ("vhost_net: basic polling support")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88f6c6f2
    • Claudiu Manoil's avatar
      gianfar: Fix Tx flow control deactivation · fc33f146
      Claudiu Manoil authored
      
      [ Upstream commit 5d621672 ]
      
      The wrong register is checked for the Tx flow control bit,
      it should have been maccfg1 not maccfg2.
      This went unnoticed for so long probably because the impact is
      hardly visible, not to mention the tangled code from adjust_link().
      First, link flow control (i.e. handling of Rx/Tx link level pause frames)
      is disabled by default (needs to be enabled via 'ethtool -A').
      Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
      old boards), which results in Tx flow control remaining always on
      once activated.
      
      Fixes: 45b679c9 ("gianfar: Implement PAUSE frame generation support")
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc33f146
    • Jesper Dangaard Brouer's avatar
      Revert "net: fix percpu memory leaks" · a44bb1c4
      Jesper Dangaard Brouer authored
      
      [ Upstream commit 5a63643e ]
      
      This reverts commit 1d6119ba.
      
      After reverting commit 6d7b857d ("net: use lib/percpu_counter API
      for fragmentation mem accounting") then here is no need for this
      fix-up patch.  As percpu_counter is no longer used, it cannot
      memory leak it any-longer.
      
      Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
      Fixes: 1d6119ba ("net: fix percpu memory leaks")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a44bb1c4
    • Jesper Dangaard Brouer's avatar
      Revert "net: use lib/percpu_counter API for fragmentation mem accounting" · 8fbf9f91
      Jesper Dangaard Brouer authored
      
      [ Upstream commit fb452a1a ]
      
      This reverts commit 6d7b857d.
      
      There is a bug in fragmentation codes use of the percpu_counter API,
      that can cause issues on systems with many CPUs.
      
      The frag_mem_limit() just reads the global counter (fbc->count),
      without considering other CPUs can have upto batch size (130K) that
      haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
      this become dangerous at >=24 CPUs (3*1024*1024/130000=24).
      
      The correct API usage would be to use __percpu_counter_compare() which
      does the right thing, and takes into account the number of (online)
      CPUs and batch size, to account for this and call __percpu_counter_sum()
      when needed.
      
      We choose to revert the use of the lib/percpu_counter API for frag
      memory accounting for several reasons:
      
      1) On systems with CPUs > 24, the heavier fully locked
         __percpu_counter_sum() is always invoked, which will be more
         expensive than the atomic_t that is reverted to.
      
      Given systems with more than 24 CPUs are becoming common this doesn't
      seem like a good option.  To mitigate this, the batch size could be
      decreased and thresh be increased.
      
      2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
         CPU, before SKBs are pushed into sockets on remote CPUs.  Given
         NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
         likely be limited.  Thus, a fair chance that atomic add+dec happen
         on the same CPU.
      
      Revert note that commit 1d6119ba ("net: fix percpu memory leaks")
      removed init_frag_mem_limit() and instead use inet_frags_init_net().
      After this revert, inet_frags_uninit_net() becomes empty.
      
      Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
      Fixes: 1d6119ba ("net: fix percpu memory leaks")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fbf9f91
    • Ido Schimmel's avatar
      bridge: switchdev: Clear forward mark when transmitting packet · 79f08820
      Ido Schimmel authored
      
      [ Upstream commit 79e99bdd ]
      
      Commit 6bc506b4 ("bridge: switchdev: Add forward mark support for
      stacked devices") added the 'offload_fwd_mark' bit to the skb in order
      to allow drivers to indicate to the bridge driver that they already
      forwarded the packet in L2.
      
      In case the bit is set, before transmitting the packet from each port,
      the port's mark is compared with the mark stored in the skb's control
      block. If both marks are equal, we know the packet arrived from a switch
      device that already forwarded the packet and it's not re-transmitted.
      
      However, if the packet is transmitted from the bridge device itself
      (e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark
      stored in the skb's control block isn't valid.
      
      This scenario can happen in rare cases where a packet was trapped during
      L3 forwarding and forwarded by the kernel to a bridge device.
      
      Fixes: 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-by: default avatarYotam Gigi <yotamg@mellanox.com>
      Tested-by: default avatarYotam Gigi <yotamg@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79f08820
    • Ido Schimmel's avatar
      mlxsw: spectrum: Forbid linking to devices that have uppers · 2f4232ba
      Ido Schimmel authored
      
      [ Upstream commit 25cc72a3 ]
      
      The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
      device in case a port is enslaved to a master netdev such as bridge or
      bond.
      
      Since the driver ignores events unrelated to its ports and their
      uppers, it's possible to engineer situations in which the device's data
      path differs from the kernel's.
      
      One example to such a situation is when a port is enslaved to a bond
      that is already enslaved to a bridge. When the bond was enslaved the
      driver ignored the event - as the bond wasn't one of its uppers - and
      therefore a bridge port instance isn't created in the device.
      
      Until such configurations are supported forbid them by checking that the
      upper device doesn't have uppers of its own.
      
      Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-by: default avatarNogah Frankel <nogahf@mellanox.com>
      Tested-by: default avatarNogah Frankel <nogahf@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f4232ba
    • Andrew Lunn's avatar
      net: fec: Allow reception of frames bigger than 1522 bytes · a9e548de
      Andrew Lunn authored
      
      [ Upstream commit fbbeefdd ]
      
      The FEC Receive Control Register has a 14 bit field indicating the
      longest frame that may be received. It is being set to 1522. Frames
      longer than this are discarded, but counted as being in error.
      
      When using DSA, frames from the switch has an additional header,
      either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
      of 1522 bytes received by the switch on a port becomes 1530 bytes when
      passed to the host via the FEC interface.
      
      Change the maximum receive size to 2048 - 64, where 64 is the maximum
      rx_alignment applied on the receive buffer for AVB capable FEC
      cores. Use this value also for the maximum receive buffer size. The
      driver is already allocating a receive SKB of 2048 bytes, so this
      change should not have any significant effects.
      
      Tested on imx51, imx6, vf610.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9e548de
    • Florian Fainelli's avatar
      Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()" · b8fcbae2
      Florian Fainelli authored
      
      [ Upstream commit ebc8254a ]
      
      This reverts commit 7ad813f2 ("net: phy:
      Correctly process PHY_HALTED in phy_stop_machine()") because it is
      creating the possibility for a NULL pointer dereference.
      
      David Daney provide the following call trace and diagram of events:
      
      When ndo_stop() is called we call:
      
       phy_disconnect()
          +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
          +---> phy_stop_machine()
          |      +---> phy_state_machine()
          |              +----> queue_delayed_work(): Work queued.
          +--->phy_detach() implies: phydev->attached_dev = NULL;
      
      Now at a later time the queued work does:
      
       phy_state_machine()
          +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
      
       CPU 12 Unable to handle kernel paging request at virtual address
      0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
      Oops[#1]:
      CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
      Workqueue: events_power_efficient phy_state_machine
      task: 80000004021ed100 task.stack: 8000000409d70000
      $ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
      $ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
      $ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
      $12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
      $16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
      $20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
      $24   : 0000000000000061 ffffffff808637b0
      $28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
      Hi    : 000000000000002a
      Lo    : 000000000000003f
      epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
      ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
      Status: 14009ce3        KX SX UX KERNEL EXL IE
      Cause : 00800008 (ExcCode 02)
      BadVA : 0000000000000048
      PrId  : 000d9501 (Cavium Octeon III)
      Modules linked in:
      Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
      task=80000004021ed100, tls=0000000000000000)
      Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
              0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
              80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
              ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
              8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
              ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
              0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
              8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
              8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
              8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
              ...
      Call Trace:
      [<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
      [<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
      [<ffffffff808a1708>] process_one_work+0x158/0x368
      [<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
      [<ffffffff808a8598>] kthread+0xc8/0xe0
      [<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c
      
      The original motivation for this change originated from Marc Gonzales
      indicating that his network driver did not have its adjust_link callback
      executing with phydev->link = 0 while he was expecting it.
      
      PHYLIB has never made any such guarantees ever because phy_stop() merely just
      tells the workqueue to move into PHY_HALTED state which will happen
      asynchronously.
      Reported-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Reported-by: default avatarDavid Daney <ddaney.cavm@gmail.com>
      Fixes: 7ad813f2 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8fcbae2
    • Tal Gilboa's avatar
      net/mlx5e: Fix CQ moderation mode not set properly · b88be44f
      Tal Gilboa authored
      
      [ Upstream commit 1213ad28 ]
      
      cq_period_mode assignment was mistakenly removed so it was always set to "0",
      which is EQE based moderation, regardless of the device CAPs and
      requested value in ethtool.
      
      Fixes: 6a9764ef ("net/mlx5e: Isolate open_channels from priv->params")
      Signed-off-by: default avatarTal Gilboa <talgi@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b88be44f
    • Moshe Shemesh's avatar
      net/mlx5e: Fix inline header size for small packets · 8049c41d
      Moshe Shemesh authored
      
      [ Upstream commit 6aace17e ]
      
      Fix inline header size, make sure it is not greater than skb len.
      This bug effects small packets, for example L2 packets with size < 18.
      
      Fixes: ae76715d ("net/mlx5e: Check the minimum inline header mode before xmit")
      Signed-off-by: default avatarMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8049c41d
    • Shahar Klein's avatar
      net/mlx5: E-Switch, Unload the representors in the correct order · 8db40bcf
      Shahar Klein authored
      
      [ Upstream commit 19122039 ]
      
      When changing from switchdev to legacy mode, all the representor port
      devices (uplink nic and reps) are cleaned up. Part of this cleaning
      process is removing the neigh entries and the hash table containing them.
      However, a representor neigh entry might be linked to the uplink port
      hash table and if the uplink nic is cleaned first the cleaning of the
      representor will end up in null deref.
      Fix that by unloading the representors in the opposite order of load.
      
      Fixes: cb67b832 ("net/mlx5e: Introduce SRIOV VF representors")
      Signed-off-by: default avatarShahar Klein <shahark@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8db40bcf
    • Paul Blakey's avatar
      net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address · b0034cb5
      Paul Blakey authored
      
      [ Upstream commit 08820528 ]
      
      Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to
      resolve it as part of the route lookup. The resulting encap header
      is left with a zeroed out ipv6 src address so the packets are sent
      with this src ip.
      
      Use an appropriate route lookup API that also resolves the source
      ipv6 address if it's not supplied.
      
      Fixes: ce99f6b9 ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0034cb5
    • Inbar Karmy's avatar
      net/mlx5e: Don't override user RSS upon set channels · 53c55257
      Inbar Karmy authored
      
      [ Upstream commit 5a8e1267 ]
      
      Currently, increasing the number of combined channels is changing
      the RSS spread to use the new created channels.
      Prevent the RSS spread change in case the user explicitly declare it,
      to avoid overriding user configuration.
      
      Tested:
      when RSS default:
      
      # ethtool -L ens8 combined 4
      RSS spread will change and point to 4 channels.
      
      # ethtool -X ens8 equal 4
      # ethtool -L ens8 combined 6
      RSS will not change after increasing the number of the channels.
      
      Fixes: 8bf36862 ('ethtool: ensure channel counts are within bounds during SCHANNELS')
      Signed-off-by: default avatarInbar Karmy <inbark@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53c55257
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix dangling page pointer on DMA mapping error · ba008489
      Eran Ben Elisha authored
      
      [ Upstream commit 0556ce72 ]
      
      Function mlx5e_dealloc_rx_wqe is using page pointer value as an
      indication to valid DMA mapping. In case that the mapping failed, we
      released the page but kept the dangling pointer. Store the page pointer
      only after the DMA mapping passed to avoid invalid page DMA unmap.
      
      Fixes: bc77b240 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba008489
    • Noa Osherovich's avatar
      net/mlx5: Fix arm SRQ command for ISSI version 0 · 7ae1eccb
      Noa Osherovich authored
      
      [ Upstream commit 672d0880 ]
      
      Support for ISSI version 0 was recently broken as the arm_srq_cmd
      command, which is used only for ISSI version 0, was given the opcode
      for ISSI version 1 instead of ISSI version 0.
      
      Change arm_srq_cmd to use the correct command opcode for ISSI version
      0.
      
      Fixes: af1ba291 ('{net, IB}/mlx5: Refactor internal SRQ API')
      Signed-off-by: default avatarNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ae1eccb
    • Huy Nguyen's avatar
      net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap. · 0b6b3028
      Huy Nguyen authored
      
      [ Upstream commit 9e10bf1d ]
      
      Current code doesn't report DCB_CAP_DCBX_HOST capability when query
      through getcap. User space lldptool expects capability to have HOST mode
      set when it wants to configure DCBX CEE mode. In absence of HOST mode
      capability, lldptool fails to switch to CEE mode.
      
      This fix returns DCB_CAP_DCBX_HOST capability when port's DCBX
      controlled mode is under software control.
      
      Fixes: 3a6a931d ("net/mlx5e: Support DCBX CEE API")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b6b3028
    • Huy Nguyen's avatar
      net/mlx5e: Check for qos capability in dcbnl_initialize · 9b919ad3
      Huy Nguyen authored
      
      [ Upstream commit 33c52b67 ]
      
      qos capability is the master capability bit that determines
      if the DCBX is supported for the PCI function. If this bit is off,
      driver cannot run any dcbx code.
      
      Fixes: e207b7e9 ("net/mlx5e: ConnectX-4 firmware support for DCBX")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b919ad3
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 · 31034e44
      Florian Fainelli authored
      
      [ Upstream commit df191632 ]
      
      BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
      fix that.
      
      Fixes: 7318166c ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31034e44
    • Eric Dumazet's avatar
      kcm: do not attach PF_KCM sockets to avoid deadlock · f9901adf
      Eric Dumazet authored
      
      [ Upstream commit 351050ec ]
      
      syzkaller had no problem to trigger a deadlock, attaching a KCM socket
      to another one (or itself). (original syzkaller report was a very
      confusing lockdep splat during a sendmsg())
      
      It seems KCM claims to only support TCP, but no enforcement is done,
      so we might need to add additional checks.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarTom Herbert <tom@quantonium.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9901adf
    • Benjamin Poirier's avatar
      packet: Don't write vnet header beyond end of buffer · e7ebdeb4
      Benjamin Poirier authored
      
      [ Upstream commit edbd58be ]
      
      ... which may happen with certain values of tp_reserve and maclen.
      
      Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv")
      Signed-off-by: default avatarBenjamin Poirier <bpoirier@suse.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7ebdeb4
    • Xin Long's avatar
      ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt · ef5a20f0
      Xin Long authored
      
      [ Upstream commit e8d411d2 ]
      
      ChunYu found a kernel warn_on during syzkaller fuzzing:
      
      [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
      [40226.144849] Call Trace:
      [40226.147590]  <IRQ>
      [40226.149859]  dump_stack+0xe2/0x186
      [40226.176546]  __warn+0x1a4/0x1e0
      [40226.180066]  warn_slowpath_null+0x31/0x40
      [40226.184555]  inet_sock_destruct+0x78d/0x9a0
      [40226.246355]  __sk_destruct+0xfa/0x8c0
      [40226.290612]  rcu_process_callbacks+0xaa0/0x18a0
      [40226.336816]  __do_softirq+0x241/0x75e
      [40226.367758]  irq_exit+0x1f6/0x220
      [40226.371458]  smp_apic_timer_interrupt+0x7b/0xa0
      [40226.376507]  apic_timer_interrupt+0x93/0xa0
      
      The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
      As after commit f970bd9e ("udp: implement memory accounting helpers"),
      udp has changed to use udp_destruct_sock as sk_destruct where it would
      udp_rmem_release all rmem.
      
      But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
      changing family to PF_INET. If rmem is not 0 at that time, and there is
      no place to release rmem before calling inet_sock_destruct, the warn_on
      will be triggered.
      
      This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
      any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
      already set it's sk_destruct with inet_sock_destruct and UDP has set with
      udp_destruct_sock since they're created.
      
      Fixes: f970bd9e ("udp: implement memory accounting helpers")
      Reported-by: default avatarChunYu Wang <chunwang@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef5a20f0
    • Xin Long's avatar
      ipv6: set dst.obsolete when a cached route has expired · 440ea29a
      Xin Long authored
      
      [ Upstream commit 1e2ea8ad ]
      
      Now it doesn't check for the cached route expiration in ipv6's
      dst_ops->check(), because it trusts dst_gc that would clean the
      cached route up when it's expired.
      
      The problem is in dst_gc, it would clean the cached route only
      when it's refcount is 1. If some other module (like xfrm) keeps
      holding it and the module only release it when dst_ops->check()
      fails.
      
      But without checking for the cached route expiration, .check()
      may always return true. Meanwhile, without releasing the cached
      route, dst_gc couldn't del it. It will cause this cached route
      never to expire.
      
      This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
      when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
      in .check.
      
      Note that this is even needed when ipv6 dst_gc timer is removed
      one day. It would set dst.obsolete in .redirect and .update_pmtu
      instead, and check for cached route expiration when getting it,
      just like what ipv4 route does.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      440ea29a
    • Stefano Brivio's avatar
      cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox() · 24bd86e6
      Stefano Brivio authored
      
      [ Upstream commit 0f308686 ]
      
      Passing commands for logging to t4_record_mbox() with size
      MBOX_LEN, when the actual command size is actually smaller,
      causes out-of-bounds stack accesses in t4_record_mbox() while
      copying command words here:
      
      	for (i = 0; i < size / 8; i++)
      		entry->cmd[i] = be64_to_cpu(cmd[i]);
      
      Up to 48 bytes from the stack are then leaked to debugfs.
      
      This happens whenever we send (and log) commands described by
      structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
      fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
      fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
      fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
      fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
      fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
      fw_sched_cmd(32), fw_devlog_cmd(32).
      
      The cxgb4vf driver got this right instead.
      
      When we call t4_record_mbox() to log a command reply, a MBOX_LEN
      size can be used though, as get_mbox_rpl() will fill cmd_rpl up
      completely.
      
      Fixes: 7f080c3f ("cxgb4: Add support to enable logging of firmware mailbox commands")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24bd86e6
    • Antoine Tenart's avatar
      net: mvpp2: fix the mac address used when using PPv2.2 · 59b304fd
      Antoine Tenart authored
      
      [ Upstream commit 4c228682 ]
      
      The mac address is only retrieved from h/w when using PPv2.1. Otherwise
      the variable holding it is still checked and used if it contains a valid
      value. As the variable isn't initialized to an invalid mac address
      value, we end up with random mac addresses which can be the same for all
      the ports handled by this PPv2 driver.
      
      Fixes this by initializing the h/w mac address variable to {0}, which is
      an invalid mac address value. This way the random assignation fallback
      is called and all ports end up with their own addresses.
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@free-electrons.com>
      Fixes: 26975821 ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59b304fd
    • Paolo Abeni's avatar
      udp6: set rx_dst_cookie on rx_dst updates · 38ca2d39
      Paolo Abeni authored
      
      [ Upstream commit 64f0f5d1 ]
      
      Currently, in the udp6 code, the dst cookie is not initialized/updated
      concurrently with the RX dst used by early demux.
      
      As a result, the dst_check() in the early_demux path always fails,
      the rx dst cache is always invalidated, and we can't really
      leverage significant gain from the demux lookup.
      
      Fix it adding udp6 specific variant of sk_rx_dst_set() and use it
      to set the dst cookie when the dst entry is really changed.
      
      The issue is there since the introduction of early demux for ipv6.
      
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38ca2d39
    • stephen hemminger's avatar
      netvsc: fix deadlock betwen link status and removal · b4426cf2
      stephen hemminger authored
      
      [ Upstream commit 9b4e946c ]
      
      There is a deadlock possible when canceling the link status
      delayed work queue. The removal process is run with RTNL held,
      and the link status callback is acquring RTNL.
      
      Resolve the issue by using trylock and rescheduling.
      If cancel is in process, that block it from happening.
      
      Fixes: 122a5f64 ("staging: hv: use delayed_work for netvsc_send_garp()")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4426cf2
    • Florian Fainelli's avatar
      net: systemport: Free DMA coherent descriptors on errors · 3f0204b0
      Florian Fainelli authored
      
      [ Upstream commit c2062ee3 ]
      
      In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
      would return with an error, and call bcm_sysport_fini_tx_ring() and it
      would see that ring->cbs is NULL and do nothing. This would leak the
      coherent DMA descriptor area, so we need to free it on error before
      returning.
      Reported-by: default avatarEric Dumazet <edumazet@gmail.com>
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f0204b0
    • Florian Fainelli's avatar
      net: bcmgenet: Be drop monitor friendly · 71dd9ac5
      Florian Fainelli authored
      
      [ Upstream commit d4fec855 ]
      
      There are 3 spots where we call dev_kfree_skb() but we are actually
      just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
      TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
      setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
      
      Fixes: d6707bec ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
      Fixes: f48bed16 ("net: bcmgenet: Free skb after last Tx frag")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71dd9ac5
    • Florian Fainelli's avatar
      net: systemport: Be drop monitor friendly · 7def678f
      Florian Fainelli authored
      
      [ Upstream commit c45182eb ]
      
      Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
      used when a TX packet is completed, as well as when the RX ring is
      cleaned on shutdown. None of these two cases are packet drops, so be
      drop monitor friendly.
      Suggested-by: default avatarEric Dumazet <edumazet@gmail.com>
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7def678f
    • Bob Peterson's avatar
      tipc: Fix tipc_sk_reinit handling of -EAGAIN · c86a65cf
      Bob Peterson authored
      
      [ Upstream commit 6c7e983b ]
      
      In 9dbbfb0a function tipc_sk_reinit
      had additional logic added to loop in the event that function
      rhashtable_walk_next() returned -EAGAIN. No worries.
      
      However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
      and therefore skips the call to rhashtable_walk_stop(). That has
      the effect of calling rcu_read_lock() without its paired call to
      rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
      may not be apparent for a while, especially since resize events may
      be rare. But the comments to rhashtable_walk_start() state:
      
       * ...Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      This patch replaces the continue with a goto and label to ensure a
      matching call to rhashtable_walk_stop().
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c86a65cf
    • Arnd Bergmann's avatar
      qlge: avoid memcpy buffer overflow · 8aafed19
      Arnd Bergmann authored
      
      [ Upstream commit e58f9583 ]
      
      gcc-8.0.0 (snapshot) points out that we copy a variable-length string
      into a fixed length field using memcpy() with the destination length,
      and that ends up copying whatever follows the string:
      
          inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
      drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
        memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
      
      Changing it to use strncpy() will instead zero-pad the destination,
      which seems to be the right thing to do here.
      
      The bug is probably harmless, but it seems like a good idea to address
      it in stable kernels as well, if only for the purpose of building with
      gcc-8 without warnings.
      
      Fixes: a61f8026 ("qlge: Add ethtool register dump function.")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8aafed19
    • Stefano Brivio's avatar
      sctp: Avoid out-of-bounds reads from address storage · 6da13824
      Stefano Brivio authored
      
      [ Upstream commit ee6c88bb ]
      
      inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
      sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
      to export diagnostic information to userspace.
      
      However, the memory allocated to store sockaddr information is
      smaller than that and depends on the address family, so we leak
      up to 100 uninitialized bytes to userspace. Just use the size of
      the source structs instead, in all the three cases this is what
      userspace expects. Zero out the remaining memory.
      
      Unused bytes (i.e. when IPv4 addresses are used) in source
      structs sctp_sockaddr_entry and sctp_transport are already
      cleared by sctp_add_bind_addr() and sctp_transport_new(),
      respectively.
      
      Noticed while testing KASAN-enabled kernel with 'ss':
      
      [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
      [ 2326.896800] Read of size 128 by task ss/9527
      [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
      [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
      [ 2326.917585] Call Trace:
      [ 2326.920312]  dump_stack+0x63/0x8d
      [ 2326.924014]  kasan_object_err+0x21/0x70
      [ 2326.928295]  kasan_report+0x288/0x540
      [ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
      [ 2326.938500]  ? skb_put+0x8b/0xd0
      [ 2326.942098]  ? memset+0x31/0x40
      [ 2326.945599]  check_memory_region+0x13c/0x1a0
      [ 2326.950362]  memcpy+0x23/0x50
      [ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
      [ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
      [ 2326.966495]  ? __lock_sock+0x102/0x150
      [ 2326.970671]  ? sock_def_wakeup+0x60/0x60
      [ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
      [ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
      [ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
      [ 2326.990504]  ? memset+0x31/0x40
      [ 2326.994007]  ? mutex_lock+0x12/0x40
      [ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
      [ 2327.003340]  ? __sys_sendmsg+0x150/0x150
      [ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
      [ 2327.012979]  netlink_dump+0x1e6/0x490
      [ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
      [ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
      [ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
      [ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
      [ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
      [ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
      [ 2327.050199]  netlink_rcv_skb+0x14b/0x180
      [ 2327.054574]  ? sock_diag_bind+0x60/0x60
      [ 2327.058850]  sock_diag_rcv+0x28/0x40
      [ 2327.062837]  netlink_unicast+0x2e7/0x3b0
      [ 2327.067212]  ? netlink_attachskb+0x330/0x330
      [ 2327.071975]  ? kasan_check_write+0x14/0x20
      [ 2327.076544]  netlink_sendmsg+0x5be/0x730
      [ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
      [ 2327.085486]  ? kasan_check_write+0x14/0x20
      [ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
      [ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
      [ 2327.099678]  sock_sendmsg+0x74/0x80
      [ 2327.103567]  ___sys_sendmsg+0x520/0x530
      [ 2327.107844]  ? __get_locked_pte+0x178/0x200
      [ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
      [ 2327.117660]  ? vm_insert_page+0x360/0x360
      [ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
      [ 2327.126895]  ? vm_insert_pfn+0x32/0x40
      [ 2327.131077]  ? vvar_fault+0x71/0xd0
      [ 2327.134968]  ? special_mapping_fault+0x69/0x110
      [ 2327.140022]  ? __do_fault+0x42/0x120
      [ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
      [ 2327.148965]  ? __fget_light+0xa7/0xc0
      [ 2327.153049]  __sys_sendmsg+0xcb/0x150
      [ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
      [ 2327.161409]  ? SyS_shutdown+0x140/0x140
      [ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
      [ 2327.170646]  ? __do_page_fault+0x55d/0x620
      [ 2327.175216]  ? __sys_sendmsg+0x150/0x150
      [ 2327.179591]  SyS_sendmsg+0x12/0x20
      [ 2327.183384]  do_syscall_64+0xe3/0x230
      [ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 2327.192622] RIP: 0033:0x7f41d18fa3b0
      [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
      [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
      [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
      [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
      [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
      [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
      [ 2327.251953] Allocated:
      [ 2327.254581] PID = 9484
      [ 2327.257215]  save_stack_trace+0x1b/0x20
      [ 2327.261485]  save_stack+0x46/0xd0
      [ 2327.265179]  kasan_kmalloc+0xad/0xe0
      [ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
      [ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
      [ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
      [ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
      [ 2327.288455]  inet_bind+0x5f/0x3a0
      [ 2327.292151]  SYSC_bind+0x1a4/0x1e0
      [ 2327.295944]  SyS_bind+0xe/0x10
      [ 2327.299349]  do_syscall_64+0xe3/0x230
      [ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
      [ 2327.308194] Freed:
      [ 2327.310434] PID = 4131
      [ 2327.313065]  save_stack_trace+0x1b/0x20
      [ 2327.317344]  save_stack+0x46/0xd0
      [ 2327.321040]  kasan_slab_free+0x73/0xc0
      [ 2327.325220]  kfree+0x96/0x1a0
      [ 2327.328530]  dynamic_kobj_release+0x15/0x40
      [ 2327.333195]  kobject_release+0x99/0x1e0
      [ 2327.337472]  kobject_put+0x38/0x70
      [ 2327.341266]  free_notes_attrs+0x66/0x80
      [ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
      [ 2327.350211]  free_module+0x20/0x2a0
      [ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
      [ 2327.358667]  do_syscall_64+0xe3/0x230
      [ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
      [ 2327.367510] Memory state around the buggy address:
      [ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
      [ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
      [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
      [ 2327.397031]                                ^
      [ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
      [ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
      [ 2327.417907] ==================================================================
      
      This fixes CVE-2017-7558.
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
      Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6da13824