1. 06 Feb, 2019 30 commits
    • Toshiaki Makita's avatar
      virtio_net: Don't process redirected XDP frames when XDP is disabled · 05e260f9
      Toshiaki Makita authored
      [ Upstream commit 03aa6d34 ]
      
      Commit 8dcc5b0a ("virtio_net: fix ndo_xdp_xmit crash towards dev not
      ready for XDP") tried to avoid access to unexpected sq while XDP is
      disabled, but was not complete.
      
      There was a small window which causes out of bounds sq access in
      virtnet_xdp_xmit() while disabling XDP.
      
      An example case of
       - curr_queue_pairs = 6 (2 for SKB and 4 for XDP)
       - online_cpu_num = xdp_queue_paris = 4
      when XDP is enabled:
      
      CPU 0                         CPU 1
      (Disabling XDP)               (Processing redirected XDP frames)
      
                                    virtnet_xdp_xmit()
      virtnet_xdp_set()
       _virtnet_set_queues()
        set curr_queue_pairs (2)
                                     check if rq->xdp_prog is not NULL
                                     virtnet_xdp_sq(vi)
                                      qp = curr_queue_pairs -
                                           xdp_queue_pairs +
                                           smp_processor_id()
                                         = 2 - 4 + 1 = -1
                                      sq = &vi->sq[qp] // out of bounds access
        set xdp_queue_pairs (0)
        rq->xdp_prog = NULL
      
      Basically we should not change curr_queue_pairs and xdp_queue_pairs
      while someone can read the values. Thus, when disabling XDP, assign NULL
      to rq->xdp_prog first, and wait for RCU grace period, then change
      xxx_queue_pairs.
      Note that we need to keep the current order when enabling XDP though.
      
      - v2: Make rcu_assign_pointer/synchronize_net conditional instead of
            _virtnet_set_queues.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e260f9
    • Toshiaki Makita's avatar
      virtio_net: Fix out of bounds access of sq · 0921dd50
      Toshiaki Makita authored
      [ Upstream commit 1667c08a ]
      
      When XDP is disabled, curr_queue_pairs + smp_processor_id() can be
      larger than max_queue_pairs.
      There is no guarantee that we have enough XDP send queues dedicated for
      each cpu when XDP is disabled, so do not count drops on sq in that case.
      
      Fixes: 5b8f3c8d ("virtio_net: Add XDP related stats")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0921dd50
    • Toshiaki Makita's avatar
      virtio_net: Fix not restoring real_num_rx_queues · d97117bd
      Toshiaki Makita authored
      [ Upstream commit 188313c1 ]
      
      When _virtnet_set_queues() failed we did not restore real_num_rx_queues.
      Fix this by placing the change of real_num_rx_queues after
      _virtnet_set_queues().
      This order is also in line with virtnet_set_channels().
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d97117bd
    • Toshiaki Makita's avatar
      virtio_net: Don't call free_old_xmit_skbs for xdp_frames · 4c2e63dc
      Toshiaki Makita authored
      [ Upstream commit 534da5e8 ]
      
      When napi_tx is enabled, virtnet_poll_cleantx() called
      free_old_xmit_skbs() even for xdp send queue.
      This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled
      device tx bytes counters because skb->len is meaningless value, and even
      triggered oops due to general protection fault on freeing them.
      
      Since xdp send queues do not aquire locks, old xdp_frames should be
      freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for
      xdp send queues.
      
      Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI
      handler is called even without calling start_xmit() because cb for tx is
      by default enabled. Once the handler is called, it enabled the cb again,
      and then the handler would be called again. We don't need this handler
      for XDP, so don't enable cb as well as not calling free_old_xmit_skbs().
      
      Also, we need to disable tx NAPI when disabling XDP, so
      virtnet_poll_tx() can safely access curr_queue_pairs and
      xdp_queue_pairs, which are not atomically updated while disabling XDP.
      
      Fixes: b92f1e67 ("virtio-net: transmit napi")
      Fixes: 7b0411ef ("virtio-net: clean tx descriptors from rx napi")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c2e63dc
    • Toshiaki Makita's avatar
      virtio_net: Don't enable NAPI when interface is down · b6862baa
      Toshiaki Makita authored
      [ Upstream commit 8be4d9a4 ]
      
      Commit 4e09ff53 ("virtio-net: disable NAPI only when enabled during
      XDP set") tried to fix inappropriate NAPI enabling/disabling when
      !netif_running(), but was not complete.
      
      On error path virtio_net could enable NAPI even when !netif_running().
      This can cause enabling NAPI twice on virtnet_open(), which would
      trigger BUG_ON() in napi_enable().
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6862baa
    • Xin Long's avatar
      sctp: set flow sport from saddr only when it's 0 · 37b34a91
      Xin Long authored
      [ Upstream commit ecf938fe ]
      
      Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set
      flow sport from 'saddr'. However, transport->saddr is set only when
      transport->dst exists in sctp_transport_route().
      
      If sctp_transport_pmtu() is called without transport->saddr set, like
      when transport->dst doesn't exists, the flow sport will be set to 0
      from transport->saddr, which will cause a wrong route to be got.
      
      Commit 6e91b578 ("sctp: re-use sctp_transport_pmtu in
      sctp_transport_route") made the issue be triggered more easily
      since sctp_transport_pmtu() would be called in sctp_transport_route()
      after that.
      
      In gerneral, fl4->fl4_sport should always be set to
      htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist
      in sctp_v4/6_get_dst(), which is the case:
      
        sctp_ootb_pkt_new() ->
          sctp_transport_route()
      
      For that, we can simply handle it by setting flow sport from saddr only
      when it's 0 in sctp_v4/6_get_dst().
      
      Fixes: 6e91b578 ("sctp: re-use sctp_transport_pmtu in sctp_transport_route")
      Reported-by: default avatarYing Xu <yinxu@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37b34a91
    • Xin Long's avatar
      sctp: set chunk transport correctly when it's a new asoc · cbf23d40
      Xin Long authored
      [ Upstream commit 4ff40b86 ]
      
      In the paths:
      
        sctp_sf_do_unexpected_init() ->
          sctp_make_init_ack()
        sctp_sf_do_dupcook_a/b()() ->
          sctp_sf_do_5_1D_ce()
      
      The new chunk 'retval' transport is set from the incoming chunk 'chunk'
      transport. However, 'retval' transport belong to the new asoc, which
      is a different one from 'chunk' transport's asoc.
      
      It will cause that the 'retval' chunk gets set with a wrong transport.
      Later when sending it and because of Commit b9fd6839 ("sctp: add
      sctp_packet_singleton"), sctp_packet_singleton() will set some fields,
      like vtag to 'retval' chunk from that wrong transport's asoc.
      
      This patch is to fix it by setting 'retval' transport correctly which
      belongs to the right asoc in sctp_make_init_ack() and
      sctp_sf_do_5_1D_ce().
      
      Fixes: b9fd6839 ("sctp: add sctp_packet_singleton")
      Reported-by: default avatarYing Xu <yinxu@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbf23d40
    • Bodong Wang's avatar
      Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager" · a188f568
      Bodong Wang authored
      [ Upstream commit 4e046de0 ]
      
      This reverts commit 5f5991f3.
      
      With the original commit, eswitch instance will not be initialized for
      a function which is vport group manager but not eswitch manager such as
      host PF on SmartNIC (BlueField) card. This will result in a kernel crash
      when such a vport group manager is trying to access vports in its group.
      E.g, PF vport manager (not eswitch manager) tries to configure the MAC
      of its VF vport, a kernel trace will happen similar as bellow:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       ...
       RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core]
       ...
      
      Fixes: 5f5991f3 ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager")
      Signed-off-by: default avatarBodong Wang <bodong@mellanox.com>
      Reported-by: default avatarYuval Avnery <yuvalav@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Reviewed-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a188f568
    • Nir Dotan's avatar
      ip6mr: Fix notifiers call on mroute_clean_tables() · 505e5f3d
      Nir Dotan authored
      [ Upstream commit 146820cc ]
      
      When the MC route socket is closed, mroute_clean_tables() is called to
      cleanup existing routes. Mistakenly notifiers call was put on the cleanup
      of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment a non initialized refcount_t object [1] and then when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada ("selftests: forwarding: Add multicast routing test").
      
      Fix that by putting notifiers call on the resolved entries traversal,
      instead of on the unresolved entries traversal.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      505e5f3d
    • Aya Levin's avatar
      net/mlx5e: Allow MAC invalidation while spoofchk is ON · 50990a40
      Aya Levin authored
      [ Upstream commit 9d2cbdc5 ]
      
      Prior to this patch the driver prohibited spoof checking on invalid MAC.
      Now the user can set this configuration if it wishes to.
      
      This is required since libvirt might invalidate the VF Mac by setting it
      to zero, while spoofcheck is ON.
      
      Fixes: 1ab2068a ("net/mlx5: Implement vports admin state backup/restore")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50990a40
    • Xin Long's avatar
      sctp: improve the events for sctp stream adding · 4ec13999
      Xin Long authored
      [ Upstream commit 8220c870 ]
      
      This patch is to improve sctp stream adding events in 2 places:
      
        1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM
           and in stream allocation failure checks, as the adding has to
           succeed after reconf_timer stops for the in stream adding
           request retransmission.
      
        3. In sctp_process_strreset_addstrm_in(), no event should be sent,
           as no in or out stream is added here.
      
      Fixes: 50a41591 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter")
      Fixes: c5c4ebb3 ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter")
      Reported-by: default avatarYing Xu <yinxu@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ec13999
    • Lorenzo Bianconi's avatar
      net: ip6_gre: always reports o_key to userspace · 9f7d849b
      Lorenzo Bianconi authored
      [ Upstream commit c706863b ]
      
      As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
      session id header field. However TUNNEL_KEY bit is cleared in
      ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
      of the external GRE header and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
          key 1 seq erspan_ver 1
      $ip link set ip6erspan1 up
      ip -d link sh ip6erspan1
      
      ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
          link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
          ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
      
      Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
      ip6gre_fill_info
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f7d849b
    • Jason Wang's avatar
      vhost: fix OOB in get_rx_bufs() · aafe74b7
      Jason Wang authored
      [ Upstream commit b46a0bf7 ]
      
      After batched used ring updating was introduced in commit e2b3b35e
      ("vhost_net: batch used ring update in rx"). We tend to batch heads in
      vq->heads for more than one packet. But the quota passed to
      get_rx_bufs() was not correctly limited, which can result a OOB write
      in vq->heads.
      
              headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                          vhost_len, &in, vq_log, &log,
                          likely(mergeable) ? UIO_MAXIOV : 1);
      
      UIO_MAXIOV was still used which is wrong since we could have batched
      used in vq->heads, this will cause OOB if the next buffer needs more
      than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
      batched 64 (VHOST_NET_BATCH) heads:
      Acked-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      
      =============================================================================
      BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
      -----------------------------------------------------------------------------
      
      INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
      INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
          kmem_cache_alloc_trace+0xbb/0x140
          alloc_pd+0x22/0x60
          gen8_ppgtt_create+0x11d/0x5f0
          i915_ppgtt_create+0x16/0x80
          i915_gem_create_context+0x248/0x390
          i915_gem_context_create_ioctl+0x4b/0xe0
          drm_ioctl_kernel+0xa5/0xf0
          drm_ioctl+0x2ed/0x3a0
          do_vfs_ioctl+0x9f/0x620
          ksys_ioctl+0x6b/0x80
          __x64_sys_ioctl+0x11/0x20
          do_syscall_64+0x43/0xf0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
      INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
      
      Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
      vhost-net. This is done through set the limitation through
      vhost_dev_init(), then set_owner can allocate the number of iov in a
      per device manner.
      
      This fixes CVE-2018-16880.
      
      Fixes: e2b3b35e ("vhost_net: batch used ring update in rx")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aafe74b7
    • Mathias Thore's avatar
      ucc_geth: Reset BQL queue when stopping device · d0773852
      Mathias Thore authored
      [ Upstream commit e15aa3b2 ]
      
      After a timeout event caused by for example a broadcast storm, when
      the MAC and PHY are reset, the BQL TX queue needs to be reset as
      well. Otherwise, the device will exhibit severe performance issues
      even after the storm has ended.
      Co-authored-by: default avatarDavid Gounaris <david.gounaris@infinera.com>
      Signed-off-by: default avatarMathias Thore <mathias.thore@infinera.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0773852
    • George Amanakis's avatar
      tun: move the call to tun_set_real_num_queues · dace5274
      George Amanakis authored
      [ Upstream commit 3a03cb84 ]
      
      Call tun_set_real_num_queues() after the increment of tun->numqueues
      since the former depends on it. Otherwise, the number of queues is not
      correctly accounted for, which results to warnings similar to:
      "vnet0 selects TX queue 11, but real number of TX queues is 11".
      
      Fixes: 0b7959b6 ("tun: publish tfile after it's fully initialized")
      Reported-and-tested-by: default avatarGeorge Amanakis <gamanakis@gmail.com>
      Signed-off-by: default avatarGeorge Amanakis <gamanakis@gmail.com>
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dace5274
    • Xin Long's avatar
      sctp: improve the events for sctp stream reset · e569927a
      Xin Long authored
      [ Upstream commit 2e6dc4d9 ]
      
      This patch is to improve sctp stream reset events in 4 places:
      
        1. In sctp_process_strreset_outreq(), the flag should always be set with
           SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in
           stream is reset here.
        2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN
           check, as the reset has to succeed after reconf_timer stops for the
           in stream reset request retransmission.
        3. In sctp_process_strreset_inreq(), no event should be sent, as no in
           or out stream is reset here.
        4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or
           OUTGOING event should always be sent for stream reset requests, no
           matter it fails or succeeds to process the request.
      
      Fixes: 81054476 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter")
      Fixes: 16e1a919 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter")
      Fixes: 11ae76e6 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
      Reported-by: default avatarYing Xu <yinxu@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e569927a
    • Simon Horman's avatar
      ravb: expand rx descriptor data to accommodate hw checksum · 4fae696c
      Simon Horman authored
      [ Upstream commit 12da6430 ]
      
      EtherAVB may provide a checksum of packet data appended to packet data. In
      order to allow this checksum to be received by the host descriptor data
      needs to be enlarged by 2 bytes to accommodate the checksum.
      
      In the case of MTU-sized packets without a VLAN tag the
      checksum were already accommodated by virtue of the space reserved for the
      VLAN tag. However, a packet of MTU-size with a  VLAN tag consumed all
      packet data space provided by a descriptor leaving no space for the
      trailing checksum.
      
      This was not detected by the driver which incorrectly used the last two
      bytes of packet data as the checksum and truncate the packet by two bytes.
      This resulted all such packets being dropped.
      
      A work around is to disable RX checksum offload
       # ethtool -K eth0 rx off
      
      This patch resolves this problem by increasing the size available for
      packet data in RX descriptors by two bytes.
      
      Tested on R-Car E3 (r8a77990) ES1.0 based Ebisu-4D board
      
      v2
      * Use sizeof(__sum16) directly rather than adding a driver-local
        #define for the size of the checksum provided by the hw (2 bytes).
      
      Fixes: 4d86d381 ("ravb: RX checksum offload")
      Signed-off-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Reviewed-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fae696c
    • Josh Elsasser's avatar
      net: set default network namespace in init_dummy_netdev() · 5f1a18e0
      Josh Elsasser authored
      [ Upstream commit 35edfdc7 ]
      
      Assign a default net namespace to netdevs created by init_dummy_netdev().
      Fixes a NULL pointer dereference caused by busy-polling a socket bound to
      an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
      if napi_poll() received packets:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
        IP: napi_busy_loop+0xd6/0x200
        Call Trace:
          sock_poll+0x5e/0x80
          do_sys_poll+0x324/0x5a0
          SyS_poll+0x6c/0xf0
          do_syscall_64+0x6b/0x1f0
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 7db6b048 ("net: Commonize busy polling code to focus on napi_id instead of socket")
      Signed-off-by: default avatarJosh Elsasser <jelsasser@appneta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f1a18e0
    • Bernard Pidoux's avatar
      net/rose: fix NULL ax25_cb kernel panic · fc4154c7
      Bernard Pidoux authored
      [ Upstream commit b0cf0292 ]
      
      When an internally generated frame is handled by rose_xmit(),
      rose_route_frame() is called:
      
              if (!rose_route_frame(skb, NULL)) {
                      dev_kfree_skb(skb);
                      stats->tx_errors++;
                      return NETDEV_TX_OK;
              }
      
      We have the same code sequence in Net/Rom where an internally generated
      frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
      However, in this function NULL argument is tested while it is not in
      rose_route_frame().
      Then kernel panic occurs later on when calling ax25cmp() with a NULL
      ax25_cb argument as reported many times and recently with syzbot.
      
      We need to test if ax25 is NULL before using it.
      
      Testing:
      Built kernel with CONFIG_ROSE=y.
      Signed-off-by: default avatarBernard Pidoux <f6bvp@free.fr>
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bernard Pidoux <f6bvp@free.fr>
      Cc: linux-hams@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc4154c7
    • Cong Wang's avatar
      netrom: switch to sock timer API · 2c6b5724
      Cong Wang authored
      [ Upstream commit 63346650 ]
      
      sk_reset_timer() and sk_stop_timer() properly handle
      sock refcnt for timer function. Switching to them
      could fix a refcounting bug reported by syzbot.
      
      Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-hams@vger.kernel.org
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c6b5724
    • Aya Levin's avatar
      net/mlx4_core: Add masking for a few queries on HCA caps · 00865891
      Aya Levin authored
      [ Upstream commit a40ded60 ]
      
      Driver reads the query HCA capabilities without the corresponding masks.
      Without the correct masks, the base addresses of the queues are
      unaligned.  In addition some reserved bits were wrongly read.  Using the
      correct masks, ensures alignment of the base addresses and allows future
      firmware versions safe use of the reserved bits.
      
      Fixes: ab9c17a0 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
      Fixes: 0ff1fb65 ("{NET, IB}/mlx4: Add device managed flow steering firmware API")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00865891
    • Lorenzo Bianconi's avatar
      net: ip_gre: use erspan key field for tunnel lookup · 0a198e0b
      Lorenzo Bianconi authored
      [ Upstream commit cb73ee40 ]
      
      Use ERSPAN key header field as tunnel key in gre_parse_header routine
      since ERSPAN protocol sets the key field of the external GRE header to
      0 resulting in a tunnel lookup fail in ip6gre_err.
      In addition remove key field parsing and pskb_may_pull check in
      erspan_rcv and ip6erspan_rcv
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a198e0b
    • Lorenzo Bianconi's avatar
      net: ip_gre: always reports o_key to userspace · 897ea28b
      Lorenzo Bianconi authored
      [ Upstream commit feaf5c79 ]
      
      Erspan protocol (version 1 and 2) relies on o_key to configure
      session id header field. However TUNNEL_KEY bit is cleared in
      erspan_xmit since ERSPAN protocol does not set the key field
      of the external GRE header and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
          key 1 seq erspan_ver 1
      $ip link set erspan1 up
      $ip -d link sh erspan1
      
      erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
        link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
        erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
      
      Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
      ipgre_fill_info
      
      Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      897ea28b
    • Jacob Wen's avatar
      l2tp: fix reading optional fields of L2TPv3 · 8de67666
      Jacob Wen authored
      [ Upstream commit 4522a70d ]
      
      Use pskb_may_pull() to make sure the optional fields are in skb linear
      parts, so we can safely read them later.
      
      It's easy to reproduce the issue with a net driver that supports paged
      skb data. Just create a L2TPv3 over IP tunnel and then generates some
      network traffic.
      Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
      
      Changes in v4:
      1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
      2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
      3. Add 'Fixes' in commit messages.
      
      Changes in v3:
      1. To keep consistency, move the code out of l2tp_recv_common.
      2. Use "net" instead of "net-next", since this is a bug fix.
      
      Changes in v2:
      1. Only fix L2TPv3 to make code simple.
         To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
         It's complicated to do so.
      2. Reloading pointers after pskb_may_pull
      
      Fixes: f7faffa3 ("l2tp: Add L2TPv3 protocol support")
      Fixes: 0d76751f ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
      Fixes: a32e0eec ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
      Signed-off-by: default avatarJacob Wen <jian.w.wen@oracle.com>
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8de67666
    • Jacob Wen's avatar
      l2tp: copy 4 more bytes to linear part if necessary · 3d418a25
      Jacob Wen authored
      [ Upstream commit 91c52470 ]
      
      The size of L2TPv2 header with all optional fields is 14 bytes.
      l2tp_udp_recv_core only moves 10 bytes to the linear part of a
      skb. This may lead to l2tp_recv_common read data outside of a skb.
      
      This patch make sure that there is at least 14 bytes in the linear
      part of a skb to meet the maximum need of l2tp_udp_recv_core and
      l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
      Ethernet frame is larger than 14 bytes, so we are safe to do so.
      
      Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Suggested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJacob Wen <jian.w.wen@oracle.com>
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d418a25
    • Daniel Borkmann's avatar
      ipvlan, l3mdev: fix broken l3s mode wrt local routes · fcc9c69a
      Daniel Borkmann authored
      [ Upstream commit d5256083 ]
      
      While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
      I ran into the issue that while l3 mode is working fine, l3s mode
      does not have any connectivity to kube-apiserver and hence all pods
      end up in Error state as well. The ipvlan master device sits on
      top of a bond device and hostns traffic to kube-apiserver (also running
      in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
      where the latter is the address of the bond0. While in l3 mode, a
      curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
      works fine from hostns, neither of them do in case of l3s. In the
      latter only a curl to https://127.0.0.1:37573 appeared to work where
      for local addresses of bond0 I saw kernel suddenly starting to emit
      ARP requests to query HW address of bond0 which remained unanswered
      and neighbor entries in INCOMPLETE state. These ARP requests only
      happen while in l3s.
      
      Debugging this further, I found the issue is that l3s mode is piggy-
      backing on l3 master device, and in this case local routes are using
      l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
      f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
      if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
      a loopback"). I found that reverting them back into using the
      net->loopback_dev fixed ipvlan l3s connectivity and got everything
      working for the CNI.
      
      Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
      l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
      on l3 master device is to get the l3mdev_ip_rcv() receive hook for
      setting the dst entry of the input route without adding its own
      ipvlan specific hacks into the receive path, however, any l3 domain
      semantics beyond just that are breaking l3s operation. Note that
      ipvlan also has the ability to dynamically switch its internal
      operation from l3 to l3s for all ports via ipvlan_set_port_mode()
      at runtime. In any case, l3 vs l3s soley distinguishes itself by
      'de-confusing' netfilter through switching skb->dev to ipvlan slave
      device late in NF_INET_LOCAL_IN before handing the skb to L4.
      
      Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
      if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
      without any additional l3mdev semantics on top. This should also have
      minimal impact since dev->priv_flags is already hot in cache. With
      this set, l3s mode is working fine and I also get things like
      masquerading pod traffic on the ipvlan master properly working.
      
        [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      
      Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Martynas Pumputis <m@lambda.lt>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcc9c69a
    • Yohei Kanemaru's avatar
      ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation · 2f704348
      Yohei Kanemaru authored
      [ Upstream commit ef489749 ]
      
      skb->cb may contain data from previous layers (in an observed case
      IPv4 with L3 Master Device). In the observed scenario, the data in
      IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
      eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
      through ip6_finish_output().
      
      This patch clears IP6CB(skb), which potentially contains garbage data,
      on the SRH ip4ip6 encapsulation.
      
      Fixes: 32d99d0b ("ipv6: sr: add support for ip4ip6 encapsulation")
      Signed-off-by: default avatarYohei Kanemaru <yohei.kanemaru@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f704348
    • David Ahern's avatar
      ipv6: Consider sk_bound_dev_if when binding a socket to an address · 7e9a6476
      David Ahern authored
      [ Upstream commit c5ee0663 ]
      
      IPv6 does not consider if the socket is bound to a device when binding
      to an address. The result is that a socket can be bound to eth0 and then
      bound to the address of eth1. If the device is a VRF, the result is that
      a socket can only be bound to an address in the default VRF.
      
      Resolve by considering the device if sk_bound_dev_if is set.
      
      This problem exists from the beginning of git history.
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e9a6476
    • Arnd Bergmann's avatar
      drm/msm/gpu: fix building without debugfs · 8877843b
      Arnd Bergmann authored
      commit c878a628 upstream.
      
      When debugfs is disabled, but coredump is turned on, the adreno driver fails to build:
      
      drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:4: error: 'struct msm_gpu_funcs' has no member named 'show'
         .show = adreno_show,
          ^~~~
      drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base')
      drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: error: initialization of 'void (*)(struct msm_gpu *, struct msm_gem_submit *, struct msm_file_private *)' from incompatible pointer type 'void (*)(struct msm_gpu *, struct msm_gpu_state *, struct drm_printer *)' [-Werror=incompatible-pointer-types]
      drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base.submit')
      drivers/gpu/drm/msm/adreno/a4xx_gpu.c:546:4: error: 'struct msm_gpu_funcs' has no member named 'show'
      drivers/gpu/drm/msm/adreno/a5xx_gpu.c:1460:4: error: 'struct msm_gpu_funcs' has no member named 'show'
      drivers/gpu/drm/msm/adreno/a6xx_gpu.c:769:4: error: 'struct msm_gpu_funcs' has no member named 'show'
      drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_devcoredump_read':
      drivers/gpu/drm/msm/msm_gpu.c:289:12: error: 'const struct msm_gpu_funcs' has no member named 'show'
      
      Adjust the #ifdef to make it build again.
      
      Fixes: c0fec7f5 ("drm/msm/gpu: Capture the GPU state on a GPU hang")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8877843b
    • Greg Kroah-Hartman's avatar
      Fix "net: ipv4: do not handle duplicate fragments as overlapping" · 8c763a3c
      Greg Kroah-Hartman authored
      ade44640 ("net: ipv4: do not handle duplicate fragments as
      overlapping") was backported to many stable trees, but it had a problem
      that was "accidentally" fixed by the upstream commit 0ff89efb ("ip:
      fail fast on IP defrag errors")
      
      This is the fixup for that problem as we do not want the larger patch in
      the older stable trees.
      
      Fixes: ade44640 ("net: ipv4: do not handle duplicate fragments as overlapping")
      Reported-by: default avatarIvan Babrou <ivan@cloudflare.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c763a3c
  2. 31 Jan, 2019 10 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.19.19 · dffbba43
      Greg Kroah-Hartman authored
      dffbba43
    • Deepa Dinamani's avatar
      Input: input_event - fix the CONFIG_SPARC64 mixup · 3a3b6a6b
      Deepa Dinamani authored
      commit 141e5dca upstream.
      
      Arnd Bergmann pointed out that CONFIG_* cannot be used in a uapi header.
      Override with an equivalent conditional.
      
      Fixes: 2e746942 ("Input: input_event - provide override for sparc64")
      Fixes: 152194fe ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a3b6a6b
    • Christoph Hellwig's avatar
      ide: fix a typo in the settings proc file name · d4a6ac28
      Christoph Hellwig authored
      commit f8ff6c73 upstream.
      
      Fixes: ec7d9c9c ("ide: replace ->proc_fops with ->proc_show")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4a6ac28
    • Jack Pham's avatar
      usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup · 25ad17d6
      Jack Pham authored
      commit bd674224 upstream.
      
      OUT endpoint requests may somtimes have this flag set when
      preparing to be submitted to HW indicating that there is an
      additional TRB chained to the request for alignment purposes.
      If that request is removed before the controller can execute the
      transfer (e.g. ep_dequeue/ep_disable), the request will not go
      through the dwc3_gadget_ep_cleanup_completed_request() handler
      and will not have its needs_extra_trb flag cleared when
      dwc3_gadget_giveback() is called.  This same request could be
      later requeued for a new transfer that does not require an
      extra TRB and if it is successfully completed, the cleanup
      and TRB reclamation will incorrectly process the additional TRB
      which belongs to the next request, and incorrectly advances the
      TRB dequeue pointer, thereby messing up calculation of the next
      requeust's actual/remaining count when it completes.
      
      The right thing to do here is to ensure that the flag is cleared
      before it is given back to the function driver.  A good place
      to do that is in dwc3_gadget_del_and_unmap_request().
      
      Fixes: c6267a51 ("usb: dwc3: gadget: align transfers to wMaxPacketSize")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJack Pham <jackp@codeaurora.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      [jackp: backport to <= 4.20: replaced 'needs_extra_trb' with 'unaligned'
              and 'zero' members in patch and reworded commit text]
      Signed-off-by: default avatarJack Pham <jackp@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25ad17d6
    • Michal Hocko's avatar
      Revert "mm, memory_hotplug: initialize struct pages for the full memory section" · 6bab9573
      Michal Hocko authored
      commit 4aa9fc2a upstream.
      
      This reverts commit 2830bf6f.
      
      The underlying assumption that one sparse section belongs into a single
      numa node doesn't hold really. Robert Shteynfeld has reported a boot
      failure. The boot log was not captured but his memory layout is as
      follows:
      
        Early memory node ranges
          node   1: [mem 0x0000000000001000-0x0000000000090fff]
          node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
          node   1: [mem 0x0000000100000000-0x0000001423ffffff]
          node   0: [mem 0x0000001424000000-0x0000002023ffffff]
      
      This means that node0 starts in the middle of a memory section which is
      also in node1.  memmap_init_zone tries to initialize padding of a
      section even when it is outside of the given pfn range because there are
      code paths (e.g.  memory hotplug) which assume that the full worth of
      memory section is always initialized.
      
      In this particular case, though, such a range is already intialized and
      most likely already managed by the page allocator.  Scribbling over
      those pages corrupts the internal state and likely blows up when any of
      those pages gets used.
      Reported-by: default avatarRobert Shteynfeld <robert.shteynfeld@gmail.com>
      Fixes: 2830bf6f ("mm, memory_hotplug: initialize struct pages for the full memory section")
      Cc: stable@kernel.org
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bab9573
    • Raju Rangoju's avatar
      nvmet-rdma: fix null dereference under heavy load · 7dbf1297
      Raju Rangoju authored
      commit 5cbab630 upstream.
      
      Under heavy load if we don't have any pre-allocated rsps left, we
      dynamically allocate a rsp, but we are not actually allocating memory
      for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
      fields (req->rsp->status) in nvmet_req_init() will result in crash.
      
      To fix this, allocate the memory for nvme_completion by calling
      nvmet_rdma_alloc_rsp()
      
      Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarRaju Rangoju <rajur@chelsio.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dbf1297
    • Israel Rukshin's avatar
      nvmet-rdma: Add unlikely for response allocated check · fa9184be
      Israel Rukshin authored
      commit ad1f8249 upstream.
      Signed-off-by: default avatarIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Raju  Rangoju <rajur@chelsio.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa9184be
    • David Hildenbrand's avatar
      s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU · 48046a01
      David Hildenbrand authored
      commit 60f1bf29 upstream.
      
      When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read
      from pcpu_devices->lowcore. However, due to prefixing, that will result
      in reading from absolute address 0 on that CPU. We have to go via the
      actual lowcore instead.
      
      This means that right now, we will read lc->nodat_stack == 0 and
      therfore work on a very wrong stack.
      
      This BUG essentially broke rebooting under QEMU TCG (which will report
      a low address protection exception). And checking under KVM, it is
      also broken under KVM. With 1 VCPU it can be easily triggered.
      
      :/# echo 1 > /proc/sys/kernel/sysrq
      :/# echo b > /proc/sysrq-trigger
      [   28.476745] sysrq: SysRq : Resetting
      [   28.476793] Kernel stack overflow.
      [   28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
      [   28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
      [   28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140)
      [   28.476861]            R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
      [   28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000
      [   28.476864]            0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0
      [   28.476864]            000000000010dff8 0000000000000000 0000000000000000 0000000000000000
      [   28.476865]            000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000
      [   28.476887] Krnl Code: 0000000000115bfe: 4170f000            la      %r7,0(%r15)
      [   28.476887]            0000000000115c02: 41f0a000            la      %r15,0(%r10)
      [   28.476887]           #0000000000115c06: e370f0980024        stg     %r7,152(%r15)
      [   28.476887]           >0000000000115c0c: c0e5fffff86e        brasl   %r14,114ce8
      [   28.476887]            0000000000115c12: 41f07000            la      %r15,0(%r7)
      [   28.476887]            0000000000115c16: a7f4ffa8            brc     15,115b66
      [   28.476887]            0000000000115c1a: 0707                bcr     0,%r7
      [   28.476887]            0000000000115c1c: 0707                bcr     0,%r7
      [   28.476901] Call Trace:
      [   28.476902] Last Breaking-Event-Address:
      [   28.476920]  [<0000000000a01c4a>] arch_call_rest_init+0x22/0x80
      [   28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue.
      [   28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
      [   28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
      [   28.476932] Call Trace:
      
      Fixes: 2f859d0d ("s390/smp: reduce size of struct pcpu")
      Cc: stable@vger.kernel.org # 4.0+
      Reported-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      48046a01
    • Daniel Borkmann's avatar
      bpf: fix inner map masking to prevent oob under speculation · 37c9e3ee
      Daniel Borkmann authored
      [ commit 9d5564dd upstream ]
      
      During review I noticed that inner meta map setup for map in
      map is buggy in that it does not propagate all needed data
      from the reference map which the verifier is later accessing.
      
      In particular one such case is index masking to prevent out of
      bounds access under speculative execution due to missing the
      map's unpriv_array/index_mask field propagation. Fix this such
      that the verifier is generating the correct code for inlined
      lookups in case of unpriviledged use.
      
      Before patch (test_verifier's 'map in map access' dump):
      
        # bpftool prog dump xla id 3
           0: (62) *(u32 *)(r10 -4) = 0
           1: (bf) r2 = r10
           2: (07) r2 += -4
           3: (18) r1 = map[id:4]
           5: (07) r1 += 272                |
           6: (61) r0 = *(u32 *)(r2 +0)     |
           7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
           8: (54) (u32) r0 &= (u32) 0      | with index masking for
           9: (67) r0 <<= 3                 | map->unpriv_array.
          10: (0f) r0 += r1                 |
          11: (79) r0 = *(u64 *)(r0 +0)     |
          12: (15) if r0 == 0x0 goto pc+1   |
          13: (05) goto pc+1                |
          14: (b7) r0 = 0                   |
          15: (15) if r0 == 0x0 goto pc+11
          16: (62) *(u32 *)(r10 -4) = 0
          17: (bf) r2 = r10
          18: (07) r2 += -4
          19: (bf) r1 = r0
          20: (07) r1 += 272                |
          21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
          22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
          23: (67) r0 <<= 3                 | map->unpriv_array set.
          24: (0f) r0 += r1                 |
          25: (05) goto pc+1                |
          26: (b7) r0 = 0                   |
          27: (b7) r0 = 0
          28: (95) exit
      
      After patch:
      
        # bpftool prog dump xla id 1
           0: (62) *(u32 *)(r10 -4) = 0
           1: (bf) r2 = r10
           2: (07) r2 += -4
           3: (18) r1 = map[id:2]
           5: (07) r1 += 272                |
           6: (61) r0 = *(u32 *)(r2 +0)     |
           7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
           8: (54) (u32) r0 &= (u32) 0      | with index masking due to
           9: (67) r0 <<= 3                 | map->unpriv_array.
          10: (0f) r0 += r1                 |
          11: (79) r0 = *(u64 *)(r0 +0)     |
          12: (15) if r0 == 0x0 goto pc+1   |
          13: (05) goto pc+1                |
          14: (b7) r0 = 0                   |
          15: (15) if r0 == 0x0 goto pc+12
          16: (62) *(u32 *)(r10 -4) = 0
          17: (bf) r2 = r10
          18: (07) r2 += -4
          19: (bf) r1 = r0
          20: (07) r1 += 272                |
          21: (61) r0 = *(u32 *)(r2 +0)     |
          22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
          23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
          24: (67) r0 <<= 3                 | for map->unpriv_array.
          25: (0f) r0 += r1                 |
          26: (05) goto pc+1                |
          27: (b7) r0 = 0                   |
          28: (b7) r0 = 0
          29: (95) exit
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      37c9e3ee
    • Daniel Borkmann's avatar
      bpf: fix sanitation of alu op with pointer / scalar type from different paths · eed84f94
      Daniel Borkmann authored
      [ commit d3bd7413 upstream ]
      
      While 979d63d5 ("bpf: prevent out of bounds speculation on pointer
      arithmetic") took care of rejecting alu op on pointer when e.g. pointer
      came from two different map values with different map properties such as
      value size, Jann reported that a case was not covered yet when a given
      alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
      different branches where we would incorrectly try to sanitize based
      on the pointer's limit. Catch this corner case and reject the program
      instead.
      
      Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      eed84f94