1. 07 Oct, 2019 40 commits
    • Tuong Lien's avatar
      tipc: fix unlimited bundling of small messages · ed9420dd
      Tuong Lien authored
      [ Upstream commit e95584a8 ]
      
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Even that eventually can lead to buffer exhaustion & link reset.
      
      We fix this by keeping a 'target_bskb' buffer pointer at each levels,
      then when bundling, we only bundle messages at the same importance
      level only. This way, we know exactly how many slots a certain level
      have occupied in the queue, so can manage level congestion accurately.
      
      By bundling messages at the same level, we even have more benefits. Let
      consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;
      
      When a 64-byte message comes and is bundled the first time, we put the
      overhead of message bundle to it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled to the previous one. This means the last bundle carries only
      one payload message which is totally inefficient, as for the receiver
      also! Later on, another 64-byte message comes, now we make a new bundle
      and the same story repeats...
      
      With the new bundling algorithm, this will not happen, the 64-byte
      messages will be bundled together even when the 4096-byte message(s)
      comes in between. However, if the 4096-byte messages are sent at the
      same level i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.
      
      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit's one, so that, when one
      message is bundled, it's transmitted shortly. Then, another message
      comes, a new bundle is created and so on...
      
      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: default avatarHoang Le <hoang.h.le@dektech.com.au>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9420dd
    • Dongli Zhang's avatar
      xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a1afd826
      Dongli Zhang authored
      [ Upstream commit a761129e ]
      
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      In the situation when the rsp_cons is approaching 0xffffffff, the return
      value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
      caller) would regard as error. As a result, queue->rx.rsp_cons is set
      incorrectly because it is updated only when there is error. If there is no
      error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
      Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
      queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
      This leads to NULL pointer access in the next iteration to process rx ring
      buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b36850 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1afd826
    • Dotan Barak's avatar
      net/rds: Fix error handling in rds_ib_add_one() · 36a4043c
      Dotan Barak authored
      [ Upstream commit d64bf89a ]
      
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocation some resources such as protection domain.
      If allocation of such resources fail, then these uninitialized
      variables are accessed in rds_ib_dev_free() in failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
      Signed-off-by: default avatarDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: default avatarSudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36a4043c
    • Josh Hunt's avatar
      udp: only do GSO if # of segs > 1 · 012363f5
      Josh Hunt authored
      [ Upstream commit 4094871d ]
      
      Prior to this change an application sending <= 1MSS worth of data and
      enabling UDP GSO would fail if the system had SW GSO enabled, but the
      same send would succeed if HW GSO offload is enabled. In addition to this
      inconsistency the error in the SW GSO case does not get back to the
      application if sending out of a real device so the user is unaware of this
      failure.
      
      With this change we only perform GSO if the # of segments is > 1 even
      if the application has enabled segmentation. I've also updated the
      relevant udpgso selftests.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      012363f5
    • Linus Walleij's avatar
      net: dsa: rtl8366: Check VLAN ID and not ports · 5c08d7e4
      Linus Walleij authored
      [ Upstream commit e8521e53 ]
      
      There has been some confusion between the port number and
      the VLAN ID in this driver. What we need to check for
      validity is the VLAN ID, nothing else.
      
      The current confusion came from assigning a few default
      VLANs for default routing and we need to rewrite that
      properly.
      
      Instead of checking if the port number is a valid VLAN
      ID, check the actual VLAN IDs passed in to the callback
      one by one as expected.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c08d7e4
    • Dexuan Cui's avatar
      vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui authored
      [ Upstream commit 0d9138ff ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c1f0704
    • Josh Hunt's avatar
      udp: fix gso_segs calculations · 544aee54
      Josh Hunt authored
      [ Upstream commit 44b321e5 ]
      
      Commit dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      added gso_segs calculation, but incorrectly got sizeof() the pointer and
      not the underlying data type. In addition let's fix the v6 case.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Fixes: dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      544aee54
    • Eric Dumazet's avatar
      sch_dsmark: fix potential NULL deref in dsmark_init() · 79fd59ae
      Eric Dumazet authored
      [ Upstream commit 474f0813 ]
      
      Make sure TCA_DSMARK_INDICES was provided by the user.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline]
      RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline]
      RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339
      Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca
      RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09
      RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004
      RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159
      R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940
      R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000
      FS:  0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237
       tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440369
      
      Fixes: 758cc43c ("[PKT_SCHED]: Fix dsmark to apply changes consistent")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79fd59ae
    • David Howells's avatar
      rxrpc: Fix rxrpc_recvmsg tracepoint · 76b55277
      David Howells authored
      [ Upstream commit db9b2e0a ]
      
      Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
      parameter.
      
      Fixes: a25e21f0 ("rxrpc, afs: Use debug_ids rather than pointers in traces")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76b55277
    • Reinhard Speyerer's avatar
      qmi_wwan: add support for Cinterion CLS8 devices · 7047aae6
      Reinhard Speyerer authored
      [ Upstream commit cf74ac6d ]
      
      Add support for Cinterion CLS8 devices.
      Use QMI_QUIRK_SET_DTR as required for Qualcomm MDM9x07 chipsets.
      
      T:  Bus=01 Lev=03 Prnt=05 Port=01 Cnt=02 Dev#= 25 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1e2d ProdID=00b0 Rev= 3.18
      S:  Manufacturer=GEMALTO
      S:  Product=USB Modem
      C:* #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: default avatarReinhard Speyerer <rspmn@arcor.de>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7047aae6
    • Eric Dumazet's avatar
      nfc: fix memory leak in llcp_sock_bind() · dd9c580a
      Eric Dumazet authored
      [ Upstream commit a0c2dc1f ]
      
      sysbot reported a memory leak after a bind() has failed.
      
      While we are at it, abort the operation if kmemdup() has failed.
      
      BUG: memory leak
      unreferenced object 0xffff888105d83ec0 (size 32):
        comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s)
        hex dump (first 32 bytes):
          00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34  .ile read.net:[4
          30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00  026533097]......
        backtrace:
          [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline]
          [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline]
          [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline]
          [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline]
          [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670
          [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120
          [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline]
          [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107
          [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647
          [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline]
          [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline]
          [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656
          [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296
          [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 30cc4587 ("NFC: Move LLCP code to the NFC top level diirectory")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd9c580a
    • Martin KaFai Lau's avatar
      net: Unpublish sk from sk_reuseport_cb before call_rcu · d5b1db1c
      Martin KaFai Lau authored
      [ Upstream commit 8c7138b3 ]
      
      The "reuse->sock[]" array is shared by multiple sockets.  The going away
      sk must unpublish itself from "reuse->sock[]" before making call_rcu()
      call.  However, this unpublish-action is currently done after a grace
      period and it may cause use-after-free.
      
      The fix is to move reuseport_detach_sock() to sk_destruct().
      Due to the above reason, any socket with sk_reuseport_cb has
      to go through the rcu grace period before freeing it.
      
      It is a rather old bug (~3 yrs).  The Fixes tag is not necessary
      the right commit but it is the one that introduced the SOCK_RCU_FREE
      logic and this fix is depending on it.
      
      Fixes: a4298e45 ("net: add SOCK_RCU_FREE socket flag")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5b1db1c
    • Navid Emamdoost's avatar
      net: qlogic: Fix memory leak in ql_alloc_large_buffers · 9d0995cc
      Navid Emamdoost authored
      [ Upstream commit 1acb8f2a ]
      
      In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
      This skb should be released if pci_dma_mapping_error fails.
      
      Fixes: 0f8ab89e ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d0995cc
    • Paolo Abeni's avatar
      net: ipv4: avoid mixed n_redirects and rate_tokens usage · 124b64fe
      Paolo Abeni authored
      [ Upstream commit b406472b ]
      
      Since commit c09551c6 ("net: ipv4: use a dedicated counter
      for icmp_v4 redirect packets") we use 'n_redirects' to account
      for redirect packets, but we still use 'rate_tokens' to compute
      the redirect packets exponential backoff.
      
      If the device sent to the relevant peer any ICMP error packet
      after sending a redirect, it will also update 'rate_token' according
      to the leaking bucket schema; typically 'rate_token' will raise
      above BITS_PER_LONG and the redirect packets backoff algorithm
      will produce undefined behavior.
      
      Fix the issue using 'n_redirects' to compute the exponential backoff
      in ip_rt_send_redirect().
      
      Note that we still clear rate_tokens after a redirect silence period,
      to avoid changing an established behaviour.
      
      The root cause predates git history; before the mentioned commit in
      the critical scenario, the kernel stopped sending redirects, after
      the mentioned commit the behavior more randomic.
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Fixes: c09551c6 ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      124b64fe
    • David Ahern's avatar
      ipv6: Handle missing host route in __ipv6_ifa_notify · 6f8564ed
      David Ahern authored
      [ Upstream commit 2d819d25 ]
      
      Rajendra reported a kernel panic when a link was taken down:
      
          [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
          <snip>
      
          [ 6870.570501] Call Trace:
          [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
          [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
          [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
          [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
          [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
          [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
          [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
          [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
          [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
          [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
          [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in __ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      Since the DAD sequence can not be aborted, add a check for the missing
      host route in __ipv6_ifa_notify. The only way this should happen is due
      to the previously mentioned race. The host route is created when the
      address is added to an interface; it is only removed on a down event
      where the address is kept. Add a warning if the host route is missing
      AND the device is up; this is a situation that should never happen.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f8564ed
    • Eric Dumazet's avatar
      ipv6: drop incoming packets having a v4mapped source address · 658d7ee4
      Eric Dumazet authored
      [ Upstream commit 6af1799a ]
      
      This began with a syzbot report. syzkaller was injecting
      IPv6 TCP SYN packets having a v4mapped source address.
      
      After an unsuccessful 4-tuple lookup, TCP creates a request
      socket (SYN_RECV) and calls reqsk_queue_hash_req()
      
      reqsk_queue_hash_req() calls sk_ehashfn(sk)
      
      At this point we have AF_INET6 sockets, and the heuristic
      used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
      is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
      
      For the particular spoofed packet, we end up hashing V4 addresses
      which were not initialized by the TCP IPv6 stack, so KMSAN fired
      a warning.
      
      I first fixed sk_ehashfn() to test both source and destination addresses,
      but then faced various problems, including user-space programs
      like packetdrill that had similar assumptions.
      
      Instead of trying to fix the whole ecosystem, it is better
      to admit that we have a dual stack behavior, and that we
      can not build linux kernels without V4 stack anyway.
      
      The dual stack API automatically forces the traffic to be IPv4
      if v4mapped addresses are used at bind() or connect(), so it makes
      no sense to allow IPv6 traffic to use the same v4mapped class.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      658d7ee4
    • Johan Hovold's avatar
      hso: fix NULL-deref on tty open · a495fd19
      Johan Hovold authored
      [ Upstream commit 8353da9f ]
      
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a495fd19
    • Haishuang Yan's avatar
      erspan: remove the incorrect mtu limit for erspan · 7f30c44b
      Haishuang Yan authored
      [ Upstream commit 0e141f75 ]
      
      erspan driver calls ether_setup(), after commit 61e84623
      ("net: centralize net_device min/max MTU checking"), the range
      of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
      
      It causes the dev mtu of the erspan device to not be greater
      than 1500, this limit value is not correct for ipgre tap device.
      
      Tested:
      Before patch:
      # ip link set erspan0 mtu 1600
      Error: mtu greater than device maximum.
      After patch:
      # ip link set erspan0 mtu 1600
      # ip -d link show erspan0
      21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Signed-off-by: default avatarHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f30c44b
    • Vishal Kulkarni's avatar
      cxgb4:Fix out-of-bounds MSI-X info array access · 2b838911
      Vishal Kulkarni authored
      [ Upstream commit 6b517374 ]
      
      When fetching free MSI-X vectors for ULDs, check for the error code
      before accessing MSI-X info array. Otherwise, an out-of-bounds access is
      attempted, which results in kernel panic.
      
      Fixes: 94cdb8bb ("cxgb4: Add support for dynamic allocation of resources for ULD")
      Signed-off-by: default avatarShahjada Abul Husain <shahjada@chelsio.com>
      Signed-off-by: default avatarVishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b838911
    • Daniel Borkmann's avatar
      bpf: fix use after free in prog symbol exposure · ed568ca7
      Daniel Borkmann authored
      commit c751798a upstream.
      
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: default avatarZubin Mithra <zsm@chromium.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ed568ca7
    • Damien Le Moal's avatar
      block: mq-deadline: Fix queue restart handling · dbb7339c
      Damien Le Moal authored
      [ Upstream commit cb8acabb ]
      
      Commit 7211aef8 ("block: mq-deadline: Fix write completion
      handling") added a call to blk_mq_sched_mark_restart_hctx() in
      dd_dispatch_request() to make sure that write request dispatching does
      not stall when all target zones are locked. This fix left a subtle race
      when a write completion happens during a dispatch execution on another
      CPU:
      
      CPU 0: Dispatch			CPU1: write completion
      
      dd_dispatch_request()
          lock(&dd->lock);
          ...
          lock(&dd->zone_lock);	dd_finish_request()
          rq = find request		lock(&dd->zone_lock);
          unlock(&dd->zone_lock);
          				zone write unlock
      				unlock(&dd->zone_lock);
      				...
      				__blk_mq_free_request
                                            check restart flag (not set)
      				      -> queue not run
          ...
          if (!rq && have writes)
              blk_mq_sched_mark_restart_hctx()
          unlock(&dd->lock)
      
      Since the dispatch context finishes after the write request completion
      handling, marking the queue as needing a restart is not seen from
      __blk_mq_free_request() and blk_mq_sched_restart() not executed leading
      to the dispatch stall under 100% write workloads.
      
      Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
      dd_dispatch_request() into dd_finish_request() under the zone lock to
      ensure full mutual exclusion between write request dispatch selection
      and zone unlock on write request completion.
      
      Fixes: 7211aef8 ("block: mq-deadline: Fix write completion handling")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarHans Holmberg <Hans.Holmberg@wdc.com>
      Reviewed-by: default avatarHans Holmberg <hans.holmberg@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dbb7339c
    • Alexandre Ghiti's avatar
      arm: use STACK_TOP when computing mmap base address · af10ffa6
      Alexandre Ghiti authored
      [ Upstream commit 86e568e9 ]
      
      mmap base address must be computed wrt stack top address, using TASK_SIZE
      is wrong since STACK_TOP and TASK_SIZE are not equivalent.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-8-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      af10ffa6
    • Alexandre Ghiti's avatar
      arm: properly account for stack randomization and stack guard gap · f91a9c65
      Alexandre Ghiti authored
      [ Upstream commit af0f4297 ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-7-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f91a9c65
    • Alexandre Ghiti's avatar
      mips: properly account for stack randomization and stack guard gap · 53ba8d43
      Alexandre Ghiti authored
      [ Upstream commit b1f61b5b ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-10-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarPaul Burton <paul.burton@mips.com>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      53ba8d43
    • Alexandre Ghiti's avatar
      arm64: consider stack randomization for mmap base only when necessary · e1b391ab
      Alexandre Ghiti authored
      [ Upstream commit e8d54b62 ]
      
      Do not offset mmap base address because of stack randomization if current
      task does not want randomization.  Note that x86 already implements this
      behaviour.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-4-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e1b391ab
    • Nicolas Boichat's avatar
      kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K · 30ab799e
      Nicolas Boichat authored
      [ Upstream commit b751c52b ]
      
      The current default value (400) is too low on many systems (e.g.  some
      ARM64 platform takes up 1000+ entries).
      
      syzbot uses 16000 as default value, and has proved to be enough on beefy
      configurations, so let's pick that value.
      
      This consumes more RAM on boot (each entry is 160 bytes, so in total
      ~2.5MB of RAM), but the memory would later be freed (early_log is
      __initdata).
      
      Link: http://lkml.kernel.org/r/20190730154027.101525-1-drinkcat@chromium.orgSigned-off-by: default avatarNicolas Boichat <drinkcat@chromium.org>
      Suggested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      30ab799e
    • Changwei Ge's avatar
      ocfs2: wait for recovering done after direct unlock request · 52132ff5
      Changwei Ge authored
      [ Upstream commit 0a3775e4 ]
      
      There is a scenario causing ocfs2 umount hang when multiple hosts are
      rebooting at the same time.
      
      NODE1                           NODE2               NODE3
      send unlock requset to NODE2
                                      dies
                                                          become recovery master
                                                          recover NODE2
      find NODE2 dead
      mark resource RECOVERING
      directly remove lock from grant list
      calculate usage but RECOVERING marked
      **miss the window of purging
      clear RECOVERING
      
      To reproduce this issue, crash a host and then umount ocfs2
      from another node.
      
      To solve this, just let unlock progress wait for recovery done.
      
      Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cnSigned-off-by: default avatarChangwei Ge <gechangwei@live.cn>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      52132ff5
    • Greg Thelen's avatar
      kbuild: clean compressed initramfs image · d4a54645
      Greg Thelen authored
      [ Upstream commit 6279eb3d ]
      
      Since 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig")
      "make clean" leaves behind compressed initramfs images.  Example:
      
        $ make defconfig
        $ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
        $ make olddefconfig
        $ make -s
        $ make -s clean
        $ git clean -ndxf | grep initramfs
        Would remove usr/initramfs_data.cpio.gz
      
      clean rules do not have CONFIG_* context so they do not know which
      compression format was used.  Thus they don't know which files to delete.
      
      Tell clean to delete all possible compression formats.
      
      Once patched usr/initramfs_data.cpio.gz and friends are deleted by
      "make clean".
      
      Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com
      Fixes: 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig")
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d4a54645
    • Yunfeng Ye's avatar
      crypto: hisilicon - Fix double free in sec_free_hw_sgl() · d983182d
      Yunfeng Ye authored
      [ Upstream commit 24fbf7ba ]
      
      There are two problems in sec_free_hw_sgl():
      
      First, when sgl_current->next is valid, @hw_sgl will be freed in the
      first loop, but it free again after the loop.
      
      Second, sgl_current and sgl_current->next_sgl is not match when
      dma_pool_free() is invoked, the third parameter should be the dma
      address of sgl_current, but sgl_current->next_sgl is the dma address
      of next chain, so use sgl_current->next_sgl is wrong.
      
      Fix this by deleting the last dma_pool_free() in sec_free_hw_sgl(),
      modifying the condition for while loop, and matching the address for
      dma_pool_free().
      
      Fixes: 915e4e84 ("crypto: hisilicon - SEC security accelerator driver")
      Signed-off-by: default avatarYunfeng Ye <yeyunfeng@huawei.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d983182d
    • David Howells's avatar
      hypfs: Fix error number left in struct pointer member · 22c788ba
      David Howells authored
      [ Upstream commit b54c64f7 ]
      
      In hypfs_fill_super(), if hypfs_create_update_file() fails,
      sbi->update_file is left holding an error number.  This is passed to
      hypfs_kill_super() which doesn't check for this.
      
      Fix this by not setting sbi->update_value until after we've checked for
      error.
      
      Fixes: 24bbb1fa ("[PATCH] s390_hypfs filesystem")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      cc: linux-s390@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      22c788ba
    • Jens Axboe's avatar
      pktcdvd: remove warning on attempting to register non-passthrough dev · bbd76d95
      Jens Axboe authored
      [ Upstream commit eb09b3cc ]
      
      Anatoly reports that he gets the below warning when booting -git on
      a sparc64 box on debian unstable:
      
      ...
      [   13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES
      implementation
      [   13.428002] ------------[ cut here ]------------
      [   13.428081] WARNING: CPU: 21 PID: 586 at
      drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
      [   13.428147] Attempt to register a non-SCSI queue
      [   13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64
      n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash
      sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4
      ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
      async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
      md_mod crc32c_sparc64
      [   13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted
      5.3.0-10169-g574cc453 #1234
      [   13.428507] Call Trace:
      [   13.428542]  [00000000004635c0] __warn+0xc0/0x100
      [   13.428582]  [0000000000463634] warn_slowpath_fmt+0x34/0x60
      [   13.428626]  [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
      [   13.428674]  [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd]
      [   13.428724]  [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0
      [   13.428764]  [00000000006b96c8] ksys_ioctl+0x48/0x80
      [   13.428803]  [00000000006b9714] sys_ioctl+0x14/0x40
      [   13.428847]  [0000000000406294] linux_sparc_syscall+0x34/0x44
      [   13.428890] irq event stamp: 4181
      [   13.428924] hardirqs last  enabled at (4189): [<00000000004e0a74>]
      console_unlock+0x634/0x6c0
      [   13.428984] hardirqs last disabled at (4196): [<00000000004e0540>]
      console_unlock+0x100/0x6c0
      [   13.429048] softirqs last  enabled at (3978): [<0000000000b2e2d8>]
      __do_softirq+0x498/0x520
      [   13.429110] softirqs last disabled at (3967): [<000000000042cfb4>]
      do_softirq_own_stack+0x34/0x60
      [   13.429172] ---[ end trace 2220ca468f32967d ]---
      [   13.430018] pktcdvd: setup of pktcdvd device failed
      [   13.455589] des_sparc64: Using sparc64 des opcodes optimized DES
      implementation
      [   13.515334] camellia_sparc64: Using sparc64 camellia opcodes
      optimized CAMELLIA implementation
      [   13.522856] pktcdvd: setup of pktcdvd device failed
      [   13.529327] pktcdvd: setup of pktcdvd device failed
      [   13.532932] pktcdvd: setup of pktcdvd device failed
      [   13.536165] pktcdvd: setup of pktcdvd device failed
      [   13.539372] pktcdvd: setup of pktcdvd device failed
      [   13.542834] pktcdvd: setup of pktcdvd device failed
      [   13.546536] pktcdvd: setup of pktcdvd device failed
      [   15.431071] XFS (dm-0): Mounting V5 Filesystem
      ...
      
      Apparently debian auto-attaches any cdrom like device to pktcdvd, which
      can lead to the above warning. There's really no reason to warn for this
      situation, kill it.
      Reported-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bbd76d95
    • OGAWA Hirofumi's avatar
      fat: work around race with userspace's read via blockdev while mounting · 0840daee
      OGAWA Hirofumi authored
      [ Upstream commit 07bfa441 ]
      
      If userspace reads the buffer via blockdev while mounting,
      sb_getblk()+modify can race with buffer read via blockdev.
      
      For example,
      
                  FS                               userspace
          bh = sb_getblk()
          modify bh->b_data
                                        read
      				    ll_rw_block(bh)
      				      fill bh->b_data by on-disk data
      				      /* lost modified data by FS */
      				      set_buffer_uptodate(bh)
          set_buffer_uptodate(bh)
      
      Userspace should not use the blockdev while mounting though, the udev
      seems to be already doing this.  Although I think the udev should try to
      avoid this, workaround the race by small overhead.
      
      Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jpSigned-off-by: default avatarOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0840daee
    • Mike Rapoport's avatar
      ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address · 297904ea
      Mike Rapoport authored
      [ Upstream commit 00d2ec1e ]
      
      The calculation of memblock_limit in adjust_lowmem_bounds() assumes that
      bank 0 starts from a PMD-aligned address. However, the beginning of the
      first bank may be NOMAP memory and the start of usable memory
      will be not aligned to PMD boundary. In such case the memblock_limit will
      be set to the end of the NOMAP region, which will prevent any memblock
      allocations.
      
      Mark the region between the end of the NOMAP area and the next PMD-aligned
      address as NOMAP as well, so that the usable memory will start at
      PMD-aligned address.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      297904ea
    • Jia-Ju Bai's avatar
      security: smack: Fix possible null-pointer dereferences in smack_socket_sock_rcv_skb() · 9a87ab2b
      Jia-Ju Bai authored
      [ Upstream commit 3f4287e7 ]
      
      In smack_socket_sock_rcv_skb(), there is an if statement
      on line 3920 to check whether skb is NULL:
          if (skb && skb->secmark != 0)
      
      This check indicates skb can be NULL in some cases.
      
      But on lines 3931 and 3932, skb is used:
          ad.a.u.net->netif = skb->skb_iif;
          ipv6_skb_to_auditdata(skb, &ad.a, NULL);
      
      Thus, possible null-pointer dereferences may occur when skb is NULL.
      
      To fix these possible bugs, an if statement is added to check skb.
      
      These bugs are found by a static analysis tool STCheck written by us.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9a87ab2b
    • Thierry Reding's avatar
      PCI: exynos: Propagate errors for optional PHYs · 69a32a73
      Thierry Reding authored
      [ Upstream commit ddd69600 ]
      
      devm_of_phy_get() can fail for a number of reasons besides probe
      deferral. It can for example return -ENOMEM if it runs out of memory as
      it tries to allocate devres structures. Propagating only -EPROBE_DEFER
      is problematic because it results in these legitimately fatal errors
      being treated as "PHY not specified in DT".
      
      What we really want is to ignore the optional PHYs only if they have not
      been specified in DT. devm_of_phy_get() returns -ENODEV in this case, so
      that's the special case that we need to handle. So we propagate all
      errors, except -ENODEV, so that real failures will still cause the
      driver to fail probe.
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: default avatarAndrew Murray <andrew.murray@arm.com>
      Cc: Jingoo Han <jingoohan1@gmail.com>
      Cc: Kukjin Kim <kgene@kernel.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      69a32a73
    • Thierry Reding's avatar
      PCI: imx6: Propagate errors for optional regulators · 1264d2e7
      Thierry Reding authored
      [ Upstream commit 2170a09f ]
      
      regulator_get_optional() can fail for a number of reasons besides probe
      deferral. It can for example return -ENOMEM if it runs out of memory as
      it tries to allocate data structures. Propagating only -EPROBE_DEFER is
      problematic because it results in these legitimately fatal errors being
      treated as "regulator not specified in DT".
      
      What we really want is to ignore the optional regulators only if they
      have not been specified in DT. regulator_get_optional() returns -ENODEV
      in this case, so that's the special case that we need to handle. So we
      propagate all errors, except -ENODEV, so that real failures will still
      cause the driver to fail probe.
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: default avatarAndrew Murray <andrew.murray@arm.com>
      Cc: Richard Zhu <hongxing.zhu@nxp.com>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Fabio Estevam <festevam@gmail.com>
      Cc: kernel@pengutronix.de
      Cc: linux-imx@nxp.com
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1264d2e7
    • Thierry Reding's avatar
      PCI: histb: Propagate errors for optional regulators · 403d6c92
      Thierry Reding authored
      [ Upstream commit 8f9e1641 ]
      
      regulator_get_optional() can fail for a number of reasons besides probe
      deferral. It can for example return -ENOMEM if it runs out of memory as
      it tries to allocate data structures. Propagating only -EPROBE_DEFER is
      problematic because it results in these legitimately fatal errors being
      treated as "regulator not specified in DT".
      
      What we really want is to ignore the optional regulators only if they
      have not been specified in DT. regulator_get_optional() returns -ENODEV
      in this case, so that's the special case that we need to handle. So we
      propagate all errors, except -ENODEV, so that real failures will still
      cause the driver to fail probe.
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: default avatarAndrew Murray <andrew.murray@arm.com>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      403d6c92
    • Thierry Reding's avatar
      PCI: rockchip: Propagate errors for optional regulators · ac9c0e2e
      Thierry Reding authored
      [ Upstream commit 0e3ff0ac ]
      
      regulator_get_optional() can fail for a number of reasons besides probe
      deferral. It can for example return -ENOMEM if it runs out of memory as
      it tries to allocate data structures. Propagating only -EPROBE_DEFER is
      problematic because it results in these legitimately fatal errors being
      treated as "regulator not specified in DT".
      
      What we really want is to ignore the optional regulators only if they
      have not been specified in DT. regulator_get_optional() returns -ENODEV
      in this case, so that's the special case that we need to handle. So we
      propagate all errors, except -ENODEV, so that real failures will still
      cause the driver to fail probe.
      Tested-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: default avatarAndrew Murray <andrew.murray@arm.com>
      Reviewed-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Acked-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Cc: Shawn Lin <shawn.lin@rock-chips.com>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: linux-rockchip@lists.infradead.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ac9c0e2e
    • Joao Moreno's avatar
      HID: apple: Fix stuck function keys when using FN · 709c4841
      Joao Moreno authored
      [ Upstream commit aec256d0 ]
      
      This fixes an issue in which key down events for function keys would be
      repeatedly emitted even after the user has raised the physical key. For
      example, the driver fails to emit the F5 key up event when going through
      the following steps:
      - fnmode=1: hold FN, hold F5, release FN, release F5
      - fnmode=2: hold F5, hold FN, release F5, release FN
      
      The repeated F5 key down events can be easily verified using xev.
      Signed-off-by: default avatarJoao Moreno <mail@joaomoreno.com>
      Co-developed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      709c4841
    • Biwen Li's avatar
      rtc: pcf85363/pcf85263: fix regmap error in set_time · 31e98cba
      Biwen Li authored
      [ Upstream commit 7ef66122 ]
      
      Issue:
          - # hwclock -w
            hwclock: RTC_SET_TIME: Invalid argument
      
      Why:
          - Relative commit: 8b9f9d4d ("regmap: verify if register is
            writeable before writing operations"), this patch
            will always check for unwritable registers, it will compare reg
            with max_register in regmap_writeable.
      
          - The pcf85363/pcf85263 has the capability of address wrapping
            which means if you access an address outside the allowed range
            (0x00-0x2f) hardware actually wraps the access to a lower address.
            The rtc-pcf85363 driver will use this feature to configure the time
            and execute 2 actions in the same i2c write operation (stopping the
            clock and configure the time). However the driver has also
            configured the `regmap maxregister` protection mechanism that will
            block accessing addresses outside valid range (0x00-0x2f).
      
      How:
          - Split of writing regs to two parts, first part writes control
            registers about stop_enable and resets, second part writes
            RTC time and date registers.
      Signed-off-by: default avatarBiwen Li <biwen.li@nxp.com>
      Link: https://lore.kernel.org/r/20190829021418.4607-1-biwen.li@nxp.comSigned-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      31e98cba