1. 12 Jan, 2020 26 commits
  2. 09 Jan, 2020 14 commits
    • Linux 4.19.94 · cb1f9a16
      Greg Kroah-Hartman authored
    • perf/x86/intel/bts: Fix the use of page_private() · 78880475
      Alexander Shishkin authored
      [ Upstream commit ff61541c ]
      
      Commit
      
        8062382c ("perf/x86/intel/bts: Add BTS PMU driver")
      
      brought in a warning with the BTS buffer initialization
      that is easily tripped with (assuming KPTI is disabled):
      
      instantly throwing:
      
      > ------------[ cut here ]------------
      > WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0
      > Modules linked in:
      > CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e773 #904
      > RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0
      > Call Trace:
      >  rb_alloc_aux+0x339/0x550
      >  perf_mmap+0x607/0xc70
      >  mmap_region+0x76b/0xbd0
      ...
      
      It appears to assume (for lost raisins) that PagePrivate() is set,
      while later it actually tests for PagePrivate() before using
      page_private().
      
      Make it consistent and always check PagePrivate() before using
      page_private().
      
      Fixes: 8062382c ("perf/x86/intel/bts: Add BTS PMU driver")
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20191205142853.28894-2-alexander.shishkin@linux.intel.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
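The rule stated in the BTS fix (check PagePrivate() before trusting page_private()) can be illustrated with a small userspace model. This is only a sketch: `struct fake_page` and `fake_buf_nr_pages()` are hypothetical stand-ins invented for illustration, not the kernel's `struct page` or the driver's actual helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct fake_page {
	bool has_private;      /* models PagePrivate() */
	unsigned long private; /* models page_private(): an allocation order */
};

/*
 * Number of base pages covered by this entry. The 'private' field is
 * only meaningful when the flag is set, so it is never read otherwise.
 */
size_t fake_buf_nr_pages(const struct fake_page *page)
{
	assert(page != NULL);

	if (!page->has_private)
		return 1;

	return (size_t)1 << page->private;
}
```

With this shape, every caller goes through one helper, so no call site can forget the flag check and read a stale order value.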
    • xen/blkback: Avoid unmapping unmapped grant pages · 87d43527
      SeongJae Park authored
      [ Upstream commit f9bd84a8 ]
      
      For each I/O request, blkback first maps the foreign pages for the
      request to its local pages.  If an allocation of a local page for the
      mapping fails, it should unmap every mapping already made for the
      request.
      
      However, blkback's handling mechanism for the allocation failure does
      not mark the remaining foreign pages as unmapped.  Therefore, the unmap
      function merely tries to unmap every valid grant page for the request,
      including the pages not mapped due to the allocation failure.  On a
      system that fails the allocation frequently, this problem leads to
      the following kernel crash.
      
        [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
        [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
        [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
        [  372.012562] Oops: 0002 [#1] SMP
        [  372.012566] Modules linked in: act_police sch_ingress cls_u32
        ...
        [  372.012746] Call Trace:
        [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
        [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
        ...
        [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
        ...
        Decompressing Linux... Parsing ELF... done.
        Booting the kernel.
        [    0.000000] Initializing cgroup subsys cpuset
      
      This commit fixes this problem by marking the grant pages of the given
      request that were not mapped due to the allocation failure as invalid.
      
      Fixes: c6cc142d ("xen-blkback: use balloon pages for all mappings")
      Reviewed-by: David Woodhouse <dwmw@amazon.de>
      Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
      Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
      Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: SeongJae Park <sjpark@amazon.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
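The blkback fix follows a common pattern: when a multi-step mapping fails partway through, mark every slot that was never mapped as invalid so the cleanup path skips it. The sketch below models that in userspace. `struct grant_slot`, `INVALID_HANDLE`, `map_slots()` and `unmap_slots()` are hypothetical names chosen for illustration, and the `budget` parameter merely simulates how many page allocations succeed.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_HANDLE (-1) /* hypothetical "nothing mapped here" marker */

struct grant_slot {
	int handle; /* INVALID_HANDLE when the slot holds no mapping */
};

/* Map n slots; only the first 'budget' simulated allocations succeed. */
int map_slots(struct grant_slot *slots, size_t n, size_t budget)
{
	size_t i;

	assert(slots != NULL);
	for (i = 0; i < n; i++) {
		if (i >= budget)
			goto out_of_memory;
		slots[i].handle = (int)i + 100; /* pretend the mapping worked */
	}
	return 0;

out_of_memory:
	/* Mark every slot that was never mapped so unmap skips it. */
	for (; i < n; i++)
		slots[i].handle = INVALID_HANDLE;
	return -1;
}

/* Unmap only valid slots and return how many were actually unmapped. */
size_t unmap_slots(struct grant_slot *slots, size_t n)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (slots[i].handle == INVALID_HANDLE)
			continue;
		slots[i].handle = INVALID_HANDLE;
		unmapped++;
	}
	return unmapped;
}
```

Without the marking loop, the stale handle values left in the unmapped slots would look valid to `unmap_slots()`, which is the failure mode the commit message describes.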
    • s390/smp: fix physical to logical CPU map for SMT · a5011c78
      Heiko Carstens authored
      [ Upstream commit 72a81ad9 ]
      
      If an SMT capable system is not IPL'ed from the first CPU the setup of
      the physical to logical CPU mapping is broken: the IPL core gets CPU
      number 0, but then the next core gets CPU number 1. Correct would be
      that all SMT threads of CPU 0 get the subsequent logical CPU numbers.
      
      This is important since a lot of code (like e.g. the CPU topology
      code) assumes that CPU maps are setup like this. If the mapping is
      broken the system will not IPL due to broken topology masks:
      
      [    1.716341] BUG: arch topology broken
      [    1.716342]      the SMT domain not a subset of the MC domain
      [    1.716343] BUG: arch topology broken
      [    1.716344]      the MC domain not a subset of the BOOK domain
      
      This scenario usually cannot happen, since LPARs are always IPL'ed
      from CPU 0 and re-IPL is also initiated from CPU 0. However, older
      kernels did initiate re-IPL on an arbitrary CPU. A re-IPL from an
      old kernel into a new kernel may therefore lead to a crash.
      
      Fix this by setting up the physical to logical CPU mapping correctly.

      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
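The ordering the s390 fix aims for can be sketched as follows: all SMT threads of the IPL (boot) core receive consecutive logical CPU numbers starting at 0, and only then are the remaining cores numbered. `struct cpu_id` and `build_cpu_map()` are hypothetical names for this illustration and do not reflect the actual s390 setup code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one hardware thread. */
struct cpu_id {
	int core;   /* physical core number */
	int thread; /* SMT thread index within that core */
};

/*
 * Fill logical[] so that index == logical CPU number. The boot core's
 * threads come first, then every other core in ascending order.
 * Returns the number of entries written.
 */
size_t build_cpu_map(struct cpu_id *logical, int ncores,
		     int threads_per_core, int boot_core)
{
	size_t n = 0;

	assert(boot_core >= 0 && boot_core < ncores);

	for (int t = 0; t < threads_per_core; t++)
		logical[n++] = (struct cpu_id){ boot_core, t };

	for (int c = 0; c < ncores; c++) {
		if (c == boot_core)
			continue;
		for (int t = 0; t < threads_per_core; t++)
			logical[n++] = (struct cpu_id){ c, t };
	}
	return n;
}
```

Because sibling threads always occupy adjacent logical numbers, any mask built from "logical CPU N and its next threads_per_core - 1 neighbours" stays within one core, which is the property topology code relies on.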
    • ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps · 7764ed0b
      Zhihao Cheng authored
      [ Upstream commit 6abf5726 ]
      
      Running stress-test test_2 in mtd-utils on a UBI device sometimes
      produces the following oops message:
      
        BUG: unable to handle page fault for address: ffffffff00000140
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 280a067 P4D 280a067 PUD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted 5.2.0 #13
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0
        -0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        Workqueue: writeback wb_workfn (flush-ubifs_0_0)
        RIP: 0010:rb_next_postorder+0x2e/0xb0
        Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 05 bc 80 db
        03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> 3b 7a
        10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03
        RSP: 0018:ffffc90000887758 EFLAGS: 00010202
        RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001
        RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400
        RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: 0000000000000000
        R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000
        R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8
        FS:  0000000000000000(0000) GS:ffff88813ba00000(0000)
        knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffff00000140 CR3: 000000013789d000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          destroy_old_idx+0x5d/0xa0 [ubifs]
          ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs]
          do_commit+0x3eb/0x830 [ubifs]
          ubifs_run_commit+0xdc/0x1c0 [ubifs]
      
      The oops above is caused by a slab-out-of-bounds access in the
      do-while loop of layout_in_gaps(), which is called indirectly by
      ubifs_tnc_start_commit(). That loop places index nodes into the gaps
      created by obsolete index nodes in non-empty index LEBs until the
      remaining index nodes fit entirely into the pre-allocated empty LEBs.
      @c->gap_lebs points to an integer array that records the LEB numbers
      used by the 'in-the-gaps' method. Whenever a suitable index LEB is
      found, its lnum is appended to the array pointed to by @c->gap_lebs.
      The array, of size ((@c->lst.idx_lebs + 1) * sizeof(int)), is
      allocated before the do-while loop and is not resized inside it. But
      @c->lst.idx_lebs can be increased by ubifs_change_lp() (called via
      layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb) during
      the loop. So an out-of-bounds access sometimes happens when the number
      of loop iterations exceeds the original value of @c->lst.idx_lebs.
      For details, see https://bugzilla.kernel.org/show_bug.cgi?id=204229.

      This patch fixes the out-of-bounds access in layout_in_gaps().

      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
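The ubifs bug is an instance of a fixed-size array being filled by a loop whose iteration count can grow after the array was sized. One general way to avoid it, shown here only as a userspace sketch and not as the patch's actual code, is to check capacity on every append and grow the buffer when needed. `struct lnum_log` and `lnum_log_push()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable record of LEB numbers. */
struct lnum_log {
	int *vals;  /* heap buffer, NULL while empty */
	size_t len; /* entries in use */
	size_t cap; /* entries allocated */
};

/*
 * Append one LEB number. Capacity is re-checked on every call, so the
 * caller's loop may run longer than originally estimated.
 * Returns 0 on success, -1 if memory could not be obtained.
 */
int lnum_log_push(struct lnum_log *log, int lnum)
{
	assert(log != NULL);

	if (log->len == log->cap) {
		size_t new_cap = log->cap ? log->cap * 2 : 4;
		int *tmp = realloc(log->vals, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1; /* the old buffer is still valid */
		log->vals = tmp;
		log->cap = new_cap;
	}
	log->vals[log->len++] = lnum;
	return 0;
}
```

A caller starts from a zero-initialised `struct lnum_log` and frees `vals` when done; the key point is that the bound is enforced at the write, not assumed from a count taken before the loop.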
    • net: add annotations on hh->hh_len lockless accesses · bc5fc4a6
      Eric Dumazet authored
      [ Upstream commit c305c6ae ]
      
      KCSAN reported a data-race [1]
      
      While we can use READ_ONCE() on the read sides,
      we need to make sure hh->hh_len is written last.
      
      [1]
      
      BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output
      
      write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
       eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
       neigh_hh_init net/core/neighbour.c:1463 [inline]
       neigh_resolve_output net/core/neighbour.c:1480 [inline]
       neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
       neigh_resolve_output net/core/neighbour.c:1479 [inline]
       neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events rt6_probe_deferred

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
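The requirement that hh->hh_len be "written last" is a publish pattern: fill in the data, then store the length that tells readers the data is ready. The sketch below expresses that with C11 atomics as a userspace analogue; the kernel uses its own primitives, and `struct hdr_cache`, `hdr_cache_fill()` and `hdr_cache_read()` are hypothetical names. It assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define HDR_MAX 16

/* Hypothetical cached link-layer header. */
struct hdr_cache {
	unsigned char data[HDR_MAX];
	atomic_uint len; /* 0 means "not published yet" */
};

/* Writer: copy the bytes first, publish the length last. */
void hdr_cache_fill(struct hdr_cache *hc, const unsigned char *hdr,
		    unsigned int len)
{
	assert(len > 0 && len <= HDR_MAX);
	memcpy(hc->data, hdr, len);
	/* Release ordering: the bytes are visible before the length is. */
	atomic_store_explicit(&hc->len, len, memory_order_release);
}

/* Reader: returns the cached length, or 0 if nothing is published. */
unsigned int hdr_cache_read(struct hdr_cache *hc, unsigned char *out)
{
	/* Acquire ordering pairs with the release store in the writer. */
	unsigned int len = atomic_load_explicit(&hc->len,
						memory_order_acquire);

	if (len)
		memcpy(out, hc->data, len);
	return len;
}
```

A reader that observes a non-zero length is guaranteed to see the bytes written before it; a reader that observes 0 simply treats the cache as empty.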
    • xfs: periodically yield scrub threads to the scheduler · 58a46618
      Darrick J. Wong authored
      [ Upstream commit 5d1116d4 ]
      
      Christoph Hellwig complained about the following soft lockup warning
      when running scrub after generic/175 when preemption is disabled and
      slub debugging is enabled:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161]
      Modules linked in:
      irq event stamp: 41692326
      hardirqs last  enabled at (41692325): [<ffffffff8232c3b7>] _raw_0
      hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0
      softirqs last  enabled at (41684994): [<ffffffff8260031f>] __do_e
      softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0
      CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
      RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40
      Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2
      RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3
      RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006
      RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
      RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00
      R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040
      FS:  00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0
      Call Trace:
       free_debug_processing+0x1dd/0x240
       __slab_free+0x231/0x410
       kmem_cache_free+0x30e/0x360
       xchk_ag_btcur_free+0x76/0xb0
       xchk_ag_free+0x10/0x80
       xchk_bmap_iextent_xref.isra.14+0xd9/0x120
       xchk_bmap_iextent+0x187/0x210
       xchk_bmap+0x2e0/0x3b0
       xfs_scrub_metadata+0x2e7/0x500
       xfs_ioc_scrub_metadata+0x4a/0xa0
       xfs_file_ioctl+0x58a/0xcd0
       do_vfs_ioctl+0xa0/0x6f0
       ksys_ioctl+0x5b/0x90
       __x64_sys_ioctl+0x11/0x20
       do_syscall_64+0x4b/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If preemption is disabled, all metadata buffers needed to perform the
      scrub are already in memory, and there are a lot of records to check,
      it's possible that the scrub thread will run for an extended period of
      time without sleeping for IO or any other reason.  Then the watchdog
      timer or the RCU stall timeout can trigger, producing the backtrace
      above.
      
      To fix this problem, call cond_resched() from the scrub thread so that
      we back out to the scheduler whenever necessary.

      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ath9k_htc: Discard undersized packets · 6dc835db
      Masashi Honma authored
      [ Upstream commit cd486e62 ]
      
      Sometimes the hardware will push small packets that trigger a WARN_ON
      in mac80211. Discard them early to avoid this issue.
      
      This patch ports two patches from ath9k to ath9k_htc:
      commit 3c0efb74 ("ath9k: discard undersized packets") and
      commit df5c4150 ("ath9k: correctly handle short radar pulses").
      
      [  112.835889] ------------[ cut here ]------------
      [  112.835971] WARNING: CPU: 5 PID: 0 at net/mac80211/rx.c:804 ieee80211_rx_napi+0xaac/0xb40 [mac80211]
      [  112.835973] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 libarc4 nouveau snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec video snd_hda_core ttm snd_hwdep drm_kms_helper snd_pcm crct10dif_pclmul snd_seq_midi drm snd_seq_midi_event crc32_pclmul snd_rawmidi ghash_clmulni_intel snd_seq aesni_intel aes_x86_64 crypto_simd cryptd snd_seq_device glue_helper snd_timer sch_fq_codel i2c_algo_bit fb_sys_fops snd input_leds syscopyarea sysfillrect sysimgblt intel_cstate mei_me intel_rapl_perf soundcore mxm_wmi lpc_ich mei kvm_intel kvm mac_hid irqbypass parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear e1000e ahci libahci wmi
      [  112.836022] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.3.0-wt #1
      [  112.836023] Hardware name: MouseComputer Co.,Ltd. X99-S01/X99-S01, BIOS 1.0C-W7 04/01/2015
      [  112.836056] RIP: 0010:ieee80211_rx_napi+0xaac/0xb40 [mac80211]
      [  112.836059] Code: 00 00 66 41 89 86 b0 00 00 00 e9 c8 fa ff ff 4c 89 b5 40 ff ff ff 49 89 c6 e9 c9 fa ff ff 48 c7 c7 e0 a2 a5 c0 e8 47 41 b0 e9 <0f> 0b 48 89 df e8 5a 94 2d ea e9 02 f9 ff ff 41 39 c1 44 89 85 60
      [  112.836060] RSP: 0018:ffffaa6180220da8 EFLAGS: 00010286
      [  112.836062] RAX: 0000000000000024 RBX: ffff909a20eeda00 RCX: 0000000000000000
      [  112.836064] RDX: 0000000000000000 RSI: ffff909a2f957448 RDI: ffff909a2f957448
      [  112.836065] RBP: ffffaa6180220e78 R08: 00000000000006e9 R09: 0000000000000004
      [  112.836066] R10: 000000000000000a R11: 0000000000000001 R12: 0000000000000000
      [  112.836068] R13: ffff909a261a47a0 R14: 0000000000000000 R15: 0000000000000004
      [  112.836070] FS:  0000000000000000(0000) GS:ffff909a2f940000(0000) knlGS:0000000000000000
      [  112.836071] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  112.836073] CR2: 00007f4e3ffffa08 CR3: 00000001afc0a006 CR4: 00000000001606e0
      [  112.836074] Call Trace:
      [  112.836076]  <IRQ>
      [  112.836083]  ? finish_td+0xb3/0xf0
      [  112.836092]  ? ath9k_rx_prepare.isra.11+0x22f/0x2a0 [ath9k_htc]
      [  112.836099]  ath9k_rx_tasklet+0x10b/0x1d0 [ath9k_htc]
      [  112.836105]  tasklet_action_common.isra.22+0x63/0x110
      [  112.836108]  tasklet_action+0x22/0x30
      [  112.836115]  __do_softirq+0xe4/0x2da
      [  112.836118]  irq_exit+0xae/0xb0
      [  112.836121]  do_IRQ+0x86/0xe0
      [  112.836125]  common_interrupt+0xf/0xf
      [  112.836126]  </IRQ>
      [  112.836130] RIP: 0010:cpuidle_enter_state+0xa9/0x440
      [  112.836133] Code: 3d bc 20 38 55 e8 f7 1d 84 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 28 29 84 ff 80 7d d3 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 ff 01 00 00 41 c7 44 24 10 00 00 00 00 48 83 c4 18
      [  112.836134] RSP: 0018:ffffaa61800e3e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
      [  112.836136] RAX: ffff909a2f96b340 RBX: ffffffffabb58200 RCX: 000000000000001f
      [  112.836137] RDX: 0000001a458adc5d RSI: 0000000026c9b581 RDI: 0000000000000000
      [  112.836139] RBP: ffffaa61800e3e88 R08: 0000000000000002 R09: 000000000002abc0
      [  112.836140] R10: ffffaa61800e3e18 R11: 000000000000002d R12: ffffca617fb40b00
      [  112.836141] R13: 0000000000000002 R14: ffffffffabb582d8 R15: 0000001a458adc5d
      [  112.836145]  ? cpuidle_enter_state+0x98/0x440
      [  112.836149]  ? menu_select+0x370/0x600
      [  112.836151]  cpuidle_enter+0x2e/0x40
      [  112.836154]  call_cpuidle+0x23/0x40
      [  112.836156]  do_idle+0x204/0x280
      [  112.836159]  cpu_startup_entry+0x1d/0x20
      [  112.836164]  start_secondary+0x167/0x1c0
      [  112.836169]  secondary_startup_64+0xa4/0xb0
      [  112.836173] ---[ end trace 9f4cd18479cc5ae5 ]---

      Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ath9k_htc: Modify byte order for an error message · f10bcc6b
      Masashi Honma authored
      [ Upstream commit e01fddc1 ]
      
      rs_datalen is be16 so we need to convert it before printing.

      Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
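The byte-order point in the ath9k_htc message fix can be shown in portable userspace C: a 16-bit big-endian field must be decoded before it is printed or compared, otherwise little-endian hosts show a byte-swapped number. `be16_field_to_host()` is a hypothetical helper written for this sketch; kernel code would use its own conversion helper instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode a 16-bit field that is stored in big-endian (wire) order.
 * The bytes are inspected individually, so the result is the same on
 * little-endian and big-endian hosts.
 */
uint16_t be16_field_to_host(uint16_t raw)
{
	unsigned char b[2];

	memcpy(b, &raw, sizeof(b));
	return (uint16_t)((b[0] << 8) | b[1]);
}
```

A caller would pass the raw field straight from the received descriptor and print the returned value, never the raw one.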
    • net: core: limit nested device depth · a2e06554
      Taehee Yoo authored
      [ Upstream commit 5343da4c ]
      
      Current code doesn't limit the number of nested devices.
      Nested devices are handled recursively, which needs a large amount
      of stack memory. So an unlimited number of nested devices can
      overflow the stack.
      
      This patch adds upper_level and lower_level; they are common variables
      and represent the maximum upper/lower depth.
      When an upper/lower device is attached or detached,
      {lower/upper}_level are updated, and if the maximum depth is bigger
      than 8, the attach routine fails and returns -EMLINK.
      
      In addition, this patch converts the recursive routine
      netdev_walk_all_{lower/upper} to an iterative routine.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add link dummy0 name vlan1 type vlan id 1
          ip link set vlan1 up
      
          for i in {2..55}
          do
      	    let A=$i-1
      
      	    ip link add vlan$i link vlan$A type vlan id $i
          done
          ip link del dummy0
      
      Splat looks like:
      [  155.513226][  T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
      [  155.514162][  T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
      [  155.515048][  T908]
      [  155.515333][  T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
      [  155.516147][  T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  155.517233][  T908] Call Trace:
      [  155.517627][  T908]
      [  155.517918][  T908] Allocated by task 0:
      [  155.518412][  T908] (stack is not available)
      [  155.518955][  T908]
      [  155.519228][  T908] Freed by task 0:
      [  155.519885][  T908] (stack is not available)
      [  155.520452][  T908]
      [  155.520729][  T908] The buggy address belongs to the object at ffff8880608a6ac0
      [  155.520729][  T908]  which belongs to the cache names_cache of size 4096
      [  155.522387][  T908] The buggy address is located 512 bytes inside of
      [  155.522387][  T908]  4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
      [  155.523920][  T908] The buggy address belongs to the page:
      [  155.524552][  T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
      [  155.525836][  T908] flags: 0x100000000010200(slab|head)
      [  155.526445][  T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
      [  155.527424][  T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  155.528429][  T908] page dumped because: kasan: bad access detected
      [  155.529158][  T908]
      [  155.529410][  T908] Memory state around the buggy address:
      [  155.530060][  T908]  ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.530971][  T908]  ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
      [  155.531889][  T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.532806][  T908]                                            ^
      [  155.533509][  T908]  ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
      [  155.534436][  T908]  ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
      [ ... ]

      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
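The two ideas in the nested-device commit, a depth limit enforced at attach time and an iterative walk instead of recursion, can be sketched with a simplified linear stack of devices. The limit of 8 and the -EMLINK return come from the commit message; `struct fake_dev`, `chain_depth()` and `attach_upper()` are hypothetical and far simpler than the kernel's adjacency lists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NEST_DEPTH 8 /* limit named in the commit message */

/* Hypothetical device that can be stacked on at most one lower device. */
struct fake_dev {
	struct fake_dev *lower; /* device below this one, or NULL */
};

/* Count the devices in the chain starting at dev, without recursion. */
int chain_depth(const struct fake_dev *dev)
{
	int depth = 0;

	for (; dev; dev = dev->lower)
		depth++;
	return depth;
}

/* Stack 'upper' on 'lower' unless the chain would become too deep. */
int attach_upper(struct fake_dev *upper, struct fake_dev *lower)
{
	assert(upper != NULL && upper->lower == NULL);

	if (chain_depth(lower) + 1 > MAX_NEST_DEPTH)
		return -EMLINK;

	upper->lower = lower;
	return 0;
}
```

Because the walk is a plain loop, its stack usage is constant regardless of how many devices are chained, and the attach-time check keeps chains short in the first place.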
    • tcp: annotate tp->rcv_nxt lockless reads · 67f028ac
      Eric Dumazet authored
      [ Upstream commit dba7d9b8 ]
      
      There are a few places where we fetch tp->rcv_nxt while
      this field can change from IRQ context or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
      
      write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
       tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
       tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
       tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
      
      read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
       tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
       tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
       ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
       ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
       ep_send_events fs/eventpoll.c:1793 [inline]
       ep_poll+0xe3/0x900 fs/eventpoll.c:1930
       do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
       __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
       __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
       __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rxrpc: Fix possible NULL pointer access in ICMP handling · d9f4d60a
      David Howells authored
      [ Upstream commit f0308fb0 ]
      
      If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
      the UDP socket is being shut down, rxrpc_error_report() may get called to
      deal with it after sk_user_data on the UDP socket has been cleared, leading
      to a NULL pointer access when this local endpoint record gets accessed.
      
      Fix this by just returning immediately if sk_user_data was NULL.
      
      The oops looks like the following:
      
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      ...
      RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
      ...
      Call Trace:
       ? sock_queue_err_skb+0xbd/0xde
       ? __udp4_lib_err+0x313/0x34d
       __udp4_lib_err+0x313/0x34d
       icmp_unreach+0x1ee/0x207
       icmp_rcv+0x25b/0x28f
       ip_protocol_deliver_rcu+0x95/0x10e
       ip_local_deliver+0xe9/0x148
       __netif_receive_skb_one_core+0x52/0x6e
       process_backlog+0xdc/0x177
       net_rx_action+0xf9/0x270
       __do_softirq+0x1b6/0x39a
       ? smpboot_register_percpu_thread+0xce/0xce
       run_ksoftirqd+0x1d/0x42
       smpboot_thread_fn+0x19e/0x1b3
       kthread+0xf1/0xf6
       ? kthread_delayed_work_timer_fn+0x83/0x83
       ret_from_fork+0x24/0x30
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
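The rxrpc fix is an early-return guard in a callback that can race with teardown: if the per-socket context pointer has already been cleared, there is nothing to report against. A minimal userspace sketch of that shape follows; `struct fake_sock`, `struct fake_local` and `fake_error_report()` are hypothetical, and the sketch leaves out the synchronisation that real kernel code needs when reading such a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-endpoint state hung off a socket. */
struct fake_local {
	int errors_seen;
};

/* Hypothetical socket with an opaque user-data pointer. */
struct fake_sock {
	void *user_data; /* cleared to NULL during shutdown */
};

/*
 * Error callback. Returns 1 if the error was recorded, or 0 if the
 * socket had already been torn down and the event was ignored.
 */
int fake_error_report(struct fake_sock *sk)
{
	struct fake_local *local;

	assert(sk != NULL);
	local = sk->user_data;
	if (!local)
		return 0; /* shutdown won the race; nothing to touch */

	local->errors_seen++;
	return 1;
}
```

The guard sits before the first dereference of the context, so a late event during shutdown is dropped instead of faulting.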
    • KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag · a2118e6e
      Michael Roth authored
      [ Upstream commit 3a83f677 ]
      
      On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
      of memory running the following guest configs:
      
        guest A:
          - 224GB of memory
          - 56 VCPUs (sockets=1,cores=28,threads=2), where:
            VCPUs 0-1 are pinned to CPUs 0-3,
            VCPUs 2-3 are pinned to CPUs 4-7,
            ...
            VCPUs 54-55 are pinned to CPUs 108-111
      
        guest B:
          - 4GB of memory
          - 4 VCPUs (sockets=1,cores=4,threads=1)
      
      with the following workloads (with KSM and THP enabled in all):
      
        guest A:
          stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M
      
        guest B:
          stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M
      
        host:
          stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M
      
      the below soft-lockup traces were observed after an hour or so and
      persisted until the host was reset (this was found to be reliably
      reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
      and 5.3-rc5):
      
        [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
        [ 1253.183319] rcu:     124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
        [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
        [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
        [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
        [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
        [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
        [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
        [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
        [ 1316.198032] rcu:     124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
        [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
        [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
        [ 1379.212629] rcu:     124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
        [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
        [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
        [ 1442.227115] rcu:     124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
        [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
        [ 1455.111822]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
        [ 1455.111905]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
        [ 1455.111986]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
        [ 1455.112068]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
        [ 1455.112159]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
        [ 1455.112231]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
        [ 1455.112303]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
        [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
        [ 1455.112392]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
      
      CPUs 45, 24, and 124 are stuck on spin locks, likely held by
      CPUs 105 and 31.
      
      CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
      target CPU 42. For instance:
      
        # CPU 105 registers (via xmon)
        R00 = c00000000020b20c   R16 = 00007d1bcd800000
        R01 = c00000363eaa7970   R17 = 0000000000000001
        R02 = c0000000019b3a00   R18 = 000000000000006b
        R03 = 000000000000002a   R19 = 00007d537d7aecf0
        R04 = 000000000000002a   R20 = 60000000000000e0
        R05 = 000000000000002a   R21 = 0801000000000080
        R06 = c0002073fb0caa08   R22 = 0000000000000d60
        R07 = c0000000019ddd78   R23 = 0000000000000001
        R08 = 000000000000002a   R24 = c00000000147a700
        R09 = 0000000000000001   R25 = c0002073fb0ca908
        R10 = c000008ffeb4e660   R26 = 0000000000000000
        R11 = c0002073fb0ca900   R27 = c0000000019e2464
        R12 = c000000000050790   R28 = c0000000000812b0
        R13 = c000207fff623e00   R29 = c0002073fb0ca808
        R14 = 00007d1bbee00000   R30 = c0002073fb0ca800
        R15 = 00007d1bcd600000   R31 = 0000000000000800
        pc  = c00000000020b260 smp_call_function_many+0x3d0/0x460
        cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
        lr  = c00000000020b20c smp_call_function_many+0x37c/0x460
        msr = 900000010288b033   cr  = 44024824
        ctr = c000000000050790   xer = 0000000000000000   trap =  100
      
      CPU 42 is running normally, doing VCPU work:
      
        # CPU 42 stack trace (via xmon)
        [link register   ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
        [c000008ed3343820] c000008ed3343850 (unreliable)
        [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
        [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
        [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
        [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
        [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
        [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
        [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
        [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
        [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
        --- Exception: c00 (System Call) at 00007d545cfd7694
        SP (7d53ff7edf50) is in userspace
      
      It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
      was set for CPU 42 by at least 1 of the CPUs waiting in
      smp_call_function_many(), but somehow the corresponding
      call_single_queue entries were never processed by CPU 42, causing the
      callers to spin in csd_lock_wait() indefinitely.
      
      Nick Piggin suggested something similar to the following sequence as
      a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
      IPI messages seems to be most common, but any mix of CALL_FUNCTION and
      !CALL_FUNCTION messages could trigger it):
      
          CPU
            X: smp_muxed_ipi_set_message():
            X:   smp_mb()
            X:   message[RESCHEDULE] = 1
            X: doorbell_global_ipi(42):
            X:   kvmppc_set_host_ipi(42, 1)
            X:   ppc_msgsnd_sync()/smp_mb()
            X:   ppc_msgsnd() -> 42
           42: doorbell_exception(): // from CPU X
           42:   ppc_msgsync()
          105: smp_muxed_ipi_set_message():
          105:   smp_mb()
               // STORE DEFERRED DUE TO RE-ORDERING
        --105:   message[CALL_FUNCTION] = 1
        | 105: doorbell_global_ipi(42):
        | 105:   kvmppc_set_host_ipi(42, 1)
        |  42:   kvmppc_set_host_ipi(42, 0)
        |  42: smp_ipi_demux_relaxed()
        |  42: // returns to executing guest
        |      // RE-ORDERED STORE COMPLETES
        ->105:   message[CALL_FUNCTION] = 1
          105:   ppc_msgsnd_sync()/smp_mb()
          105:   ppc_msgsnd() -> 42
           42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
          105: // hangs waiting on 42 to process messages/call_single_queue
      
      This can be prevented with an smp_mb() at the beginning of
      kvmppc_set_host_ipi(), such that stores to message[<type>] (or other
      state indicated by the host_ipi flag) are ordered vs. the store to
      host_ipi.
      
      However, doing so might still allow for the following scenario (not
      yet observed):
      
          CPU
            X: smp_muxed_ipi_set_message():
            X:   smp_mb()
            X:   message[RESCHEDULE] = 1
            X: doorbell_global_ipi(42):
            X:   kvmppc_set_host_ipi(42, 1)
            X:   ppc_msgsnd_sync()/smp_mb()
            X:   ppc_msgsnd() -> 42
           42: doorbell_exception(): // from CPU X
           42:   ppc_msgsync()
               // STORE DEFERRED DUE TO RE-ORDERING
        -- 42:   kvmppc_set_host_ipi(42, 0)
        |  42: smp_ipi_demux_relaxed()
        | 105: smp_muxed_ipi_set_message():
        | 105:   smp_mb()
        | 105:   message[CALL_FUNCTION] = 1
        | 105: doorbell_global_ipi(42):
        | 105:   kvmppc_set_host_ipi(42, 1)
        |      // RE-ORDERED STORE COMPLETES
        -> 42:   kvmppc_set_host_ipi(42, 0)
           42: // returns to executing guest
          105:   ppc_msgsnd_sync()/smp_mb()
          105:   ppc_msgsnd() -> 42
           42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
          105: // hangs waiting on 42 to process messages/call_single_queue
      
      Fixing this scenario would require an smp_mb() *after* clearing the
      host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
      subsequent processing of IPI messages.
      
      To handle both cases, this patch splits kvmppc_set_host_ipi() into
      separate set/clear functions, where we execute smp_mb() prior to
      setting the host_ipi flag, and after clearing the host_ipi flag. These
      functions pair with each other to synchronize the sender and receiver
      sides.
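
The set/clear pairing described above can be sketched in portable C11 as a rough userspace model. This is not the kernel code: `struct cpu_ipi_state` and the `model_*` functions are hypothetical names invented for illustration, and `atomic_thread_fence(memory_order_seq_cst)` is assumed here as a stand-in for `smp_mb()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state named in the commit message. */
struct cpu_ipi_state {
    atomic_int message_call_function; /* models message[CALL_FUNCTION] */
    atomic_int host_ipi;              /* models kvm_hstate.host_ipi */
};

/* Sender side: full barrier BEFORE setting the flag, so that the earlier
 * store to the message slot is ordered ahead of the store to host_ipi. */
static void model_set_host_ipi(struct cpu_ipi_state *s)
{
    atomic_thread_fence(memory_order_seq_cst); /* assumed analogue of smp_mb() */
    atomic_store_explicit(&s->host_ipi, 1, memory_order_relaxed);
}

/* Receiver side: full barrier AFTER clearing the flag, so that the clear is
 * ordered ahead of the subsequent reads of the message slots. */
static void model_clear_host_ipi(struct cpu_ipi_state *s)
{
    atomic_store_explicit(&s->host_ipi, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* assumed analogue of smp_mb() */
}

/* Sender: post the message first, then raise the flag. */
static void model_send(struct cpu_ipi_state *s)
{
    atomic_store_explicit(&s->message_call_function, 1, memory_order_relaxed);
    model_set_host_ipi(s);
}

/* Receiver: ignore the interrupt if the flag is clear; otherwise clear the
 * flag and consume the message. Returns true if a message was handled. */
static bool model_receive(struct cpu_ipi_state *s)
{
    if (!atomic_load_explicit(&s->host_ipi, memory_order_relaxed))
        return false;
    model_clear_host_ipi(s);
    return atomic_exchange_explicit(&s->message_call_function, 0,
                                    memory_order_relaxed) != 0;
}
```

Calling `model_send()` and then `model_receive()` on the same state object from one thread only exercises the sequential logic; the barriers matter when the two sides run concurrently on different CPUs, which this sketch does not attempt to reproduce.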
      
      With that change in place the above workload ran for 20 hours without
      triggering any lock-ups.
      
      Fixes: 755563bc ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a2118e6e
    • Florian Westphal's avatar
      selftests: rtnetlink: add addresses with fixed life time · 107726ad
      Florian Westphal authored
      [ Upstream commit 3cfa1488 ]
      
      This exercises the kernel code paths that deal with addresses that
      have a limited lifetime.
      
      Without the previous fix, this triggers the following crash on net-next:
       BUG: KASAN: null-ptr-deref in check_lifetime+0x403/0x670
       Read of size 8 at addr 0000000000000010 by task kworker [..]
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      107726ad