1. 27 Mar, 2019 12 commits
    • Lukas Czerner's avatar
      ext4: fix data corruption caused by unaligned direct AIO · 8651fa1e
      Lukas Czerner authored
      commit 372a03e0 upstream.
      
      Ext4 needs to serialize unaligned direct AIO because the zeroing of
      partial blocks of two competing unaligned AIOs can result in data
      corruption.
      
      However it decides not to serialize if the potentially unaligned aio is
      past i_size with the rationale that no pending writes are possible past
      i_size. Unfortunately if the i_size is not block aligned and the second
      unaligned write lands past i_size, but still into the same block, it has
      the potential of corrupting the previous unaligned write to the same
      block.
      
      This is (very simplified) reproducer from Frank
      
          // 41472 = (10 * 4096) + 512
          // 37376 = 41472 - 4096
      
          ftruncate(fd, 41472);
          io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
          io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
      
          io_submit(io_ctx, 1, &iocbs[1]);
          io_submit(io_ctx, 1, &iocbs[2]);
      
          io_getevents(io_ctx, 2, 2, events, NULL);
      
      Without this patch the 512B range from 40960 up to the start of the
      second unaligned write (41472) is going to be zeroed overwriting the data
      written by the first write. This is a data corruption.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      
      With this patch the data corruption is avoided because we will recognize
      the unaligned_aio and wait for the unwritten extent conversion.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      *
      0000b200
      Reported-by: default avatarFrank Sorenson <fsorenso@redhat.com>
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8651fa1e
    • Jiufei Xue's avatar
      ext4: fix NULL pointer dereference while journal is aborted · d9f0ce85
      Jiufei Xue authored
      commit fa30dde3 upstream.
      
      We see the following NULL pointer dereference while running xfstests
      generic/475:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
      RIP: 0010:ext4_do_update_inode+0x4ec/0x760
      ...
      Call Trace:
      ? jbd2_journal_get_write_access+0x42/0x50
      ? __ext4_journal_get_write_access+0x2c/0x70
      ? ext4_truncate+0x186/0x3f0
      ext4_mark_iloc_dirty+0x61/0x80
      ext4_mark_inode_dirty+0x62/0x1b0
      ext4_truncate+0x186/0x3f0
      ? unmap_mapping_pages+0x56/0x100
      ext4_setattr+0x817/0x8b0
      notify_change+0x1df/0x430
      do_truncate+0x5e/0x90
      ? generic_permission+0x12b/0x1a0
      
      This is triggered because the NULL pointer handle->h_transaction was
      dereferenced in function ext4_update_inode_fsync_trans().
      I found that the h_transaction was set to NULL in jbd2__journal_restart
      but failed to attached to a new transaction while the journal is aborted.
      
      Fix this by checking the handle before updating the inode.
      
      Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
      Signed-off-by: default avatarJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9f0ce85
    • Josh Poimboeuf's avatar
      objtool: Move objtool_file struct off the stack · c818b2fa
      Josh Poimboeuf authored
      commit 0c671812 upstream.
      
      Objtool uses over 512k of stack, thanks to the hash table embedded in
      the objtool_file struct.  This causes an unnecessarily large stack
      allocation and breaks users with low stack limits.
      
      Move the struct off the stack.
      
      Fixes: 042ba73f ("objtool: Add several performance improvements")
      Reported-by: default avatarVassili Karpov <moosotc@gmail.com>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/df92dcbc4b84b02ffa252f46876df125fb56e2d7.1552954176.git.jpoimboe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c818b2fa
    • Chen Jie's avatar
      futex: Ensure that futex address is aligned in handle_futex_death() · 726c28f3
      Chen Jie authored
      commit 5a07168d upstream.
      
      The futex code requires that the user space addresses of futexes are 32bit
      aligned. sys_futex() checks this in futex_get_keys() but the robust list
      code has no alignment check in place.
      
      As a consequence the kernel crashes on architectures with strict alignment
      requirements in handle_futex_death() when trying to cmpxchg() on an
      unaligned futex address which was retrieved from the robust list.
      
      [ tglx: Rewrote changelog, proper sizeof() based alignement check and add
        	comment ]
      
      Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
      Signed-off-by: default avatarChen Jie <chenjie6@huawei.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: <dvhart@infradead.org>
      Cc: <peterz@infradead.org>
      Cc: <zengweilin@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      726c28f3
    • Archer Yan's avatar
      MIPS: Fix kernel crash for R6 in jump label branch function · b84089b2
      Archer Yan authored
      commit 47c25036 upstream.
      
      Insert Branch instruction instead of NOP to make sure assembler don't
      patch code in forbidden slot. In jump label function, it might
      be possible to patch Control Transfer Instructions(CTIs) into
      forbidden slot, which will generate Reserved Instruction exception
      in MIPS release 6.
      Signed-off-by: default avatarArcher Yan <ayan@wavecomp.com>
      Reviewed-by: default avatarPaul Burton <paul.burton@mips.com>
      [paul.burton@mips.com:
        - Add MIPS prefix to subject.
        - Mark for stable from v4.0, which introduced r6 support, onwards.]
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b84089b2
    • Yasha Cherikovsky's avatar
      MIPS: Ensure ELF appended dtb is relocated · c7ac334f
      Yasha Cherikovsky authored
      commit 3f0a53bc upstream.
      
      This fixes booting with the combination of CONFIG_RELOCATABLE=y
      and CONFIG_MIPS_ELF_APPENDED_DTB=y.
      
      Sections that appear after the relocation table are not relocated
      on system boot (except .bss, which has special handling).
      
      With CONFIG_MIPS_ELF_APPENDED_DTB, the dtb is part of the
      vmlinux ELF, so it must be relocated together with everything else.
      
      Fixes: 069fd766 ("MIPS: Reserve space for relocation table")
      Signed-off-by: default avatarYasha Cherikovsky <yasha.che3@gmail.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7ac334f
    • Yifeng Li's avatar
      mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction. · f6b55e77
      Yifeng Li authored
      commit 5f5f67da upstream.
      
      Timekeeping IRQs from CS5536 MFGPT are routed to i8259, which then
      triggers the "cascade" IRQ on MIPS CPU. Without IRQF_NO_SUSPEND in
      cascade_irqaction, MFGPT interrupts will be masked in suspend mode,
      and the machine would be unable to resume once suspended.
      
      Previously, MIPS IRQs were not disabled properly, so the original
      code appeared to work. Commit a3e6c1ef ("MIPS: IRQ: Fix disable_irq on
      CPU IRQs") uncovers the bug. To fix it, add IRQF_NO_SUSPEND to
      cascade_irqaction.
      
      This commit is functionally identical to 0add9c2f ("MIPS:
      Loongson-3: Add IRQF_NO_SUSPEND to Cascade irqaction"), but it forgot
      to apply the same fix to Loongson2.
      Signed-off-by: default avatarYifeng Li <tomli@tomli.me>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6b55e77
    • Jan Kara's avatar
      udf: Fix crash on IO error during truncate · 92477062
      Jan Kara authored
      commit d3ca4651 upstream.
      
      When truncate(2) hits IO error when reading indirect extent block the
      code just bugs with:
      
      kernel BUG at linux-4.15.0/fs/udf/truncate.c:249!
      ...
      
      Fix the problem by bailing out cleanly in case of IO error.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarjean-luc malet <jeanluc.malet@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92477062
    • Ilya Dryomov's avatar
      libceph: wait for latest osdmap in ceph_monc_blacklist_add() · 9c32ada4
      Ilya Dryomov authored
      commit bb229bbb upstream.
      
      Because map updates are distributed lazily, an OSD may not know about
      the new blacklist for quite some time after "osd blacklist add" command
      is completed.  This makes it possible for a blacklisted but still alive
      client to overwrite a post-blacklist update, resulting in data
      corruption.
      
      Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
      the post-blacklist epoch for all post-blacklist requests ensures that
      all such requests "wait" for the blacklist to come into force on their
      respective OSDs.
      
      Cc: stable@vger.kernel.org
      Fixes: 6305a3b4 ("libceph: support for blacklisting clients")
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJason Dillaman <dillaman@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c32ada4
    • Stanislaw Gruszka's avatar
      iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE · cfa2d25f
      Stanislaw Gruszka authored
      commit 4e50ce03 upstream.
      
      Take into account that sg->offset can be bigger than PAGE_SIZE when
      setting segment sg->dma_address. Otherwise sg->dma_address will point
      at diffrent page, what makes DMA not possible with erros like this:
      
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]
      
      Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
      what what can cause crashes like this:
      
      Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
      Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
      Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
      Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
      Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
      Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E)
      Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
      Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
      Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
      Feb 28 19:27:45 kernel: Call Trace:
      Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
      Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
      Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
      Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
      Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
      Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
      Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
      Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
      Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
      Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
      Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
      Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
      Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
      Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
      Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
      Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: default avatarJan Viktorin <jan.viktorin@gmail.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Fixes: 80187fd3 ('iommu/amd: Optimize map_sg and unmap_sg')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfa2d25f
    • Thomas Zimmermann's avatar
      drm/vmwgfx: Don't double-free the mode stored in par->set_mode · eea23ca2
      Thomas Zimmermann authored
      commit c2d31155 upstream.
      
      When calling vmw_fb_set_par(), the mode stored in par->set_mode gets free'd
      twice. The first free is in vmw_fb_kms_detach(), the second is near the
      end of vmw_fb_set_par() under the name of 'old_mode'. The mode-setting code
      only works correctly if the mode doesn't actually change. Removing
      'old_mode' in favor of using par->set_mode directly fixes the problem.
      
      Cc: <stable@vger.kernel.org>
      Fixes: a278724a ("drm/vmwgfx: Implement fbdev on kms v2")
      Signed-off-by: default avatarThomas Zimmermann <tzimmermann@suse.de>
      Reviewed-by: default avatarDeepak Rawat <drawat@vmware.com>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eea23ca2
    • Arnd Bergmann's avatar
      mmc: pxamci: fix enum type confusion · c179e6de
      Arnd Bergmann authored
      commit e60a582b upstream.
      
      clang points out several instances of mismatched types in this drivers,
      all coming from a single declaration:
      
      drivers/mmc/host/pxamci.c:193:15: error: implicit conversion from enumeration type 'enum dma_transfer_direction' to
            different enumeration type 'enum dma_data_direction' [-Werror,-Wenum-conversion]
                      direction = DMA_DEV_TO_MEM;
                                ~ ^~~~~~~~~~~~~~
      drivers/mmc/host/pxamci.c:212:62: error: implicit conversion from enumeration type 'enum dma_data_direction' to
            different enumeration type 'enum dma_transfer_direction' [-Werror,-Wenum-conversion]
              tx = dmaengine_prep_slave_sg(chan, data->sg, host->dma_len, direction,
      
      The behavior is correct, so this must be a simply typo from
      dma_data_direction and dma_transfer_direction being similarly named
      types with a similar purpose.
      
      Fixes: 6464b714 ("mmc: pxamci: switch over to dmaengine use")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c179e6de
  2. 23 Mar, 2019 28 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.165 · 1c453afc
      Greg Kroah-Hartman authored
      1c453afc
    • Wanpeng Li's avatar
      KVM: X86: Fix residual mmio emulation request to userspace · 5e29da06
      Wanpeng Li authored
      commit bbeac283 upstream.
      
      Reported by syzkaller:
      
      The kvm-intel.unrestricted_guest=0
      
         WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         Call Trace:
          ? put_pid+0x3a/0x50
          ? rcu_read_lock_sched_held+0x79/0x80
          ? kmem_cache_free+0x2f2/0x350
          kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? __fget+0xfc/0x210
          do_vfs_ioctl+0xa4/0x6a0
          ? __fget+0x11d/0x210
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
          ? __this_cpu_preempt_check+0x13/0x20
      
      The syszkaller folks reported a residual mmio emulation request to userspace
      due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and
      incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true
      and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs
      several threads to launch the same vCPU, the thread which lauch this vCPU after
      the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will
      trigger the warning.
      
         #define _GNU_SOURCE
         #include <pthread.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/wait.h>
         #include <sys/types.h>
         #include <sys/stat.h>
         #include <sys/mman.h>
         #include <fcntl.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         #include <stdio.h>
      
         int kvmcpu;
         struct kvm_run *run;
      
         void* thr(void* arg)
         {
           int res;
           res = ioctl(kvmcpu, KVM_RUN, 0);
           printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
           return 0;
         }
      
         void test()
         {
           int i, kvm, kvmvm;
           pthread_t th[4];
      
           kvm = open("/dev/kvm", O_RDWR);
           kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
           kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
           run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
           srand(getpid());
           for (i = 0; i < 4; i++) {
             pthread_create(&th[i], 0, thr, 0);
             usleep(rand() % 10000);
           }
           for (i = 0; i < 4; i++)
             pthread_join(th[i], 0);
         }
      
         int main()
         {
           for (;;) {
             int pid = fork();
             if (pid < 0)
               exit(1);
             if (pid == 0) {
               test();
               exit(0);
             }
             int status;
             while (waitpid(pid, &status, __WALL) != pid) {}
           }
           return 0;
         }
      
      This patch fixes it by resetting the vcpu->mmio_needed once we receive
      the triple fault to avoid the residue.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Zubin Mithra <zsm@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e29da06
    • Sean Christopherson's avatar
      KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 7b3c6c48
      Sean Christopherson authored
      commit 34333cc6 upstream.
      
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b3c6c48
    • Sean Christopherson's avatar
      KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9748354a
      Sean Christopherson authored
      commit 946c522b upstream.
      
      The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instructions address size are undefined:
      
          In all cases, bits of this field beyond the instructionâ€s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorreclty
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9748354a
    • Gustavo A. R. Silva's avatar
      drm/radeon/evergreen_cs: fix missing break in switch statement · 45fe916e
      Gustavo A. R. Silva authored
      commit cc5034a5 upstream.
      
      Add missing break statement in order to prevent the code from falling
      through to case CB_TARGET_MASK.
      
      This bug was found thanks to the ongoing efforts to enable
      -Wimplicit-fallthrough.
      
      Fixes: dd220a00 ("drm/radeon/kms: add support for streamout v7")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45fe916e
    • Sakari Ailus's avatar
      media: uvcvideo: Avoid NULL pointer dereference at the end of streaming · 7e1b5809
      Sakari Ailus authored
      commit 9dd0627d upstream.
      
      The UVC video driver converts the timestamp from hardware specific unit
      to one known by the kernel at the time when the buffer is dequeued. This
      is fine in general, but the streamoff operation consists of the
      following steps (among other things):
      
      1. uvc_video_clock_cleanup --- the hardware clock sample array is
         released and the pointer to the array is set to NULL,
      
      2. buffers in active state are returned to the user and
      
      3. buf_finish callback is called on buffers that are prepared.
         buf_finish includes calling uvc_video_clock_update that accesses the
         hardware clock sample array.
      
      The above is serialised by a queue specific mutex. Address the problem
      by skipping the clock conversion if the hardware clock sample array is
      already released.
      
      Fixes: 9c0863b1 ("[media] vb2: call buf_finish from __queue_cancel")
      Reported-by: default avatarChiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
      Tested-by: default avatarChiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e1b5809
    • Zhang, Jun's avatar
      rcu: Do RCU GP kthread self-wakeup from softirq and interrupt · 3b2bbd1b
      Zhang, Jun authored
      commit 1d1f898d upstream.
      
      The rcu_gp_kthread_wake() function is invoked when it might be necessary
      to wake the RCU grace-period kthread.  Because self-wakeups are normally
      a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
      this kthread, it naturally refuses to do the wakeup.
      
      Unfortunately, natural though it might be, this heuristic fails when
      rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
      that interrupted the grace-period kthread just after the final check of
      the wait-event condition but just before the schedule() call.  In this
      case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
      is within the RCU grace-period kthread's context.  Failing to provide
      this wakeup can result in grace periods failing to start, which in turn
      results in out-of-memory conditions.
      
      This race window is quite narrow, but it actually did happen during real
      testing.  It would of course need to be fixed even if it was strictly
      theoretical in nature.
      
      This patch does not Cc stable because it does not apply cleanly to
      earlier kernel versions.
      
      Fixes: 48a7639c ("rcu: Make callers awaken grace-period kthread")
      Reported-by: default avatar"He, Bo" <bo.he@intel.com>
      Co-developed-by: default avatar"Zhang, Jun" <jun.zhang@intel.com>
      Co-developed-by: default avatar"He, Bo" <bo.he@intel.com>
      Co-developed-by: default avatar"xiao, jin" <jin.xiao@intel.com>
      Co-developed-by: default avatarBai, Jie A <jie.a.bai@intel.com>
      Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
      Signed-off: "He, Bo" <bo.he@intel.com>
      Signed-off: "xiao, jin" <jin.xiao@intel.com>
      Signed-off: Bai, Jie A <jie.a.bai@intel.com>
      Signed-off-by: default avatar"Zhang, Jun" <jun.zhang@intel.com>
      [ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
        !in_serving_softirq() to avoid redundant wakeups and to also handle the
        interrupt-handler scenario as well as the softirq-handler scenario that
        actually occurred in testing. ]
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      3b2bbd1b
    • Aditya Pakki's avatar
      md: Fix failed allocation of md_register_thread · f61b68e1
      Aditya Pakki authored
      commit e406f12d upstream.
      
      mddev->sync_thread can be set to NULL on kzalloc failure downstream.
      The patch checks for such a scenario and frees allocated resources.
      
      Committer node:
      
      Added similar fix to raid5.c, as suggested by Guoqing.
      
      Cc: stable@vger.kernel.org # v3.16+
      Acked-by: default avatarGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: default avatarAditya Pakki <pakki001@umn.edu>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61b68e1
    • Adrian Hunter's avatar
      perf intel-pt: Fix divide by zero when TSC is not available · 5ed7a8f6
      Adrian Hunter authored
      commit 07633387 upstream.
      
      When TSC is not available, "timeless" decoding is used but a divide by
      zero occurs if perf_time_to_tsc() is called.
      
      Ensure the divisor is not zero.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ed7a8f6
    • Adrian Hunter's avatar
      perf intel-pt: Fix overlap calculation for padding · 4f7c16b5
      Adrian Hunter authored
      commit 5a99d99e upstream.
      
      Auxtrace records might have up to 7 bytes of padding appended. Adjust
      the overlap accordingly.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f7c16b5
    • Adrian Hunter's avatar
      perf auxtrace: Define auxtrace record alignment · 300ef83e
      Adrian Hunter authored
      commit c3fcadf0 upstream.
      
      Define auxtrace record alignment so that it can be referenced elsewhere.
      
      Note this is preparation for patch "perf intel-pt: Fix overlap calculation
      for padding"
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      300ef83e
    • Adrian Hunter's avatar
      perf intel-pt: Fix CYC timestamp calculation after OVF · d07d5160
      Adrian Hunter authored
      commit 03997612 upstream.
      
      CYC packet timestamp calculation depends upon CBR which was being
      cleared upon overflow (OVF). That can cause errors due to failing to
      synchronize with sideband events. Even if a CBR change has been lost,
      the old CBR is still a better estimate than zero. So remove the clearing
      of CBR.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d07d5160
    • Daniel Axtens's avatar
      bcache: never writeback a discard operation · 7fb9a25c
      Daniel Axtens authored
      commit 9951379b upstream.
      
      Some users see panics like the following when performing fstrim on a
      bcached volume:
      
      [  529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  530.183928] #PF error: [normal kernel read fault]
      [  530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
      [  530.750887] Oops: 0000 [#1] SMP PTI
      [  530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
      [  531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
      [  531.693137] RIP: 0010:blk_queue_split+0x148/0x620
      [  531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48
      8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
      [  532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246
      [  533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
      [  533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
      [  533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
      [  534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
      [  534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
      [  534.833319] FS:  00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000
      [  535.224098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
      [  535.851759] Call Trace:
      [  535.970308]  ? mempool_alloc_slab+0x15/0x20
      [  536.174152]  ? bch_data_insert+0x42/0xd0 [bcache]
      [  536.403399]  blk_mq_make_request+0x97/0x4f0
      [  536.607036]  generic_make_request+0x1e2/0x410
      [  536.819164]  submit_bio+0x73/0x150
      [  536.980168]  ? submit_bio+0x73/0x150
      [  537.149731]  ? bio_associate_blkg_from_css+0x3b/0x60
      [  537.391595]  ? _cond_resched+0x1a/0x50
      [  537.573774]  submit_bio_wait+0x59/0x90
      [  537.756105]  blkdev_issue_discard+0x80/0xd0
      [  537.959590]  ext4_trim_fs+0x4a9/0x9e0
      [  538.137636]  ? ext4_trim_fs+0x4a9/0x9e0
      [  538.324087]  ext4_ioctl+0xea4/0x1530
      [  538.497712]  ? _copy_to_user+0x2a/0x40
      [  538.679632]  do_vfs_ioctl+0xa6/0x600
      [  538.853127]  ? __do_sys_newfstat+0x44/0x70
      [  539.051951]  ksys_ioctl+0x6d/0x80
      [  539.212785]  __x64_sys_ioctl+0x1a/0x20
      [  539.394918]  do_syscall_64+0x5a/0x110
      [  539.568674]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We have observed it where both:
      1) LVM/devmapper is involved (bcache backing device is LVM volume) and
      2) writeback cache is involved (bcache cache_mode is writeback)
      
      On one machine, we can reliably reproduce it with:
      
       # echo writeback > /sys/block/bcache0/bcache/cache_mode
         (not sure whether above line is required)
       # mount /dev/bcache0 /test
       # for i in {0..10}; do
      	file="$(mktemp /test/zero.XXX)"
      	dd if=/dev/zero of="$file" bs=1M count=256
      	sync
      	rm $file
          done
        # fstrim -v /test
      
      Observing this with tracepoints on, we see the following writes:
      
      fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4260112 + 196352 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4456464 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4718608 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5324816 + 180224 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5505040 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5767184 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 6373392 + 180224 hit 1 bypass 0
      <crash>
      
      Note the final one has different hit/bypass flags.
      
      This is because in should_writeback(), we were hitting a case where
      the partial stripe condition was returning true and so
      should_writeback() was returning true early.
      
      If that hadn't been the case, it would have hit the would_skip test, and
      as would_skip == s->iop.bypass == true, should_writeback() would have
      returned false.
      
      Looking at the git history from 'commit 72c27061 ("bcache: Write out
      full stripes")', it looks like the idea was to optimise for raid5/6:
      
             * If a stripe is already dirty, force writes to that stripe to
      	 writeback mode - to help build up full stripes of dirty data
      
      To fix this issue, make sure that should_writeback() on a discard op
      never returns true.
      
      More details of debugging:
      https://www.spinics.net/lists/linux-bcache/msg06996.html
      
      Previous reports:
       - https://bugzilla.kernel.org/show_bug.cgi?id=201051
       - https://bugzilla.kernel.org/show_bug.cgi?id=196103
       - https://www.spinics.net/lists/linux-bcache/msg06885.html
      
      (Coly Li: minor modification to follow maximum 75 chars per line rule)
      
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 72c27061 ("bcache: Write out full stripes")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fb9a25c
    • Viresh Kumar's avatar
      PM / wakeup: Rework wakeup source timer cancellation · 6f76eeca
      Viresh Kumar authored
      commit 1fad17fb upstream.
      
      If wakeup_source_add() is called right after wakeup_source_remove()
      for the same wakeup source, timer_setup() may be called for a
      potentially scheduled timer which is incorrect.
      
      To avoid that, move the wakeup source timer cancellation from
      wakeup_source_drop() to wakeup_source_remove().
      
      Moreover, make wakeup_source_remove() clear the timer function after
      canceling the timer to let wakeup_source_not_registered() treat
      unregistered wakeup sources in the same way as the ones that have
      never been registered.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      [ rjw: Subject, changelog, merged two patches together ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f76eeca
    • Yihao Wu's avatar
      nfsd: fix wrong check in write_v4_end_grace() · 33c164d5
      Yihao Wu authored
      commit dd838821 upstream.
      
      Commit 62a063b8 "nfsd4: fix crash on writing v4_end_grace before
      nfsd startup" is trying to fix a NULL dereference issue, but it
      mistakenly checks if the nfsd server is started. So fix it.
      
      Fixes: 62a063b8 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: default avatarYihao Wu <wuyihao@linux.alibaba.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33c164d5
    • NeilBrown's avatar
      nfsd: fix memory corruption caused by readdir · e4ea22f9
      NeilBrown authored
      commit b602345d upstream.
      
      If the result of an NFSv3 readdir{,plus} request results in the
      "offset" on one entry having to be split across 2 pages, and is sized
      so that the next directory entry doesn't fit in the requested size,
      then memory corruption can happen.
      
      When encode_entry() is called after encoding the last entry that fits,
      it notices that ->offset and ->offset1 are set, and so stores the
      offset value in the two pages as required.  It clears ->offset1 but
      *does not* clear ->offset.
      
      Normally this omission doesn't matter as encode_entry_baggage() will
      be called, and will set ->offset to a suitable value (not on a page
      boundary).
      But in the case where cd->buflen < elen and nfserr_toosmall is
      returned, ->offset is not reset.
      
      This means that nfsd3proc_readdirplus will see ->offset with a value 4
      bytes before the end of a page, and ->offset1 set to NULL.
      It will try to write 8bytes to ->offset.
      If we are lucky, the next page will be read-only, and the system will
        BUG: unable to handle kernel paging request at...
      
      If we are unlucky, some innocent page will have the first 4 bytes
      corrupted.
      
      nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
      writes 8 bytes to the offset wherever it is.
      
      Fix this by clearing ->offset after it is used, and copying the
      ->offset handling code from nfsd3_proc_readdirplus into
      nfsd3_proc_readdir.
      
      (Note that the commit hash in the Fixes tag is from the 'history'
       tree - this bug predates git).
      
      Fixes: 0b1d57cf ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
      Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
      Cc: stable@vger.kernel.org (v2.6.12+)
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4ea22f9
    • Trond Myklebust's avatar
      NFS: Don't recoalesce on error in nfs_pageio_complete_mirror() · 7ed60826
      Trond Myklebust authored
      commit 8127d827 upstream.
      
      If the I/O completion failed with a fatal error, then we should just
      exit nfs_pageio_complete_mirror() rather than try to recoalesce.
      
      Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ed60826
    • Trond Myklebust's avatar
      NFS: Fix an I/O request leakage in nfs_do_recoalesce · 18ae8146
      Trond Myklebust authored
      commit 4d91969e upstream.
      
      Whether we need to exit early, or just reprocess the list, we
      must not lost track of the request which failed to get recoalesced.
      
      Fixes: 03d5eb65 ("NFS: Fix a memory leak in nfs_do_recoalesce")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18ae8146
    • Trond Myklebust's avatar
      NFS: Fix I/O request leakages · 0da4596d
      Trond Myklebust authored
      commit f57dcf4c upstream.
      
      When we fail to add the request to the I/O queue, we currently leave it
      to the caller to free the failed request. However since some of the
      requests that fail are actually created by nfs_pageio_add_request()
      itself, and are not passed back the caller, this leads to a leakage
      issue, which can again cause page locks to leak.
      
      This commit addresses the leakage by freeing the created requests on
      error, using desc->pg_completion_ops->error_cleanup()
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer")
      Cc: stable@vger.kernel.org # v4.0: c18b96a1: nfs: clean up rest of reqs
      Cc: stable@vger.kernel.org # v4.0: d600ad1f: NFS41: pop some layoutget
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0da4596d
    • NeilBrown's avatar
      dm: fix to_sector() for 32bit · e393365f
      NeilBrown authored
      commit 0bdb50c5 upstream.
      
      A dm-raid array with devices larger than 4GB won't assemble on
      a 32 bit host since _check_data_dev_sectors() was added in 4.16.
      This is because to_sector() treats its argument as an "unsigned long"
      which is 32bits (4GB) on a 32bit host.  Using "unsigned long long"
      is more correct.
      
      Kernels as early as 4.2 can have other problems due to to_sector()
      being used on the size of a device.
      
      Fixes: 0cf45031 ("dm raid: add support for the MD RAID0 personality")
      cc: stable@vger.kernel.org (v4.2+)
      Reported-and-tested-by: default avatarGuillaume Perréal <gperreal@free.fr>
      Signed-off-by: default avatarNeilBrown <neil@brown.name>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e393365f
    • Gustavo A. R. Silva's avatar
      ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify · a3310231
      Gustavo A. R. Silva authored
      commit e2477233 upstream.
      
      Fix boolean expressions by using logical AND operator '&&' instead of
      bitwise operator '&'.
      
      This issue was detected with the help of Coccinelle.
      
      Fixes: 4fa084af ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      [krzk: Fix -Wparentheses warning]
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3310231
    • Michael Ellerman's avatar
      powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning · 380960e5
      Michael Ellerman authored
      commit ca6d5149 upstream.
      
      GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
      the build:
      
        In function ‘user_regset_copyin’,
            inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
        include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
        out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
        <anonymous>’ [-Werror=array-bounds]
        arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
        arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
           } vrsave;
      
      This has been identified as a regression in GCC, see GCC bug 88273.
      
      However we can avoid the warning and also simplify the logic and make
      it more robust.
      
      Currently we pass -1 as end_pos to user_regset_copyout(). This says
      "copy up to the end of the regset".
      
      The definition of the regset is:
      	[REGSET_VMX] = {
      		.core_note_type = NT_PPC_VMX, .n = 34,
      		.size = sizeof(vector128), .align = sizeof(vector128),
      		.active = vr_active, .get = vr_get, .set = vr_set
      	},
      
      The end is calculated as (n * size), ie. 34 * sizeof(vector128).
      
      In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
      we can copy up to sizeof(vector128) into/out-of vrsave.
      
      The on-stack vrsave is defined as:
        union {
      	  elf_vrreg_t reg;
      	  u32 word;
        } vrsave;
      
      And elf_vrreg_t is:
        typedef __vector128 elf_vrreg_t;
      
      So there is no bug, but we rely on all those sizes lining up,
      otherwise we would have a kernel stack exposure/overwrite on our
      hands.
      
      Rather than relying on that we can pass an explict end_pos based on
      the sizeof(vrsave). The result should be exactly the same but it's
      more obviously not over-reading/writing the stack and it avoids the
      compiler warning.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Reported-by: default avatarMathieu Malaterre <malat@debian.org>
      Cc: stable@vger.kernel.org
      Tested-by: default avatarMathieu Malaterre <malat@debian.org>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      380960e5
    • Mark Cave-Ayland's avatar
      powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest · b8f072b0
      Mark Cave-Ayland authored
      commit fe1ef6bc upstream.
      
      Commit 8792468d "powerpc: Add the ability to save FPU without
      giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
      the bitmask used to update the MSR of the previous thread in
      __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
      host kernel.
      
      Leaving FE0/1 enabled means unrelated processes might receive FPEs
      when they're not expecting them and crash. In particular if this
      happens to init the host will then panic.
      
      eg (transcribed):
        qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
        systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Reinstate these bits to the MSR bitmask to enable MacOS guests to run
      under 32-bit KVM-PR once again without issue.
      
      Fixes: 8792468d ("powerpc: Add the ability to save FPU without giving it up")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: default avatarMark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8f072b0
    • Christophe Leroy's avatar
      powerpc/83xx: Also save/restore SPRG4-7 during suspend · 5d8fff63
      Christophe Leroy authored
      commit 36da5ff0 upstream.
      
      The 83xx has 8 SPRG registers and uses at least SPRG4
      for DTLB handling LRU.
      
      Fixes: 2319f123 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d8fff63
    • Jordan Niethe's avatar
      powerpc/powernv: Make opal log only readable by root · f3b4d46f
      Jordan Niethe authored
      commit 7b62f9bd upstream.
      
      Currently the opal log is globally readable. It is kernel policy to
      limit the visibility of physical addresses / kernel pointers to root.
      Given this and the fact the opal log may contain this information it
      would be better to limit the readability to root.
      
      Fixes: bfc36894 ("powerpc/powernv: Add OPAL message log interface")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarJordan Niethe <jniethe5@gmail.com>
      Reviewed-by: default avatarStewart Smith <stewart@linux.ibm.com>
      Reviewed-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3b4d46f
    • Christophe Leroy's avatar
      powerpc/wii: properly disable use of BATs when requested. · abd8c860
      Christophe Leroy authored
      commit 6d183ca8 upstream.
      
      'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC
      deny the use of BATS for mapping memory.
      
      This patch makes sure that the specific wii RAM mapping function
      takes it into account as well.
      
      Fixes: de32400d ("wii: use both mem1 and mem2 as ram")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJonathan Neuschafer <j.neuschaefer@gmx.net>
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abd8c860
    • Christophe Leroy's avatar
      powerpc/32: Clear on-stack exception marker upon exception return · 9b53d043
      Christophe Leroy authored
      commit 9580b71b upstream.
      
      Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
      to avoid confusing stacktrace like the one below.
      
        Call Trace:
        [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895130] memchr+0x24/0x74
        [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
        --- interrupt: c0e9df00 at 0x400f330
            LR = init_stack+0x1f00/0x2000
        [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
      
      With this patch the trace becomes:
      
        Call Trace:
        [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
        [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
        [c0e9dd10] [c0895150] memchr+0x24/0x74
        [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
        [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
        [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
        [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
        [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
        [c0e9df50] [c0c15434] start_kernel+0x310/0x488
        [c0e9dff0] [00003484] 0x3484
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b53d043
    • zhangyi (F)'s avatar
      jbd2: fix compile warning when using JBUFFER_TRACE · 241f3e33
      zhangyi (F) authored
      commit 01215d3e upstream.
      
      The jh pointer may be used uninitialized in the two cases below and the
      compiler complain about it when enabling JBUFFER_TRACE macro, fix them.
      
      In file included from fs/jbd2/transaction.c:19:0:
      fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
      ./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
       #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                            ^
      fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
        struct journal_head *jh;
                             ^
      In file included from fs/jbd2/transaction.c:19:0:
      fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
      ./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                            ^
      fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
        struct journal_head *jh;
                             ^
      Signed-off-by: default avatarzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      241f3e33