1. 17 Oct, 2018 1 commit
    • tracepoint: Fix tracepoint array element size mismatch · 9c0be3f6
      Mathieu Desnoyers authored
      commit 46e0c9be ("kernel: tracepoints: add support for relative
      references") changes the layout of the __tracepoints_ptrs section on
      architectures supporting relative references. However, it does so
      without turning struct tracepoint * const into const int elsewhere in
      the tracepoint code, which has the following side-effect:
      
      Setting mod->num_tracepoints is done by module.c:
      
          mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
                                               sizeof(*mod->tracepoints_ptrs),
                                               &mod->num_tracepoints);
      
      Since sizeof(*mod->tracepoints_ptrs) is the size of a pointer
      (rather than sizeof(int)), num_tracepoints is erroneously set to half the
      value it should have on 64-bit architectures. A module with an odd number
      of tracepoints therefore misses the last tracepoint, because the integer
      division truncates.
      
      So in the module going notifier:
      
              for_each_tracepoint_range(mod->tracepoints_ptrs,
                      mod->tracepoints_ptrs + mod->num_tracepoints,
                      tp_module_going_check_quiescent, NULL);
      
      the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually
      evaluates to something within the bounds of the array, but misses the
      last tracepoint if the number of tracepoints is odd on a 64-bit arch.
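
      The arithmetic above can be sketched in user space. This is only an
      illustration, not kernel code: the helper names are hypothetical, and
      the sizes are hard-coded to 4 bytes for a relative entry and 8 bytes
      for a 64-bit pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole elements of elem_size bytes that fit in a section. */
static size_t section_elem_count(size_t section_bytes, size_t elem_size)
{
	assert(elem_size > 0);
	return section_bytes / elem_size;	/* integer division truncates */
}

/*
 * Number of real entries covered when walking `count` elements with a
 * stride of `stride` bytes over a table whose entries are `entry_size`
 * bytes each.
 */
static size_t entries_visited(size_t count, size_t stride, size_t entry_size)
{
	assert(entry_size > 0);
	return count * stride / entry_size;
}
```

      With three 4-byte entries (12 bytes), dividing by a pointer size of 8
      gives a count of 1, and walking 1 element with an 8-byte stride covers
      only 2 of the 3 entries. Dividing by 4 gives the expected count of 3.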
      
      Fix this by introducing a new typedef: tracepoint_ptr_t, which
      is either "const int" on architectures that have PREL32 relocations,
      or "struct tracepoint * const" on architectures that do not have
      this feature.
      
      Also provide a new tracepoint_ptr_deref() static inline to
      encapsulate dereferencing this type rather than duplicate code and
      ugly ifdefs within the for_each_tracepoint_range() implementation.
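
      A rough user-space sketch of the relative-reference variant follows. It
      is an assumption-laden illustration rather than the kernel's
      implementation: struct tp, rel_ptr_t, rel_ptr_encode() and
      rel_ptr_deref() are hypothetical names, and each entry is assumed to
      store the signed 32-bit distance from its own address to its target.

```c
#include <assert.h>
#include <stdint.h>

struct tp { const char *name; };	/* hypothetical stand-in for struct tracepoint */

typedef const int32_t rel_ptr_t;	/* hypothetical analogue of the "const int" variant */

/* Encode: signed distance from the entry's own address to the target. */
static int32_t rel_ptr_encode(const void *entry, const void *target)
{
	intptr_t off = (intptr_t)target - (intptr_t)entry;

	assert(off >= INT32_MIN && off <= INT32_MAX);	/* must fit in 32 bits */
	return (int32_t)off;
}

/* Decode: add the stored offset back to the entry's own address. */
static struct tp *rel_ptr_deref(rel_ptr_t *p)
{
	return (struct tp *)((intptr_t)p + *p);
}
```

      Hiding the decode step behind one helper means the iteration code only
      ever sees the element type and a dereference call, so its stride always
      matches the 4-byte entries.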
      
      This issue appears in 4.19-rc kernels, and should ideally be fixed
      before the end of the rc cycle.
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com
      Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  2. 05 Oct, 2018 1 commit
    • vsprintf: Fix off-by-one bug in bstr_printf() processing dereferenced pointers · 62165600
      Steven Rostedt (VMware) authored
      The functions vbin_printf() and bstr_printf() are used by trace_printk() to
      try to keep the overhead down during printing. trace_printk() uses
      vbin_printf() at the time of execution, as it only scans the fmt string to
      record the printf values into the buffer, and then uses bstr_printf() to do
      the conversions to print the string based on the format and the saved
      values in the buffer.
      
      This is an issue for dereferenced pointers, as before commit 841a915d,
      the processing of the pointer could happen some time after the pointer value
      was recorded (reading the trace buffer). This means the processing of the
      value at a later time could show different results, or even crash the
      system, if the pointer no longer existed.
      
      Commit 841a915d addressed this by processing dereferenced pointers at
      the time of execution and saving the result in the ring buffer as a string.
      bstr_printf() would then treat these pointers as normal strings and
      print the value. But there was an off-by-one bug here: after
      processing the argument, it moved the pointer only "strlen(arg)" bytes,
      which made the arg pointer point not to the next argument in the ring
      buffer, but to the NUL character of the last argument. This causes any
      values after a dereferenced pointer to be corrupted.
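
      The off-by-one can be illustrated with a simplified packed buffer of
      NUL-terminated strings. This is a sketch only: the helper names are
      hypothetical and the real ring buffer also holds non-string arguments.

```c
#include <assert.h>
#include <string.h>

/* Correct: step over the string and its terminating NUL. */
static const char *skip_string_arg(const char *arg)
{
	assert(arg != NULL);
	return arg + strlen(arg) + 1;
}

/* Buggy variant: stops on the NUL of the argument just processed. */
static const char *skip_string_arg_buggy(const char *arg)
{
	assert(arg != NULL);
	return arg + strlen(arg);
}
```

      With the buggy variant the next read starts at the terminator of the
      previous string, so every later argument is decoded one byte early.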
      
      Cc: stable@vger.kernel.org
      Fixes: 841a915d ("vsprintf: Do not have bprintf dereference pointers")
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  3. 23 Sep, 2018 7 commits
  4. 22 Sep, 2018 1 commit
    • block: use nanosecond resolution for iostat · b57e99b4
      Omar Sandoval authored
      Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
      updating properly on 4.18. This is because we started using ktime to
      track elapsed time, and we convert nanoseconds to jiffies when we update
      the partition counter. However, this gets rounded down, so any I/Os that
      take less than a jiffy are not accounted for. Previously in this case,
      the value of jiffies would sometimes increment while we were doing I/O,
      so at least some I/Os were accounted for.
      
      Let's convert the stats to use nanoseconds internally. We still report
      milliseconds as before, now more accurately than ever. The value is
      still truncated to 32 bits for backwards compatibility.
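
      A small sketch of why per-I/O rounding loses time. The tick rate of 250
      Hz and the helper names are assumptions made for the example; they are
      not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ	250ull		/* assumed tick rate: 1 jiffy = 4 ms */
#define NSEC_PER_SEC	1000000000ull
#define NSEC_PER_MSEC	1000000ull

/* Old scheme: convert each I/O to jiffies before summing (rounds down). */
static uint64_t busy_jiffies_per_io(uint64_t io_ns, uint64_t nr_ios)
{
	uint64_t per_io = io_ns / (NSEC_PER_SEC / SKETCH_HZ);

	return per_io * nr_ios;
}

/* New scheme: sum nanoseconds, convert to milliseconds only when reporting. */
static uint32_t busy_msecs_reported(uint64_t io_ns, uint64_t nr_ios)
{
	assert(io_ns == 0 || nr_ios <= UINT64_MAX / io_ns);	/* no overflow */
	uint64_t total_ns = io_ns * nr_ios;

	return (uint32_t)(total_ns / NSEC_PER_MSEC);	/* 32-bit truncation kept */
}
```

      Under these assumptions, a thousand I/Os of 1 ms each account for zero
      jiffies in the old scheme but for 1000 ms in the new one.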
      
      Fixes: 522a7775 ("block: consolidate struct request timestamp fields")
      Cc: stable@vger.kernel.org
      Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 21 Sep, 2018 5 commits
    • Merge tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 10dc890d
      Greg Kroah-Hartman authored
      Linus writes:
        "Pin control fixes for v4.19:
         - Two fixes for the Intel pin controllers that cause
           problems on laptops."
      
      * tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: intel: Do pin translation in other GPIO operations as well
        pinctrl: cannonlake: Fix gpio base for GPP-E
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a27fb6d9
      Greg Kroah-Hartman authored
      Paolo writes:
        "It's mostly small bugfixes and cleanups, mostly around x86 nested
         virtualization.  One important change, not related to nested
         virtualization, is that the ability for the guest kernel to trap
         CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
         now masked by default.  This is because the feature is detected
         through an MSR; a very bad idea that Intel seems to like more and
         more.  Some applications choke if the other fields of that MSR are
         not initialized as on real hardware, hence we have to disable the
         whole MSR by default, as was the case before Linux 4.12."
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
        KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
        kvm: selftests: Add platform_info_test
        KVM: x86: Control guest reads of MSR_PLATFORM_INFO
        KVM: x86: Turbo bits in MSR_PLATFORM_INFO
        nVMX x86: Check VPID value on vmentry of L2 guests
        nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
        KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
        KVM: VMX: check nested state and CR4.VMXE against SMM
        kvm: x86: make kvm_{load|put}_guest_fpu() static
        x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
        KVM: VMX: use preemption timer to force immediate VMExit
        KVM: VMX: modify preemption timer bit only when arming timer
        KVM: VMX: immediately mark preemption timer expired only for zero value
        KVM: SVM: Switch to bitmap_zalloc()
        KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
        kvm: selftests: use -pthread instead of -lpthread
        KVM: x86: don't reset root in kvm_mmu_setup()
        kvm: mmu: Don't read PDPTEs when paging is not enabled
        x86/kvm/lapic: always disable MMIO interface in x2APIC mode
        KVM: s390: Make huge pages unavailable in ucontrol VMs
        ...
    • Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs · 0eba8697
      Greg Kroah-Hartman authored
      Richard writes:
        "This pull request contains fixes for UBIFS:
         - A wrong UBIFS assertion in mount code
         - Fix for a NULL pointer deref in mount code
         - Revert of a bad fix for xattrs"
      
      * tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs:
        Revert "ubifs: xattr: Don't operate on deleted inodes"
        ubifs: drop false positive assertion
        ubifs: Check for name being NULL while mounting
    • Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block · 211b100a
      Greg Kroah-Hartman authored
      Jens writes:
        "Storage fixes for 4.19-rc5
      
        - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
      
        - NVMe pull request from Christoph, and a single ANA log page fix
          (Hannes)
      
        - Regression fix for libata qd32 support, where we trigger an illegal
          active command transition. This fixes a CD-ROM detection issue that
          was reported, but could also trigger premature completion of the
          internal tag (me)"
      
      * tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
        floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
        libata: mask swap internal and hardware tag
        nvme: count all ANA groups for ANA Log page
    • Merge tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm · a38fd7d8
      Greg Kroah-Hartman authored
      David writes:
        "drm fixes for 4.19-rc5:
      
         - core: fix debugfs for atomic, fix the check for atomic for
           non-modesetting drivers
         - amdgpu: adds a new PCI id, some kfd fixes and a sdma fix
         - i915: a bunch of GVT fixes.
         - vc4: scaling fix
         - vmwgfx: modesetting fixes and an old buffer eviction fix
         - udl: framebuffer destruction fix
         - sun4i: disable on R40 fix until next kernel
         - pl111: NULL termination on table fix"
      
      * tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm: (21 commits)
        drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
        drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
        drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
        drm/vmwgfx: Fix buffer object eviction
        drm/vmwgfx: Don't impose STDU limits on framebuffer size
        drm/vmwgfx: limit mode size for all display unit to texture_max
        drm/vmwgfx: limit screen size to stdu_max during check_modeset
        drm/vmwgfx: don't check for old_crtc_state enable status
        drm/amdgpu: add new polaris pci id
        drm: sun4i: drop second PLL from A64 HDMI PHY
        drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers.
        drm/i915/gvt: clear ggtt entries when destroy vgpu
        drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid
        drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler
        drm/i915/gvt: Init PHY related registers for BXT
        drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation
        drm/fb-helper: Remove set but not used variable 'connector_funcs'
        drm: udl: Destroy framebuffer only if it was initialized
        drm/sun4i: Remove R40 display pipeline compatibles
        drm/pl111: Make sure of_device_id tables are NULL terminated
        ...
  6. 20 Sep, 2018 25 commits
    • Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 4fcb7f8b
      Dave Airlie authored
      A few fixes for 4.19:
      - Add a new polaris pci id
      - KFD fixes for raven and gfx7
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180920155850.5455-1-alexander.deucher@amd.com
    • Merge branch 'vmwgfx-fixes-4.19' of git://people.freedesktop.org/~thomash/linux into drm-fixes · 618cc151
      Dave Airlie authored
      A couple of modesetting fixes and a fix for a long-standing buffer-eviction
      problem cc'd stable.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Thomas Hellstrom <thellstrom@vmware.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180920063935.35492-1-thellstrom@vmware.com
    • x86/mm: Expand static page table for fixmap space · 05ab1d8a
      Feng Tang authored
      We hit a kernel panic when enabling earlycon, because the fixmap
      address of earlycon is not statically set up.
      
      Currently the static fixmap setup in head_64.S only covers 2M virtual
      address space, while it actually could be in 4M space with different
      kernel configurations, e.g. when VSYSCALL emulation is disabled.
      
      So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
      and add a build time check to ensure that the fixmap is covered by the
      initial static page tables.
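
      The sizing argument can be sketched as follows. The SK_ constants and
      the slot count of 600 are assumptions chosen for the example (4 KiB
      pages, 512 entries per page-table page); only the value 2 for
      FIXMAP_PMD_NUM comes from the text above.

```c
#include <assert.h>

#define SK_PAGE_SIZE		4096ul	/* assumed 4 KiB pages */
#define SK_PTRS_PER_PTE		512ul	/* assumed entries per page-table page */
#define SK_FIXMAP_PMD_NUM	2ul	/* value named in the commit message */
#define SK_FIXMAP_PAGES		600ul	/* hypothetical fixmap slot count (> 512) */

/* Build-time check: every fixmap slot must fit in the static page tables. */
_Static_assert(SK_FIXMAP_PAGES <= SK_FIXMAP_PMD_NUM * SK_PTRS_PER_PTE,
	       "fixmap must be covered by the initial static page tables");

/* Virtual address space covered by pmd_num statically allocated PTE pages. */
static unsigned long static_fixmap_bytes(unsigned long pmd_num)
{
	assert(pmd_num > 0);
	return pmd_num * SK_PTRS_PER_PTE * SK_PAGE_SIZE;
}
```

      One page-table page covers 2M under these assumptions, so a hypothetical
      600-slot fixmap would fail the check with a single page and pass with
      two.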
      
      Fixes: 1ad83c85 ("x86_64,vsyscall: Make vsyscall emulation configurable")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: kernel test robot <rong.a.chen@intel.com>
      Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts)
      Cc: H Peter Anvin <hpa@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
    • ocfs2: fix ocfs2 read block panic · 234b69e3
      Junxiao Bi authored
      While reading a block, an I/O error can be returned because of an
      underlying storage issue. In this case, BH_NeedsValidate is left set in
      the buffer head. When the same block is read the next time, if it has
      already been linked into the journal, that triggers the following panic.
      
      [203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
      [203748.702533] invalid opcode: 0000 [#1] SMP
      [203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
      [203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
      [203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
      [203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
      [203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
      [203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
      [203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
      [203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
      [203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
      [203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
      [203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
      [203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
      [203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
      [203748.707124] Stack:
      [203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
      [203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
      [203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
      [203748.708915] Call Trace:
      [203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
      [203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
      [203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
      [203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
      [203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
      [203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
      [203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
      [203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
      [203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
      [203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
      [203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
      [203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
      [203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
      [203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
      [203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
      [203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
      [203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
      [203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
      [203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
      [203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
      [203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
      [203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
      [203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
      [203748.717775]  RSP <ffff88006ff4b818>
      
      Joseph once reported a similar panic.
      Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html
      
      Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: slowly shrink slabs with a relatively small number of objects · 172b06c3
      Roman Gushchin authored
      9092c71b ("mm: use sc->priority for slab shrink targets") changed the
      way that the target slab pressure is calculated and made it
      priority-based:
      
          delta = freeable >> priority;
          delta *= 4;
          do_div(delta, shrinker->seeks);
      
      The problem is that at the default priority (which is 12) no pressure is
      applied at all if the number of potentially reclaimable objects is less
      than 4096 (1<<12).
      
      This causes the last objects on slab caches of no longer used cgroups to
      (almost) never get reclaimed.  It's obviously a waste of memory.
      
      It can be especially painful, if these stale objects are holding a
      reference to a dying cgroup.  Slab LRU lists are reparented on memcg
      offlining, but corresponding objects are still holding a reference to the
      dying cgroup.  If we don't scan these objects, the dying cgroup can't go
      away.  Most likely, the parent cgroup hasn't any directly charged objects,
      only remaining objects from dying children cgroups.  So it can easily hold
      a reference to hundreds of dying cgroups.
      
      If there are no big spikes in memory pressure, and new memory cgroups are
      created and destroyed periodically, this causes the number of dying
      cgroups to grow steadily, causing a slow-ish and hard-to-detect memory
      "leak".  It's not a real leak, as the memory can eventually be reclaimed,
      but in real life that might never happen.  I've seen hosts with a
      steadily climbing number of dying cgroups, which doesn't show any sign of
      a decline in months, even though the host is loaded with a production
      workload.
      
      It is an obvious waste of memory, and to prevent it, let's apply a minimal
      pressure even on small shrinker lists.  E.g.  if there are freeable
      objects, let's scan at least min(freeable, scan_batch) objects.
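
      The effect of the minimal pressure can be sketched in user space. The
      function names are hypothetical, and the example values used in the note
      below (seeks of 2, scan batch of 128) are assumptions for illustration;
      "at least" is modelled as a max() over the priority-based target.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Priority-based target as quoted above: (freeable >> priority) * 4 / seeks. */
static uint64_t scan_target_old(uint64_t freeable, unsigned int priority,
				unsigned int seeks)
{
	assert(seeks > 0 && priority < 64);
	uint64_t delta = freeable >> priority;

	delta *= 4;
	return delta / seeks;
}

/* Same target, but never below min(freeable, scan_batch). */
static uint64_t scan_target_min_pressure(uint64_t freeable, unsigned int priority,
					 unsigned int seeks, uint64_t scan_batch)
{
	uint64_t delta = scan_target_old(freeable, priority, seeks);

	return max_u64(delta, min_u64(freeable, scan_batch));
}
```

      For 1000 freeable objects at priority 12, the old target is 0 while the
      new one is 128. Large lists are unaffected, because their
      priority-based target already exceeds the batch size.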
      
      This fix significantly improves the chance of a dying cgroup being
      reclaimed, and together with some previous patches stops the steady growth
      of the number of dying cgroups on some of our hosts.
      
      Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
      Fixes: 9092c71b ("mm: use sc->priority for slab shrink targets")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • YueHaibing's avatar
    • mm: shmem.c: Correctly annotate new inodes for lockdep · b45d71fb
      Joel Fernandes (Google) authored
      Directories and inodes don't necessarily need to be in the same lockdep
      class.  For example, hugetlbfs splits them out too to prevent false
      positives in lockdep.  Annotate correctly after new inode creation.  If
      it's a directory inode, it will be put into a different class.
      
      This should fix a lockdep splat reported by syzbot:
      
      > ======================================================
      > WARNING: possible circular locking dependency detected
      > 4.18.0-rc8-next-20180810+ #36 Not tainted
      > ------------------------------------------------------
      > syz-executor900/4483 is trying to acquire lock:
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
      > include/linux/fs.h:765 [inline]
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
      > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >
      > but task is already holding lock:
      > 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
      > drivers/staging/android/ashmem.c:448
      >
      > which lock already depends on the new lock.
      >
      > -> #2 (ashmem_mutex){+.+.}:
      >        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
      >        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
      >        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
      >        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
      >        call_mmap include/linux/fs.h:1844 [inline]
      >        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
      >        do_mmap+0xa10/0x1220 mm/mmap.c:1535
      >        do_mmap_pgoff include/linux/mm.h:2298 [inline]
      >        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
      >        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
      >        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
      >        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
      >        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #1 (&mm->mmap_sem){++++}:
      >        __might_fault+0x155/0x1e0 mm/memory.c:4568
      >        _copy_to_user+0x30/0x110 lib/usercopy.c:25
      >        copy_to_user include/linux/uaccess.h:155 [inline]
      >        filldir+0x1ea/0x3a0 fs/readdir.c:196
      >        dir_emit_dot include/linux/fs.h:3464 [inline]
      >        dir_emit_dots include/linux/fs.h:3475 [inline]
      >        dcache_readdir+0x13a/0x620 fs/libfs.c:193
      >        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
      >        __do_sys_getdents fs/readdir.c:231 [inline]
      >        __se_sys_getdents fs/readdir.c:212 [inline]
      >        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #0 (&sb->s_type->i_mutex_key#9){++++}:
      >        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
      >        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
      >        inode_lock include/linux/fs.h:765 [inline]
      >        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
      >        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
      >        vfs_ioctl fs/ioctl.c:46 [inline]
      >        file_ioctl fs/ioctl.c:501 [inline]
      >        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
      >        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
      >        __do_sys_ioctl fs/ioctl.c:709 [inline]
      >        __se_sys_ioctl fs/ioctl.c:707 [inline]
      >        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
      >
      >  Possible unsafe locking scenario:
      >
      >        CPU0                    CPU1
      >        ----                    ----
      >   lock(ashmem_mutex);
      >                                lock(&mm->mmap_sem);
      >                                lock(ashmem_mutex);
      >   lock(&sb->s_type->i_mutex_key#9);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by syz-executor900/4483:
      >  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
      > ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448
      
      Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Suggested-by: NeilBrown <neilb@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/proc/kcore.c: fix invalid memory access in multi-page read optimization · a1b3d2f2
      Dominique Martinet authored
      The 'm' kcore_list item could point to kclist_head, and it is incorrect to
      look at m->addr / m->size in this case.
      
      There is no choice but to run through the list of entries for every
      address if we did not find any entry in the previous iteration.
      
      Reset 'm' to NULL in that case at Omar Sandoval's suggestion.
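
      The hazard can be sketched with a minimal circular list. This is an
      illustration under assumptions, not the kcore code: struct node, struct
      range and find_range() are hypothetical, and the list head is modelled
      as a bare link that carries no addr/size fields.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };	/* minimal circular list link */

struct range {				/* hypothetical stand-in for a list entry */
	struct node link;		/* must stay the first member */
	unsigned long addr, size;
};

/*
 * Return the entry covering addr, or NULL when the walk gets back to the
 * head. The head is never reinterpreted as an entry, because it has no
 * valid addr/size to look at.
 */
static struct range *find_range(struct node *head, unsigned long addr)
{
	assert(head != NULL);
	for (struct node *n = head->next; n != head; n = n->next) {
		struct range *r = (struct range *)n;	/* link is the first member */

		if (addr >= r->addr && addr - r->addr < r->size)
			return r;
	}
	return NULL;	/* no match: report "no entry" instead of the head */
}
```

      A caller that caches the result between iterations can then test for
      NULL before touching addr or size, and fall back to a full walk of the
      list.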
      
      [akpm@linux-foundation.org: add comment]
      Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
      Fixes: bf991c22 ("proc/kcore: optimize multiple page reads")
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@osandov.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: disable deferred struct page for 32-bit arches · 889c695d
      Pasha Tatashin authored
      Deferred struct page init is needed only on systems with a large amount of
      physical memory to improve boot performance.  32-bit systems do not
      benefit from this feature.
      
      Jiri reported a problem where deferred struct pages do not work well with
      x86-32:
      
      [    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
      [    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
      [    0.036269] Initializing CPU#0
      [    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
      [    0.038459] page:f6780000 is uninitialized and poisoned
      [    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
      [    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
      [    0.040038] ------------[ cut here ]------------
      [    0.040399] kernel BUG at include/linux/page-flags.h:293!
      [    0.040823] invalid opcode: 0000 [#1] SMP PTI
      [    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
      [    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [    0.042496] EIP: free_highmem_page+0x64/0x80
      [    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
      [    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
      [    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
      [    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
      [    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
      [    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.046913] DR6: fffe0ff0 DR7: 00000400
      [    0.047220] Call Trace:
      [    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
      [    0.047854]  set_highmem_pages_init+0x5b/0x71
      [    0.048202]  mem_init+0x2b/0x1e8
      [    0.048460]  start_kernel+0x1d2/0x425
      [    0.048757]  i386_start_kernel+0x93/0x97
      [    0.049073]  startup_32_smp+0x164/0x168
      [    0.049379] Modules linked in:
      [    0.049626] ---[ end trace 337949378db0abbb ]---
      
      We free highmem pages before their struct pages are initialized:
      
      mem_init()
       set_highmem_pages_init()
        add_highpages_with_active_regions()
         free_highmem_page()
          .. Access uninitialized struct page here..
      
      Because there is no reason to have this feature on 32-bit systems, just
      disable it.
      
      Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
      Fixes: 2e3ca40f ("mm: relax deferred struct page requirements")
      Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reported-by: Jiri Slaby <jslaby@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      889c695d
    • KJ Tsanaktsidis's avatar
      fork: report pid exhaustion correctly · f83606f5
      KJ Tsanaktsidis authored
      Make the clone and fork syscalls return EAGAIN when the limit on the
      number of pids, /proc/sys/kernel/pid_max, is exceeded.
      
      Currently, when the pid_max limit is exceeded, the kernel will return
      ENOSPC from the fork and clone syscalls.  This is contrary to the
      documented behaviour, which explicitly calls out the pid_max case as one
      where EAGAIN should be returned.  It also leads to really confusing error
      messages in userspace programs which will complain about a lack of disk
      space when they fail to create processes/threads for this reason.
      
      This error is being returned because alloc_pid() uses the idr api to find
      a new pid; when there are none available, idr_alloc_cyclic() returns
      -ENOSPC, and this is being propagated back to userspace.
      
      This behaviour has been broken before, and was explicitly fixed in
      commit 35f71bc0 ("fork: report pid reservation failure properly"),
      so I think -EAGAIN is definitely the right thing to return in this case.
      The current behaviour change dates from commit 95846ecf ("pid:
      replace pid bitmap implementation with IDR API") and was, I believe,
      unintentional.
      
      This patch has no impact on the case where allocating a pid fails because
      the child reaper for the namespace is dead; that case will still return
      -ENOMEM.
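
      The error translation described above can be sketched as follows.
      This is an illustration only: pid_alloc_result() is a hypothetical
      helper, and `nr` stands for whatever the ID allocator returned (a
      non-negative id or a negative errno).

```c
#include <errno.h>

/* Hypothetical helper: map an ID-allocator result onto the value the
 * pid allocation path should report. Non-negative values are ids. */
static int pid_alloc_result(int nr)
{
    /* The pid range is exhausted: report the documented, retryable
     * error instead of leaking the allocator's "no space" code. */
    if (nr == -ENOSPC)
        return -EAGAIN;

    /* Every other outcome (an id, or e.g. -ENOMEM) passes through. */
    return nr;
}
```

      Only -ENOSPC is rewritten, so the dead-child-reaper case mentioned
      above keeps returning -ENOMEM.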
      
      Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
      Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f83606f5
    • Thomas Gleixner's avatar
      MAINTAINERS: Add X86 MM entry · 9068a427
      Thomas Gleixner authored
      Dave, Andy and Peter are de facto overseeing the mm parts of X86. Add an
      explicit maintainers entry.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      9068a427
    • Fenghua Yu's avatar
      x86/intel_rdt: Add Reinette as co-maintainer for RDT · a8b3bb33
      Fenghua Yu authored
      Reinette Chatre is doing a great job on enabling pseudo-locking and other
      features in RDT. Add her as co-maintainer for RDT.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Reinette Chatre <reinette.chatre@intel.com>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua.yu@intel.com
      a8b3bb33
    • Richard Weinberger's avatar
      Revert "ubifs: xattr: Don't operate on deleted inodes" · f061c1cc
      Richard Weinberger authored
      This reverts commit 11a6fc3d.
      UBIFS wants to assert that xattr operations are only issued on files
      with positive link count. The said patch made these operations return
      -ENOENT for unlinked files such that the asserts will no longer trigger.
      This was wrong since xattr operations are perfectly fine on unlinked
      files.
      Instead the assertions need to be fixed/removed.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 11a6fc3d ("ubifs: xattr: Don't operate on deleted inodes")
      Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
      Tested-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      f061c1cc
    • Sascha Hauer's avatar
      ubifs: drop false positive assertion · d3bdc016
      Sascha Hauer authored
      The following sequence triggers
      
      	ubifs_assert(c, c->lst.taken_empty_lebs > 0);
      
      at the end of ubifs_remount_fs():
      
      mount -t ubifs /dev/ubi0_0 /mnt
      echo 1 > /sys/kernel/debug/ubifs/ubi0_0/ro_error
      umount /mnt
      mount -t ubifs -o ro /dev/ubix_y /mnt
      mount -o remount,ro /mnt
      
      The resulting
      
      UBIFS assert failed in ubifs_remount_fs at 1878 (pid 161)
      
      is a false positive. In the case above c->lst.taken_empty_lebs has
      never been changed from its initial zero value. This will only happen
      when the deferred recovery is done.
      
      Fix this by doing the assertion only when recovery has been done
      already.
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      d3bdc016
    • Richard Weinberger's avatar
      ubifs: Check for name being NULL while mounting · 37f31b6c
      Richard Weinberger authored
      The requested device name can be NULL or an empty string.
      Check for that and refuse to continue. UBIFS has to do this manually
      since we cannot use mount_bdev(), which checks for this condition.
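
      A minimal sketch of such a check, assuming -EINVAL is the error
      reported; check_dev_name() is a hypothetical helper, not a UBIFS
      function.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: reject a missing or empty device name before any
 * attempt is made to parse it. */
static int check_dev_name(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return -EINVAL;
    return 0;
}
```

      Testing the pointer first keeps the empty-string test from
      dereferencing NULL.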
      
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
      Signed-off-by: Richard Weinberger <richard@nod.at>
      37f31b6c
    • Liran Alon's avatar
      KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs · 26b471c7
      Liran Alon authored
      The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set
      their return value in "r" local var and break out of switch block
      when they encounter some error.
      This is because vcpu_load() is called before the switch block, which
      has a proper cleanup of vcpu_put() afterwards.

      However, the KVM_{GET,SET}_NESTED_STATE IOCTL handlers just return
      immediately on error without performing the above-mentioned cleanup.
      
      Thus, change these handlers to behave as expected.
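
      The expected pattern can be sketched as below. This is a stand-alone
      illustration, not KVM code: the load/put stubs, the command values
      and vcpu_ioctl_sketch() are all hypothetical, and a counter stands in
      for the state that vcpu_load()/vcpu_put() must keep balanced.

```c
#include <errno.h>

/* Counter that must be back at zero after every call. */
static int loaded;

static void vcpu_load_stub(void) { loaded++; }
static void vcpu_put_stub(void)  { loaded--; }

enum { CMD_OK = 1, CMD_FAIL = 2 };

static long vcpu_ioctl_sketch(int cmd)
{
    long r;

    vcpu_load_stub();
    switch (cmd) {
    case CMD_OK:
        r = 0;
        break;
    case CMD_FAIL:
        /* Record the error and break; a bare `return -EFAULT;` here
         * would skip vcpu_put_stub() below. */
        r = -EFAULT;
        break;
    default:
        r = -EINVAL;
        break;
    }
    vcpu_put_stub();   /* common cleanup, reached on every path */
    return r;
}
```

      Because every case leaves the switch through `break`, the cleanup
      call runs for success and failure alike.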
      
      Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      26b471c7
    • Yong Zhao's avatar
      drm/amdkfd: Fix ATS capability was not reported correctly on some APUs · 44d8cc6f
      Yong Zhao authored
      Because CRAT_CU_FLAGS_IOMMU_PRESENT was not set in some BIOS crat, we
      need to work around this.
      
      For future compatibility, we also overwrite the bit in capability according
      to the value of needs_iommu_device.
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      44d8cc6f
    • Yong Zhao's avatar
      drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9 · 15426dbb
      Yong Zhao authored
      CWSR fails on Raven if the control stack is MTYPE_UC, which is used
      for regular GART mappings. As a workaround we map it using MTYPE_NC.
      
      The MEC firmware expects the control stack at one page offset from the
      start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU
      added a memory allocation flag just for this purpose.
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Yong Zhao <yong.zhao@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      15426dbb
    • Amber Lin's avatar
      drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 · caaa4c8a
      Amber Lin authored
      The wrong register bit was examined when checking SDMA status, so it reports
      false failures. This typo only appears on gfx_v7. gfx_v8 checks the correct
      bit.
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Amber Lin <Amber.Lin@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      caaa4c8a
    • Mika Westerberg's avatar
      pinctrl: intel: Do pin translation in other GPIO operations as well · 96147db1
      Mika Westerberg authored
      For some reason I thought GPIOLIB handles translation from GPIO ranges
      to pinctrl pins but it turns out not to be the case. This means that
      when GPIO operations are performed for a pin controller having a custom
      GPIO base, such as Cannon Lake and Ice Lake, an incorrect pin number gets
      used internally.
      
      Fix this in the same way we did for lock/unlock IRQ operations and
      translate the GPIO number to pin before using it.
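
      A rough sketch of what such a translation can look like, assuming
      each pad group records its first pin, its first GPIO number and its
      size. The struct layout and gpio_to_pin() are hypothetical and only
      illustrate the lookup, not the driver's data structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical pad group: `size` pins starting at pin `base` are exposed
 * as GPIO numbers starting at `gpio_base`. */
struct padgroup {
    unsigned int base;
    unsigned int gpio_base;
    unsigned int size;
};

/* Translate a GPIO offset to a pin number, or -EINVAL if the offset
 * falls into a gap between groups. */
static int gpio_to_pin(const struct padgroup *groups, size_t ngroups,
                       unsigned int offset)
{
    for (size_t i = 0; i < ngroups; i++) {
        const struct padgroup *g = &groups[i];

        if (offset >= g->gpio_base && offset < g->gpio_base + g->size)
            return (int)(g->base + (offset - g->gpio_base));
    }
    return -EINVAL;
}
```

      With a custom GPIO base the GPIO number space has holes, so using
      the GPIO offset directly as a pin number picks the wrong pad for
      every group after the first gap.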
      
      Fixes: a60eac32 ("pinctrl: intel: Allow custom GPIO base for pad groups")
      Reported-by: Rajat Jain <rajatja@google.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Tested-by: Rajat Jain <rajatja@google.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      96147db1
    • Jens Axboe's avatar
      Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus · d611aaf3
      Jens Axboe authored
      Pull NVMe fix from Christoph.
      
      * 'nvme-4.19' of git://git.infradead.org/nvme:
        nvme: count all ANA groups for ANA Log page
      d611aaf3
    • Andy Whitcroft's avatar
      floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl · 65eea8ed
      Andy Whitcroft authored
      The final field of a floppy_struct is the field "name", which is a pointer
      to a string in kernel memory.  The kernel pointer should not be copied to
      user memory.  The FDGETPRM ioctl copies a floppy_struct to user memory,
      including this "name" field.  This pointer cannot be used by the user
      and it will leak a kernel address to user-space, which will reveal the
      location of kernel code and data and undermine KASLR protection.
      
      Model this code after the compat ioctl which copies the returned data
      to a previously cleared temporary structure on the stack (excluding the
      name pointer) and copy out to userspace from there.  As we already have
      an inparam union with an appropriate member, and that memory is already
      cleared even for read-only calls, make use of that as a temporary store.
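
      The copy-through-a-cleared-temporary approach can be sketched as
      below. struct geom is a hypothetical, shortened stand-in for
      floppy_struct, and export_geom() is an illustrative helper; the only
      property relied on is that the pointer is the final field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for floppy_struct: plain data followed by a
 * trailing pointer that must not leave the kernel. */
struct geom {
    unsigned int size, sect, head, track;
    const char *name;   /* kernel pointer, must not be exported */
};

/* Fill `out` with everything from `in` except the trailing pointer. */
static void export_geom(struct geom *out, const struct geom *in)
{
    /* Clear first so the pointer slot and any padding read as zero. */
    memset(out, 0, sizeof(*out));

    /* Copy only the bytes that precede the `name` field. */
    memcpy(out, in, offsetof(struct geom, name));
}
```

      The structure handed to userspace keeps its size and layout, but
      the name slot holds zero bytes instead of a kernel address.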
      
      Based on an initial patch by Brian Belleville.
      
      CVE-2018-7755
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      
      Broke up long line.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      65eea8ed
    • Jens Axboe's avatar
      libata: mask swap internal and hardware tag · 7ce5c8cd
      Jens Axboe authored
      When we're comparing the hardware completion mask passed in from the
      driver with the internal tag pending mask, we need to account for the
      fact that the internal tag is different from the hardware tag. If not,
      then we can end up either prematurely completing the internal tag (since
      it's not set in the hw mask), or simply flag an error:
      
      ata2: illegal qc_active transition (100000000->00000001)
      
      If the internal tag is set, then swap that with the hardware tag in this
      case before comparing with what the hardware reports.
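
      The swap can be sketched with plain bit operations. The positions
      are assumptions read off the error message above: the internal tag
      is taken to be bit 32 and the hardware is taken to report it as
      tag 0. normalize_hw_mask() is a hypothetical name.

```c
#include <stdint.h>

#define TAG_INTERNAL 32   /* assumed bit position of the internal tag */

/* If the internal command is pending, move the hardware's bit 0 up to
 * the internal tag position so both masks use the same numbering. */
static uint64_t normalize_hw_mask(uint64_t pending, uint64_t hw_mask)
{
    if (pending & (1ULL << TAG_INTERNAL)) {
        hw_mask |= (hw_mask & 0x01) << TAG_INTERNAL;  /* copy bit 0 up */
        hw_mask &= ~(uint64_t)0x01;                   /* then clear it */
    }
    return hw_mask;
}
```

      For the transition in the error message, a pending mask of
      0x100000000 and a hardware mask of 0x1 normalize to matching
      values, so no bogus transition is flagged.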
      
      Fixes: 28361c40 ("libata: add extra internal command")
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151
      Cc: stable@vger.kernel.org
      Reported-by: Paul Sbarra <sbarra.paul@gmail.com>
      Tested-by: Paul Sbarra <sbarra.paul@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7ce5c8cd
    • Miguel Ojeda's avatar
      Compiler Attributes: naked can be shared · ae596de1
      Miguel Ojeda authored
      The naked attribute is supported by at least gcc >= 4.6 (for ARM,
      which is the only current user), gcc >= 8 (for x86), clang >= 3.1
      and icc >= 13. See https://godbolt.org/z/350Dyc
      
      Therefore, move it out of compiler-gcc.h so that the definition
      is shared by all compilers.
      
      This also fixes Clang support for ARM32, which was broken by commit 815f0ddb
      ("include/linux/compiler*.h: make compiler-*.h mutually exclusive").
      
      Fixes: 815f0ddb ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Eli Friedman <efriedma@codeaurora.org>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-sparse@vger.kernel.org
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae596de1
    • Miguel Ojeda's avatar
      Compiler Attributes: naked was fixed in gcc 4.6 · d124b44f
      Miguel Ojeda authored
      Commit 9c695203 ("compiler-gcc.h: gcc-4.5 needs noclone
      and noinline on __naked functions") added noinline and noclone
      as a workaround for a gcc 4.5 bug, which was resolved in 4.6.0.
      
      Since the minimum supported gcc version is now 4.6,
      we can clean it up.
      
      See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
      and https://godbolt.org/z/h6NMIL
      
      Fixes: 815f0ddb ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Eli Friedman <efriedma@codeaurora.org>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-sparse@vger.kernel.org
      Tested-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d124b44f