1. 20 Feb, 2016 1 commit
    • kernel/resource.c: fix muxed resource handling in __request_region() · 59ceeaaf
      Simon Guinot authored
      In __request_region, if a conflict with a BUSY and MUXED resource is
      detected, then the caller goes to sleep and waits for the resource to be
      released.  A pointer to the conflicting resource is kept.  At wake-up
      this pointer is used as a parent to retry the region request.
      
      A first problem is that this pointer might well be invalid (if for
      example the conflicting resource has already been freed).  Another
      problem is that the next call to __request_region() fails to detect a
      remaining conflict.  The previously conflicting resource is passed as a
      parameter and __request_region() will look for a conflict among the
      children of this resource and not at the resource itself.  It is likely
      to succeed anyway, even if there is still a conflict.
      
      Instead, the parent of the conflicting resource should be passed to
      __request_region().
      
      As a fix, this patch doesn't update the parent resource pointer in the
      case we have to wait for a muxed region right after.
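      
      The missed conflict can be illustrated with a minimal user-space
      sketch.  The struct res type and the find_conflict() helper below are
      hypothetical stand-ins, not the kernel's struct resource or
      __request_region(); they only model the point made above, that the
      conflict search walks the children of the resource it is given.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res {
	unsigned long start, end;	/* inclusive range */
	struct res *parent, *sibling, *child;
};

/*
 * Return the child of 'parent' that overlaps [start, end], or NULL.
 * Only the children are examined, never 'parent' itself.
 */
struct res *find_conflict(struct res *parent,
			  unsigned long start, unsigned long end)
{
	struct res *r;

	for (r = parent->child; r; r = r->sibling)
		if (r->start <= end && r->end >= start)
			return r;
	return NULL;
}
```

      With a root resource whose only child is a busy region, searching
      from the root finds the busy child, while searching from the busy
      region itself finds nothing because it has no children.  That is why a
      retry using the conflicting resource as parent can appear to succeed.
      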
      Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 19 Feb, 2016 23 commits
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 020ecbba
      Linus Torvalds authored
      Pull ext4 bugfixes from Ted Ts'o:
       "Miscellaneous ext4 bug fixes for v4.5"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix crashes in dioread_nolock mode
        ext4: fix bh->b_state corruption
        ext4: fix memleak in ext4_readdir()
        ext4: remove unused parameter "newblock" in convert_initialized_extent()
        ext4: don't read blocks from disk after extents being swapped
        ext4: fix potential integer overflow
        ext4: add a line break for proc mb_groups display
        ext4: ioctl: fix erroneous return value
        ext4: fix scheduling in atomic on group checksum failure
        ext4 crypto: move context consistency check to ext4_file_open()
        ext4 crypto: revalidate dentry after adding or removing the key
    • Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · ce6b7143
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "My for-linus-4.5 branch has a btrfs DIO error passing fix.
      
        I know how much you love DIO, so I'm going to suggest against reading
        it.  We'll follow up with a patch to drop the error arg from
        dio_end_io in the next merge window."
      
      * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix direct IO requests not reporting IO error to user space
    • Merge branch 'akpm' (patches from Andrew) · 87d9ac71
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: slab: free kmem_cache_node after destroy sysfs file
        ipc/shm: handle removed segments gracefully in shm_mmap()
        MAINTAINERS: update Kselftest Framework mailing list
        devm_memremap_release(): fix memremap'd addr handling
        mm/hugetlb.c: fix incorrect proc nr_hugepages value
        mm, x86: fix pte_page() crash in gup_pte_range()
        fsnotify: turn fsnotify reaper thread into a workqueue job
        Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
        mm: fix regression in remap_file_pages() emulation
        thp, dax: do not try to withdraw pgtable from non-anon VMA
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 23300f65
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here are some more arm64 fixes for 4.5.  This has mostly come from
        Yang Shi, who saw some issues under -rt that also affect mainline.
        The rest of it is pretty small, but still worth having.
      
        We've got an old issue outstanding with valid_user_regs which will
        likely wait until 4.6 (since it would really benefit from some time in
        -next) and another issue with kasan and idle which should be fixed
        next week.
      
        Apart from that, pretty quiet here (and still no sign of the THP issue
        reported on s390...)
      
        Summary:
      
         - Allow EFI stub to use strnlen(), which is required by recent libfdt
      
         - Avoid smp_processor_id() in preempt context during unwinding
      
         - Avoid false Kasan warnings during unwinding
      
         - Ensure early devices are picked up by the IOMMU DMA ops
      
         - Avoid rebuilding the kernel for the 'install' target
      
         - Run fixup handlers for alignment faults on userspace access"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: allow the kernel to handle alignment faults on user accesses
        arm64: kbuild: make "make install" not depend on vmlinux
        arm64: dma-mapping: fix handling of devices registered before arch_initcall
        arm64/efi: Make strnlen() available to the EFI namespace
        arm/arm64: crypto: assure that ECB modes don't require an IV
        arm64: make irq_stack_ptr more robust
        arm64: debug: re-enable irqs before sending breakpoint SIGTRAP
        arm64: disable kasan when accessing frame->fp in unwind_frame
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · ff5f1682
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Several bug fixes:
      
         - There are four different stack tracers, and three of them have
           bugs.  For 4.5 the bugs are fixed and we prepare a cleanup patch
           for the next merge window.
      
         - Three bug fixes for the dasd driver in regard to parallel access
           volumes and the new max_dev_sectors block device queue limit
      
         - The irq restore optimization needs a fixup for memcpy_real
      
         - The diagnose trace code has a conflict with lockdep"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/dasd: fix performance drop
        s390/maccess: reduce stnsm instructions
        s390/diag: avoid lockdep recursion
        s390/dasd: fix refcount for PAV reassignment
        s390/dasd: prevent incorrect length error under z/VM after PAV changes
        s390: fix DAT off memory access, e.g. on kdump
        s390/oprofile: fix address range for asynchronous stack
        s390/perf_event: fix address range for asynchronous stack
        s390/stacktrace: add save_stack_trace_regs()
        s390/stacktrace: save full stack traces
        s390/stacktrace: add missing end marker
        s390/stacktrace: fix address ranges for asynchronous and panic stack
        s390/stacktrace: fix save_stack_trace_tsk() for current task
    • Merge tag 'pinctrl-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 409ee136
      Linus Torvalds authored
      Pull Pin control fixes from Linus Walleij:
       "Pin control fixes for the v4.5 series, all are individual driver
        fixes:
      
         - Fix the PXA2xx driver to export its init function so we do not
           break modular compiles.
         - Hide unused functions in the Nomadik driver.
         - Fix up direction control in the Mediatek driver.
         - Toggle the sunxi GPIO lines to input when you read them on the H3
           GPIO controller, lest you only get garbage.
         - Fix up the number of settings in the MVEBU driver.
         - Fix a serious SMP race condition in the Samsung driver"
      
      * tag 'pinctrl-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: samsung: fix SMP race condition
        pinctrl: mvebu: fix num_settings in mpp group assignment
        pinctrl: sunxi: H3 requires irq_read_needs_mux
        pinctrl: mediatek: fix direction control issue
        pinctrl: nomadik: hide unused functions
        pinctrl: pxa: export pxa2xx_pinctrl_init()
    • Merge tag 'sound-4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 9001b8e4
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This update contains again a few more fixes for ALSA core stuff
        although it's no longer high flux: two race fixes in sequencer and one
        PCM race fix for non-atomic PCM ops.
      
        In addition, HD-audio gained a similar fix for race at reloading the
        driver"
      
      * tag 'sound-4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
        ALSA: seq: Fix double port list deletion
        ALSA: hda - Cancel probe work instead of flush at remove
        ALSA: seq: Fix leak of pool buffer at concurrent writes
    • arm64: mm: allow the kernel to handle alignment faults on user accesses · 52d7523d
      EunTaik Lee authored
      Although we don't expect to take alignment faults on access to normal
      memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
      system calls, leading to things like get_user accessing device memory.
      
      Rather than OOPS the kernel, allow any exception fixups to run and
      return something like -EFAULT back to userspace. This makes the
      behaviour more consistent with userspace, even though applications with
      access to device mappings can easily cause other issues if they try
      hard enough.
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com>
      [will: dropped __kprobes annotation and rewrote commit message]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: kbuild: make "make install" not depend on vmlinux · 8684fa3e
      Masahiro Yamada authored
      For the same reason as commit 19514fc6 ("arm, kbuild: make "make
      install" not depend on vmlinux"), the install targets should never
      trigger the rebuild of the kernel.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara authored
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara authored
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine since bh is just a
      temporary storage for mapping information on stack but in some cases it
      can be fully living bh attached to a page. In such case non-atomic
      update of bh->b_state can race with an atomic update which then gets
      lost. Usually when we are mapping bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh and so things work out fine
      but there is one case to especially worry about: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race e.g. with delalloc trying to map that buffer. The result is
      that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
      corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
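      
      As an illustration of what an atomic update of a subset of state bits
      can look like, here is a minimal user-space sketch using C11 atomics.
      The MAP_FLAGS and LOCK_BIT masks and the update_map_flags() helper are
      hypothetical names chosen for this example; this is not the ext4 code.

```c
#include <stdatomic.h>

#define MAP_FLAGS 0x0fUL	/* hypothetical: bits owned by the mapping code */
#define LOCK_BIT  0x10UL	/* hypothetical: bit owned by another code path */

/*
 * Replace only the MAP_FLAGS bits of *state with 'flags', leaving all
 * other bits untouched even if they are changed concurrently.
 */
void update_map_flags(_Atomic unsigned long *state, unsigned long flags)
{
	unsigned long old = atomic_load(state);
	unsigned long new_state;

	flags &= MAP_FLAGS;
	do {
		/* Recompute from the latest observed value on every retry. */
		new_state = (old & ~MAP_FLAGS) | flags;
	} while (!atomic_compare_exchange_weak(state, &old, new_state));
}
```

      The compare-and-exchange loop retries whenever another thread changed
      the word in between, so a concurrent set or clear of LOCK_BIT is never
      lost, unlike with a plain read-modify-write.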
      
      CC: stable@vger.kernel.org
      Reported-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 705d43db
      Linus Torvalds authored
      Pull livepatching fixes from Jiri Kosina:
      
       - regression (from 4.4) fix for ordering issue, introduced by an
         earlier ftrace change, that broke live patching of modules.
      
         The fix replaces the ftrace module notifier by direct call in order
         to make the ordering guaranteed and well-defined.  The patch, from
         Jessica Yu, has been acked both by Steven and Rusty
      
       - error message fix from Miroslav Benes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        ftrace/module: remove ftrace module notifier
        livepatch: change the error message in asm/livepatch.h header files
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · dd8fc10e
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two simple fixes.
      
        One prevents a soft lockup on some target removal scenarios and the
        other prevents us trying to probe the marvell console device, which
        causes it to time out and need the bus resetting"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: fix soft lockup in scsi_remove_target() on module removal
        SCSI: Add Marvell configuration device to VPD blacklist
    • mm: slab: free kmem_cache_node after destroy sysfs file · 52b4b950
      Dmitry Safonov authored
      When slub_debug alloc_calls_show is enabled, we try to track the
      location and user of slab objects on each online node, so the
      kmem_cache_node structure and cpu_cache/cpu_slub must not be freed
      until the last reference to the sysfs file has been dropped.
      
      This fixes the following panic:
      
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
         IP:  list_locations+0x169/0x4e0
         PGD 257304067 PUD 438456067 PMD 0
         Oops: 0000 [#1] SMP
         CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30
         Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
         task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
         RIP: list_locations+0x169/0x4e0
         Call Trace:
           alloc_calls_show+0x1d/0x30
           slab_attr_show+0x1b/0x30
           sysfs_read_file+0x9a/0x1a0
           vfs_read+0x9c/0x170
           SyS_read+0x58/0xb0
           system_call_fastpath+0x16/0x1b
         Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
         CR2: 0000000000000020
      
      Separate __kmem_cache_release from __kmem_cache_shutdown;
      __kmem_cache_release is now called from slab_kmem_cache_release (after
      the last reference to the sysfs file object has been dropped).
      
      Reintroduce locking in free_partial, as the sysfs file might access the
      cache's partial list after shutdown.  This is a partial revert of commit
      69cb8e6b ("slub: free slabs without holding locks").  Zap
      __remove_partial and use remove_partial (w/o underscores), as
      free_partial now takes list_lock, which is a partial revert of commit
      1e4dd946 ("slub: do not assert not having lock in removing freed
      partial").
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/shm: handle removed segments gracefully in shm_mmap() · 1ac0b6de
      Kirill A. Shutemov authored
      remap_file_pages(2) emulation can reach a file which represents a removed
      IPC ID as long as a memory segment is mapped.  This breaks the
      expectations of the IPC subsystem.
      
      Test case (rewritten to be more human readable, originally autogenerated
      by syzkaller[1]):
      
      	#define _GNU_SOURCE
      	#include <stdlib.h>
      	#include <sys/ipc.h>
      	#include <sys/mman.h>
      	#include <sys/shm.h>
      
      	#define PAGE_SIZE 4096
      
      	int main()
      	{
      		int id;
      		void *p;
      
      		id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
      		p = shmat(id, NULL, 0);
      		shmctl(id, IPC_RMID, NULL);
      		remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
      
      		return 0;
      	}
      
      The patch changes shm_mmap() and code around shm_lock() to propagate
      locking error back to caller of shm_mmap().
      
      [1] http://github.com/google/syzkaller
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: update Kselftest Framework mailing list · 64f00850
      Shuah Khan authored
      Kselftest Framework now has a dedicated mailing list linux-kselftest.
      Update the entry in MAINTAINERS file.
      Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • devm_memremap_release(): fix memremap'd addr handling · 9273a8bb
      Toshi Kani authored
      The pmem driver calls devm_memremap() to map a persistent memory range.
      When the pmem driver is unloaded, this memremap'd range is not released
      so the kernel will leak a vma.
      
      Fix devm_memremap_release() to handle a given memremap'd address
      properly.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb.c: fix incorrect proc nr_hugepages value · f8b74815
      Vaishali Thakkar authored
      Currently an incorrect default hugepage pool size is reported by proc
      nr_hugepages when the number of pages for the default huge page size is
      specified twice.
      
      When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
      indicates the current number of pre-allocated huge pages of the default
      size.  Basically /proc/sys/vm/nr_hugepages displays
      default_hstate.max_huge_pages, and after boot time pre-allocation,
      max_huge_pages should equal the number of pre-allocated pages
      (nr_hugepages).
      
      Test case:
      
      Note that this is specific to x86 architecture.
      
      Boot the kernel with command line option 'default_hugepagesz=1G
      hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'.  After
      boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages'
      return the value X.  However, dmesg output shows that Z huge pages were
      pre-allocated.
      
      The root cause of the problem is that the global variable
      default_hstate_max_huge_pages is set if a default huge page size is
      specified (directly or indirectly) on the command line.  After the command
      line processing in hugetlb_init, if default_hstate_max_huge_pages is set,
      the value is assigned to default_hstate.max_huge_pages.  However,
      default_hstate.max_huge_pages may have already been set based on the
      number of pre-allocated huge pages of default_hstate size.
      
      The solution is that if hstate->max_huge_pages is already set, it should
      not be overwritten with the global max_huge_pages value.  Basically, if
      the variable hugepages is set multiple times on a command line for a
      specific supported hugepagesize, then the proc layer should consider the
      last specified value.
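      
      The rule described above can be modelled in a few lines.  This is only
      a sketch: struct hstate_model and apply_default_max() are hypothetical
      names for illustration and do not reproduce the code in hugetlb_init.

```c
/* Hypothetical, reduced model of the per-size huge page state. */
struct hstate_model {
	unsigned long max_huge_pages;	/* 0 means "not set yet" */
};

/*
 * Apply the command-line default only when the per-size value has not
 * already been set by boot time pre-allocation.
 */
void apply_default_max(struct hstate_model *h, unsigned long default_max)
{
	if (default_max && !h->max_huge_pages)
		h->max_huge_pages = default_max;
}
```

      In the test case above, the state would already hold Z after
      pre-allocation, so applying the earlier value X leaves Z in place.
      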
      Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, x86: fix pte_page() crash in gup_pte_range() · 457a98b0
      Hugh Dickins authored
      Commit 3565fce3 ("mm, x86: get_user_pages() for dax mappings") has
      moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
      discernible reason: put it back where it belongs, after the pte_flags
      check and the pfn_valid cross-check.
      
      That may be the cause of the NULL pointer dereference in
      gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
      qemu-kvm based VM.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Michael Long <Harn-Solo@gmx.de>
      Tested-by: Michael Long <Harn-Solo@gmx.de>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fsnotify: turn fsnotify reaper thread into a workqueue job · 0918f1c3
      Jeff Layton authored
      We don't require a dedicated thread for fsnotify cleanup.  Switch it
      over to a workqueue job instead that runs on the system_unbound_wq.
      
      In the interest of not thrashing the queued job too often when there are
      a lot of marks being removed, we delay the reaper job slightly when
      queueing it, to allow several to gather on the list.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Tested-by: Eryu Guan <guaneryu@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" · 13d34ac6
      Jeff Layton authored
      This reverts commit c510eff6 ("fsnotify: destroy marks with
      call_srcu instead of dedicated thread").
      
      Eryu reported that he was seeing some OOM kills kick in when running a
      testcase that adds and removes inotify marks on a file in a tight loop.
      
      The above commit changed the code to use call_srcu to clean up the
      marks.  While that does (in principle) work, the srcu callback job is
      limited to cleaning up entries in small batches and only once per jiffy.
      It's easily possible to overwhelm that machinery with too many call_srcu
      callbacks, and Eryu's reproducer did just that.
      
      There's also another potential problem with using call_srcu here.  While
      you can obviously sleep while holding the srcu_read_lock, the callbacks
      run under local_bh_disable, so you can't sleep there.
      
      It's possible when putting the last reference to the fsnotify_mark that
      we'll end up putting a chain of references including the fsnotify_group,
      uid, and associated keys.  While I don't see any obvious ways that that
      could occur, it's probably still best to avoid using call_srcu here
      after all.
      
      This patch reverts the above patch.  A later patch will take a different
      approach to eliminate the dedicated thread here.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Reported-by: Eryu Guan <guaneryu@gmail.com>
      Tested-by: Eryu Guan <guaneryu@gmail.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix regression in remap_file_pages() emulation · 48f7df32
      Kirill A. Shutemov authored
      Grazvydas Ignotas has reported a regression in remap_file_pages()
      emulation.
      
      Testcase:
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <stdlib.h>
      	#include <stdio.h>
      	#include <sys/mman.h>
      
      	#define SIZE    (4096 * 3)
      
      	int main(int argc, char **argv)
      	{
      		unsigned long *p;
      		long i;
      
      		p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
      				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      		if (p == MAP_FAILED) {
      			perror("mmap");
      			return -1;
      		}
      
      		for (i = 0; i < SIZE / 4096; i++)
      			p[i * 4096 / sizeof(*p)] = i;
      
      		if (remap_file_pages(p, 4096, 0, 1, 0)) {
      			perror("remap_file_pages");
      			return -1;
      		}
      
      		if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
      			perror("remap_file_pages");
      			return -1;
      		}
      
      		assert(p[0] == 1);
      
      		munmap(p, SIZE);
      
      		return 0;
      	}
      
      The second remap_file_pages() fails with -EINVAL.
      
      The reason is that remap_file_pages() emulation assumes that the target
      vma covers the whole area we want to map over.  That assumption is broken
      by the first remap_file_pages() call: it splits the area into two vmas.
      
      The solution is to check whether the next adjacent vmas map the same
      file with the same flags.
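      
      The adjacency check can be sketched in user space as follows.  The
      struct vma_model type and the range_covered() helper are hypothetical
      and much simpler than the kernel's vm_area_struct handling; the sketch
      assumes the vmas are linked in ascending address order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a vma: [start, end) plus file and flags. */
struct vma_model {
	unsigned long start, end;
	const void *file;
	unsigned long flags;
	struct vma_model *next;		/* next vma in address order */
};

/*
 * Return true if [start, start + size) is covered by 'vma' and the vmas
 * following it, with no holes, all mapping the same file with the same
 * flags as 'vma'.
 */
bool range_covered(const struct vma_model *vma,
		   unsigned long start, unsigned long size)
{
	const struct vma_model *prev = vma;
	const struct vma_model *next;
	unsigned long end = start + size;

	if (start < vma->start || start >= vma->end)
		return false;
	if (end <= vma->end)
		return true;

	for (next = vma->next; next; prev = next, next = next->next) {
		if (next->start != prev->end)	/* hole between vmas */
			return false;
		if (next->file != vma->file || next->flags != vma->flags)
			return false;
		if (end <= next->end)
			return true;
	}
	return false;			/* ran past the last vma */
}
```

      In the testcase above, the first remap_file_pages() call leaves two
      adjacent vmas for the same mapping, so a check of this kind would
      accept the second call where a single-vma check rejects it.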
      
      Fixes: c8d78c18 ("mm: replace remap_file_pages() syscall with emulation")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Grazvydas Ignotas <notasas@gmail.com>
      Tested-by: Grazvydas Ignotas <notasas@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp, dax: do not try to withdraw pgtable from non-anon VMA · 69a8ec2d
      Kirill A. Shutemov authored
      DAX doesn't deposit pgtables when it maps huge pages, so there is nothing
      to withdraw.  Trying to withdraw one can lead to a crash.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 18 Feb, 2016 1 commit
    • ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream · 67ec1072
      Takashi Iwai authored
      A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
      in the same code path, e.g. one in snd_pcm_action_nonatomic() and
      another in snd_pcm_stream_lock().  Usually this is OK, but when a
      write lock is issued between these two read locks, the problem
      happens: the write lock is blocked due to the first read lock, and
      the second read lock is also blocked by the write lock.  This
      eventually deadlocks.
      
      The reason is the way rwsem manages waiters; it's queued like FIFO, so
      even if the writer itself doesn't take the lock yet, it blocks all the
      waiters (including reads) queued after it.
      
      As a workaround, in this patch, we replace the standard down_write()
      with a spinning loop.  This is far from optimal, but it's good
      enough, as the spinning time is supposed to be relatively short for
      normal PCM operations, and the code paths requiring the write lock
      aren't called so often.
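      
      The idea of acquiring the write side through repeated try-locks can be
      sketched in user space with C11 atomics.  The struct rw_model lock and
      all helpers below are hypothetical and far simpler than the kernel's
      rwsem; the point shown is only that a writer which spins on a try-lock
      never queues, so it cannot block readers that arrive while it waits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: state is -1 for a writer, n >= 0 for n readers. */
struct rw_model {
	atomic_int state;
};

bool try_read_lock(struct rw_model *l)
{
	int s = atomic_load(&l->state);

	while (s >= 0)		/* a failed exchange reloads s */
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	return false;		/* a writer holds the lock */
}

void read_unlock(struct rw_model *l)
{
	atomic_fetch_sub(&l->state, 1);
}

bool try_write_lock(struct rw_model *l)
{
	int expected = 0;	/* succeeds only with no readers or writer */

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

void write_unlock(struct rw_model *l)
{
	atomic_store(&l->state, 0);
}

/* Spin until the write lock is taken; nothing is queued while waiting. */
void write_lock_spinning(struct rw_model *l)
{
	while (!try_write_lock(l))
		;		/* real code would yield the CPU here */
}
```

      While one reader holds the toy lock, a failed try_write_lock() leaves
      no trace, so a second try_read_lock() from the same path still
      succeeds and the nested read lock cannot deadlock against the writer.
      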
      Reported-by: Vinod Koul <vinod.koul@intel.com>
      Tested-by: Ramesh Babu <ramesh.babu@intel.com>
      Cc: <stable@vger.kernel.org> # v3.18+
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
  4. 17 Feb, 2016 15 commits
    • ftrace/module: remove ftrace module notifier · 7dcd182b
      Jessica Yu authored
      Remove the ftrace module notifier in favor of directly calling
      ftrace_module_enable() and ftrace_release_mod() in the module loader.
      Hard-coding the function calls directly in the module loader removes
      dependence on the module notifier call chain and provides better
      visibility and control over what gets called when, which is important
      to kernel utilities such as livepatch.
      
      This fixes a notifier ordering issue in which the ftrace module notifier
      (and hence ftrace_module_enable()) for coming modules was being called
      after klp_module_notify(), which caused livepatch modules to initialize
      incorrectly. This patch removes dependence on the module notifier call
      chain in favor of hard coding the corresponding function calls in the
      module loader. This ensures that ftrace and livepatch code get called in
      the correct order on patch module load and unload.
      
      Fixes: 5156dca3 ("ftrace: Fix the race between ftrace and insmod")
      Signed-off-by: Jessica Yu <jeyu@redhat.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      7dcd182b
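
The ordering argument in the commit above can be sketched with a toy model. Everything here is hypothetical (the toy_ names, the call log, the chain type); it is not the module loader's code. It only shows why a hard-coded call placed before the notifier chain runs first no matter how the chain is populated.

```c
#include <stddef.h>

/* Hypothetical identifiers for the two subsystems in the toy model. */
enum toy_hook { TOY_FTRACE = 1, TOY_LIVEPATCH = 2 };

#define TOY_MAX_CALLS 8
static enum toy_hook toy_call_log[TOY_MAX_CALLS];
static size_t toy_call_count;

/* Record which hook ran, in order. */
static void toy_record(enum toy_hook who)
{
    if (toy_call_count < TOY_MAX_CALLS)
        toy_call_log[toy_call_count++] = who;
}

static void toy_ftrace_module_enable(void) { toy_record(TOY_FTRACE); }
static void toy_livepatch_notify(void)     { toy_record(TOY_LIVEPATCH); }

typedef void (*toy_notifier_fn)(void);

/* Notifier style: the order is whatever the chain happens to contain. */
static void toy_run_chain(const toy_notifier_fn *chain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        chain[i]();
}

/* Direct-call style: the ftrace hook is fixed ahead of the chain. */
static void toy_module_coming(const toy_notifier_fn *chain, size_t n)
{
    toy_ftrace_module_enable();
    toy_run_chain(chain, n);
}
```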
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 28507135
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes from the past few weeks that should go into 4.5.
        This contains:
      
         - Overflow fix for sysfs discard show function from Alan.
      
         - A stacking limit init fix for max_dev_sectors, so we don't end up
           artificially capping some use cases.  From Keith.
      
         - Have blk-mq properly end unstarted requests on a dying queue, instead
           of pushing that to the driver.  From Keith.
      
         - NVMe:
              - Update to Kconfig description for NVME_SCSI, since it was
                vague and having it on is important for some SUSE distros.
                From Christoph.
              - Set of fixes from Keith, around surprise removal. Also kills
                the no-merge flag, so it supports merging.
      
         - Set of fixes for lightnvm from Matias, Javier, and Wenwei.
      
         - Fix null_blk oops when asked for lightnvm, but not available.  From
           Matias.
      
         - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
           if interrupted by a signal.
      
         - Two floppy fixes from Jiri, fixing signal handling and blocking
           open.
      
         - A use-after-free fix for O_DIRECT, from Mike Krinkin.
      
         - A block module ref count fix from Roman Pen.
      
         - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.
      
         - Smaller realloc fix for xen-blkfront from Bob Liu.
      
         - Removal of an unused struct member in the deadline IO scheduler,
           from Tahsin.
      
         - Also from Tahsin, properly initialize inode struct members
           associated with cgroup writeback, if enabled.
      
         - From Tejun, ensure that we keep the superblock pinned during cgroup
           writeback"
      
      * 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
        blk: fix overflow in queue_discard_max_hw_show
        writeback: initialize inode members that track writeback history
        writeback: keep superblock pinned during cgroup writeback association switches
        bio: return EINTR if copying to user space got interrupted
        NVMe: Rate limit nvme IO warnings
        NVMe: Poll device while still active during remove
        NVMe: Requeue requests on suspended queues
        NVMe: Allow request merges
        NVMe: Fix io incapable return values
        blk-mq: End unstarted requests on dying queue
        block: Initialize max_dev_sectors to 0
        null_blk: oops when initializing without lightnvm
        block: fix module reference leak on put_disk() call for cgroups throttle
        nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
        kernel/fs: fix I/O wait not accounted for RW O_DSYNC
        floppy: refactor open() flags handling
        lightnvm: allow to force mm initialization
        lightnvm: check overflow and correct mlc pairs
        lightnvm: fix request intersection locking in rrpc
        lightnvm: warn if irqs are disabled in lock laddr
        ...
      28507135
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · c28b947d
      Linus Torvalds authored
      Pull DeviceTree fixes from Rob Herring:
      
       - Fix irq msi-map calculation for nonzero rid-base.
      
       - Binding doc updates for GICv3, fsl-imx-uart, and S3C RTC.
      
      * tag 'devicetree-fixes-for-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        rtc: s3c: Document required clocks in the DT binding
        serial: fsl-imx-uart: Fix typo in fsl,dte-mode description
        dt-bindings: arm, gic-v3: require that reserved cells are always 0
        of/irq: Fix msi-map calculation for nonzero rid-base
      c28b947d
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 35683dd3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This has two main sets of fixes:
      
         - A bunch of Exynos fixes, mainly for their MIC component.
      
         - vblank regression fixes from Mario. Apparently some changes in 4.4
           caused some vblank breakage on radeon/nouveau; this set fixes all
           the issues seen.
      
        There is also a revert of one of the MST changes, which I was
        overzealous in including and which broke 30" MST monitors, and two qxl
        fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/qxl: fix erroneous return value
        drm/nouveau/display: Enable vblank irqs after display engine is on again.
        drm/radeon/pm: Handle failure of drm_vblank_get.
        drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2)
        drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
        drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2)
        drm: No-Op redundant calls to drm_vblank_off() (v2)
        drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command
        Revert "drm/dp/mst: change MST detection scheme"
        drm/exynos/decon: fix disable clocks order
        drm/exynos: fix incorrect cpu address for dma_mmap_attrs()
        drm/exynos: exynos5433_decon: fix wrong state in decon_vblank_enable
        drm/exynos: exynos5433_decon: fix wrong state assignment in decon_enable
        drm/exynos: dsi: restore support for drm bridge
        drm/exynos: mic: make all functions static
        drm/exynos: mic: convert to component framework
        drm/exynos: mic: use devm_clk interface
        drm/exynos: fix types for compilation on 64bit architectures
        drm/exynos: ipp: fix incorrect format specifiers in debug messages
        drm/exynos: depend on ARCH_EXYNOS for DRM_EXYNOS
      35683dd3
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v4.5-rc4' of... · a9f70bd4
      Linus Torvalds authored
      Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "This includes two fixes.
      
        The first is something that has come up a few times and has been
        worked out individually, but it's come up now enough that the problem
        should be generic.  Tracepoints are protected by RCU sched.  There are
        several tracepoints within core infrastructure like kfree().  If a
        tracepoint is called when the CPU is going down, or when it's coming
        up but has yet to be recognized by RCU, an RCU warning is triggered.
      
        This is a true bug as that tracepoint is not protected by RCU.
        Usually, this is taken care of by testing for cpu online as a
        tracepoint condition.  But as this is happening more often, moving it
        from an individual tracepoint to a check in the tracepoint
        infrastructure is more robust.
      
        Note, there is now a duplicate of a cpu online test, because this
        update does not remove the individual checks.  But the overhead is
        small enough that the removal can be done in another release.
      
        The second change is strange linker breakage due to the branch
        tracer's builtin_constant_p() check failing, and treating the
        condition as a variable instead of a constant.  Arnd Bergmann found
        that this can be fixed by testing !!(cond) instead of just (cond)"
      
      * tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix freak link error caused by branch tracer
        tracepoints: Do not trace when cpu is offline
      a9f70bd4
    • Alan's avatar
      blk: fix overflow in queue_discard_max_hw_show · 18f922d0
      Alan authored
      We get this right for queue_discard_max_show but not max_hw_show. Follow the
      same pattern as queue_discard_max_show instead so that we don't truncate.
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      18f922d0
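
The truncation fixed by the commit above can be reproduced in a small user-space sketch. The struct, the function names, the 32-bit width of the sector count, and the shift by 9 (512-byte sectors) are assumptions made for illustration; the commit message itself only says that the value was truncated.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the queue limits: a 32-bit sector count. */
struct toy_limits {
    uint32_t max_hw_discard_sectors;
};

/* Truncating form: the shift is evaluated in 32 bits, then widened. */
static unsigned long long toy_discard_bytes_truncated(const struct toy_limits *lim)
{
    unsigned long long val = lim->max_hw_discard_sectors << 9;

    return val;
}

/* Widened form: cast first, so the shift is evaluated in 64 bits. */
static unsigned long long toy_discard_bytes_widened(const struct toy_limits *lim)
{
    return (unsigned long long)lim->max_hw_discard_sectors << 9;
}

/* sysfs-style show helper built on the widened form. */
static int toy_discard_max_hw_show(const struct toy_limits *lim,
                                   char *page, size_t len)
{
    return snprintf(page, len, "%llu\n", toy_discard_bytes_widened(lim));
}
```

With 0x00800000 sectors the byte count is exactly 2^32, so the truncating form yields 0 while the widened form yields 4294967296.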
    • Marek Szyprowski's avatar
      arm64: dma-mapping: fix handling of devices registered before arch_initcall · 722ec35f
      Marek Szyprowski authored
      This patch ensures that devices registered before arch_initcall
      are handled correctly by the IOMMU-based DMA-mapping code.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 13b8629f ("arm64: Add IOMMU dma_ops")
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      722ec35f
    • Stefan Haberland's avatar
      s390/dasd: fix performance drop · 12d319b9
      Stefan Haberland authored
      Commit ca369d51 ("sd: Fix device-imposed transfer length limits")
      introduced a new queue limit max_dev_sectors which limits the maximum
      sectors for requests. The default value leads to small dasd requests
      and therefore to a performance drop.
      Set the max_dev_sectors value to the same value as the max_hw_sectors
      to use the maximum available request size for DASD devices.
      Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      12d319b9
    • Heiko Carstens's avatar
      s390/maccess: reduce stnsm instructions · 52499d93
      Heiko Carstens authored
      When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
      on kdump") both Christian and I missed that we can save an additional
      stnsm instruction.
      
      This saves us a couple of cycles which could improve the speed of
      memcpy_real.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      52499d93
    • Anton Protopopov's avatar
      drm/qxl: fix erroneous return value · dada168b
      Anton Protopopov authored
      The qxl_gem_prime_mmap() function returns ENOSYS instead of -ENOSYS.
      Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      dada168b
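
Why the sign matters in the commit above: kernel-style functions report failure as a negative errno value, and callers commonly test for a negative return. The sketch below uses hypothetical toy_ functions to show that a positive ENOSYS slips past such a check.

```c
#include <errno.h>

/* Hypothetical unimplemented operation, following the negative-errno convention. */
static int toy_prime_mmap(void)
{
    return -ENOSYS;
}

/* Typical caller-side check: only negative values count as errors. */
static int toy_call_failed(int ret)
{
    return ret < 0;
}
```

A caller that receives a positive ENOSYS would treat the call as having succeeded, which is the kind of mistake the one-character fix avoids.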
    • Mario Kleiner's avatar
      drm/nouveau/display: Enable vblank irqs after display engine is on again. · ff683df7
      Mario Kleiner authored
      In the display resume path, move the calls to drm_vblank_on()
      after the point when the display engine is running again.
      
      Since changes were made to drm_update_vblank_count() in Linux 4.4+
      to emulate hw vblank counters via vblank timestamping, the function
      drm_vblank_on() now needs working high precision vblank timestamping
      and therefore working scanout position queries at time of call.
      These don't work before the display engine gets restarted, causing
      miscalculation of vblank counter increments and thereby large forward
      jumps in vblank count at display resume. These jumps can cause client
      hangs on resume, or desktop hangs in the case of composited desktops.
      
      Fix this Linux 4.4 regression by reordering calls accordingly.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.4+
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: ville.syrjala@linux.intel.com
      Cc: daniel.vetter@ffwll.ch
      Cc: dri-devel@lists.freedesktop.org
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      ff683df7
    • Mario Kleiner's avatar
      drm/radeon/pm: Handle failure of drm_vblank_get. · e0b34e38
      Mario Kleiner authored
      Make sure that drm_vblank_get/put() stay balanced in
      case drm_vblank_get fails, by skipping the corresponding
      put.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: michel@daenzer.net
      Cc: dri-devel@lists.freedesktop.org
      Cc: alexander.deucher@amd.com
      Cc: christian.koenig@amd.com
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      e0b34e38
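
The get/put balancing rule from the commit above, as a minimal sketch. The struct and the toy_ functions are hypothetical stand-ins with a plain integer refcount; they are not the DRM API, only an illustration of skipping the put when the get failed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical vblank state: a reference count and an "interrupts usable" flag. */
struct toy_vblank {
    int refcount;
    bool enabled;
};

/* Take a reference; fails without touching the count if vblank is unusable. */
static int toy_vblank_get(struct toy_vblank *vbl)
{
    if (!vbl->enabled)
        return -EINVAL;
    vbl->refcount++;
    return 0;
}

/* Drop a reference previously taken with toy_vblank_get(). */
static void toy_vblank_put(struct toy_vblank *vbl)
{
    vbl->refcount--;
}

/* Balanced user: the put is skipped whenever the get failed. */
static int toy_wait_for_vblank(struct toy_vblank *vbl)
{
    int ret = toy_vblank_get(vbl);

    if (ret)
        return ret;  /* no reference was taken, so nothing to put */

    /* ... work that needs vblank interrupts would go here ... */

    toy_vblank_put(vbl);
    return 0;
}
```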
    • Mario Kleiner's avatar
      drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2) · bb74fc1b
      Mario Kleiner authored
      drm_vblank_offdelay can have three different types of values:
      
      < 0 is to be always treated the same as dev->vblank_disable_immediate
      = 0 is to be treated as "never disable vblanks"
      > 0 is to be treated as disable immediate if kms driver wants it
          that way via dev->vblank_disable_immediate. Otherwise it is
          a disable timeout in msecs.
      
      This got broken in Linux 3.18+ for the implementation of
      drm_vblank_on. If the user specified a value of zero which should
      always reenable vblank irqs in this function, a kms driver could
      override the user's choice by setting vblank_disable_immediate
      to true. This patch fixes the regression and keeps the user in
      control.
      
      v2: Only reenable vblank if there are clients left or the user
          requested to "never disable vblanks" via offdelay 0. Enabling
          vblanks even in the "delayed disable" case (offdelay > 0) was
          specifically added by Ville in commit cd19e52a
          ("drm: Kick start vblank interrupts at drm_vblank_on()"),
          but after discussion it turns out that this was done by accident.
      
          Citing Ville: "I think it just ended up as a mess due to changing
          some of the semantics of offdelay<0 vs. offdelay==0 vs.
          disable_immediate during the review of the series. So yeah, given
          how drm_vblank_put() works now, I'd just make this check for
          offdelay==0."
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      
      Cc: <stable@vger.kernel.org> # 3.18+
      Cc: michel@daenzer.net
      Cc: vbabka@suse.cz
      Cc: ville.syrjala@linux.intel.com
      Cc: daniel.vetter@ffwll.ch
      Cc: dri-devel@lists.freedesktop.org
      Cc: alexander.deucher@amd.com
      Cc: christian.koenig@amd.com
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      bb74fc1b
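
The three classes of drm_vblank_offdelay values listed in the commit above can be written as a small decision function. This is a sketch under assumptions: the enum and function names are hypothetical, and "treated the same as dev->vblank_disable_immediate" is read here as "disable immediately". The point of the fix is visible in the first branch: a user-supplied 0 wins over the driver's flag.

```c
#include <stdbool.h>

/* Hypothetical names for the three outcomes described in the commit message. */
enum toy_vblank_policy {
    TOY_NEVER_DISABLE,
    TOY_DISABLE_IMMEDIATELY,
    TOY_DISABLE_AFTER_TIMEOUT,  /* offdelay is the timeout in msecs */
};

static enum toy_vblank_policy toy_vblank_policy(int offdelay,
                                                bool driver_wants_immediate)
{
    /* offdelay == 0: the user asked to never disable vblanks. */
    if (offdelay == 0)
        return TOY_NEVER_DISABLE;

    /* offdelay < 0 always behaves like the driver's immediate-disable flag. */
    if (offdelay < 0 || driver_wants_immediate)
        return TOY_DISABLE_IMMEDIATELY;

    /* offdelay > 0 without the driver flag: delayed disable. */
    return TOY_DISABLE_AFTER_TIMEOUT;
}
```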
    • Mario Kleiner's avatar
      drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 · c61934ed
      Mario Kleiner authored
      Changes to drm_update_vblank_count() in Linux 4.4 broke the
      behaviour of the pre/post modeset functions as the new update
      code doesn't deal with hw vblank counter resets in between calls
      to drm_vblank_pre_modeset and drm_vblank_post_modeset, as it
      should.
      
      This causes mistreatment of such hw counter resets as counter
      wraparound, and thereby large forward jumps of the software
      vblank counter which in turn cause vblank event dispatching
      and vblank waits to fail/hang --> userspace clients hang.
      
      This symptom was reported on radeon-kms to cause an infinite
      hang of KDE Plasma 5 shell's login procedure, preventing users
      from logging in.
      
      Fix this by detecting when drm_update_vblank_count() is called
      inside a pre->post modeset interval. If so, clamp valid vblank
      increments to the safe values 0 and 1, pretty much restoring
      the update behavior of the old update code of Linux 4.3 and
      earlier. Also reset the last recorded hw vblank count at call
      to drm_vblank_post_modeset() to be safe against hw that after
      modesetting, dpms on etc. only fires its first vblank irq after
      drm_vblank_post_modeset() was already called.
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Tested-by: Vlastimil Babka <vbabka@suse.cz>
      
      Cc: <stable@vger.kernel.org> # 4.4+
      Cc: michel@daenzer.net
      Cc: vbabka@suse.cz
      Cc: ville.syrjala@linux.intel.com
      Cc: daniel.vetter@ffwll.ch
      Cc: dri-devel@lists.freedesktop.org
      Cc: alexander.deucher@amd.com
      Cc: christian.koenig@amd.com
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      c61934ed
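
How a hardware counter reset turns into a large forward jump, and how the clamp described in the commit above contains it, in a minimal sketch. The masked subtraction, the 24-bit counter width, and all toy_ names are assumptions for illustration; the commit message only states that resets were mistaken for wraparound and that increments are clamped to 0 and 1 inside a pre->post modeset interval.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-aware difference between two hardware counter readings.
 * If the counter was reset rather than wrapped, the result is a huge
 * bogus forward step.
 */
static uint32_t toy_hw_vblank_diff(uint32_t cur, uint32_t last, uint32_t max_count)
{
    return (cur - last) & max_count;
}

/* Inside a pre->post modeset interval only increments of 0 or 1 are trusted. */
static uint32_t toy_clamp_in_modeset(uint32_t diff, bool in_modeset)
{
    if (in_modeset && diff > 1)
        return 1;
    return diff;
}
```

For example, a 24-bit counter that is reset from 1000 to 3 produces a raw difference of 16776219, which the clamp reduces to 1 while a modeset is in progress.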
    • Mario Kleiner's avatar
      drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2) · 99b8e715
      Mario Kleiner authored
      This fixes a regression introduced by the new drm_update_vblank_count()
      implementation in Linux 4.4:
      
      Restrict the bump of the software vblank counter in drm_update_vblank_count()
      to a safe maximum value of +1 whenever there is the possibility that
      concurrent readers of vblank timestamps could be active at the moment,
      as the current implementation of the timestamp caching and updating is
      not safe against concurrent readers for calls to store_vblank() with a
      bump of anything but +1. A bump != 1 would very likely return corrupted
      timestamps to userspace, because the same slot in the cache could
      be concurrently written by store_vblank() and read by one of those
      readers in a non-atomic fashion and without the read-retry logic
      detecting this collision.
      
      Concurrent readers can exist while drm_update_vblank_count() is called
      from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
      irq callers. However, all those calls are happening with the vbl_lock
      locked thereby preventing a drm_vblank_get(), so the vblank refcount
      can't increase while drm_update_vblank_count() is executing. Therefore
      a zero vblank refcount during execution of that function signals that
      it is safe for arbitrary counter bumps if called from outside vblank irq,
      whereas a non-zero count is not safe.
      
      Whenever the function is called from vblank irq, we have to assume concurrent
      readers could show up any time during its execution, even if the refcount
      is currently zero, as vblank irqs are usually only enabled due to the
      presence of readers, and because when it is called from vblank irq it
      can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
      Therefore also restrict bumps to +1 when the function is called from vblank
      irq.
      
      Such bumps of more than +1 can happen at other times than reenabling
      vblank irqs, e.g., when regular vblank interrupts get delayed by more
      than 1 frame due to long held locks, long irq off periods, realtime
      preemption on RT kernels, or system management interrupts.
      
      A better solution would be to rewrite the timestamp caching to use
      full seqlocks to allow concurrent writes and reads for arbitrary
      vblank counter increments.
      
      v2: Add code comment that this is essentially a hack and should
          be replaced by a full seqlock implementation for caching of
          timestamps.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      
      Cc: <stable@vger.kernel.org> # 4.4+
      Cc: michel@daenzer.net
      Cc: vbabka@suse.cz
      Cc: ville.syrjala@linux.intel.com
      Cc: daniel.vetter@ffwll.ch
      Cc: dri-devel@lists.freedesktop.org
      Cc: alexander.deucher@amd.com
      Cc: christian.koenig@amd.com
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      99b8e715
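
The rule in the commit above (limit the software counter bump to +1 whenever concurrent timestamp readers might exist) reduces to a short predicate. The function and parameter names below are hypothetical; the conditions follow the commit message: readers are possible when the call comes from the vblank irq, or when the vblank refcount is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the counter increment that is safe to apply.
 * - From vblank irq: readers may appear at any time, so cap at +1.
 * - Outside vblank irq (caller holds the lock): a non-zero refcount
 *   means readers may be active, so cap at +1; a zero refcount means
 *   arbitrary bumps are safe.
 */
static uint32_t toy_safe_vblank_bump(uint32_t diff, bool from_vblank_irq,
                                     int refcount)
{
    bool readers_possible = from_vblank_irq || refcount != 0;

    if (readers_possible && diff > 1)
        return 1;
    return diff;
}
```

As the commit message notes, this is a stopgap: a seqlock-based timestamp cache would allow arbitrary increments without the cap.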