1. 17 May, 2017 20 commits
  2. 16 May, 2017 6 commits
    • usb: dwc3: keystone: check return value · 018047a1
      Pan Bian authored
      Function devm_clk_get() returns an ERR_PTR when it fails. However, in
      function kdwc3_probe(), its return value is not checked, which may
      result in a bad memory access bug. This patch fixes the bug.
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
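The ERR_PTR convention described in this commit can be sketched in userspace as follows. This is only an illustration, not the actual patch: the ERR_PTR(), PTR_ERR() and IS_ERR() helpers are simplified stand-ins for the kernel's versions, and fake_clk_get() and fake_probe() are hypothetical names standing in for devm_clk_get() and kdwc3_probe().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct clk { int enabled; };

/* Hypothetical stand-in for devm_clk_get(): fails when no name is given. */
static struct clk *fake_clk_get(const char *name)
{
	static struct clk the_clk;

	if (!name)
		return ERR_PTR(-ENOENT);
	return &the_clk;
}

/* Hypothetical probe routine: checks the result before dereferencing it. */
static int fake_probe(const char *clk_name)
{
	struct clk *clk = fake_clk_get(clk_name);

	if (IS_ERR(clk))
		return (int)PTR_ERR(clk);	/* propagate the error code */

	clk->enabled = 1;	/* safe: clk is a valid pointer here */
	return 0;
}
```

Without the IS_ERR() check, the failure path would dereference an address near the top of the address space, which is the bad memory access the commit message refers to.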
    • usb: gadget: f_fs: avoid out of bounds access on comp_desc · b7f73850
      William Wu authored
      The Companion descriptor is only used for SuperSpeed endpoints.
      If the endpoints are HighSpeed or FullSpeed, the Companion
      descriptor will not be allocated, so we can only access it if
      the gadget is SuperSpeed.
      
      I can reproduce this issue on the Rockchip rk3368 SoC platform,
      which supports USB 2.0 and uses functionfs for ADB. A kernel
      built with CONFIG_KASAN=y and CONFIG_SLUB_DEBUG=y reports
      the following BUG:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in ffs_func_set_alt+0x224/0x3a0 at addr ffffffc0601f6509
      Read of size 1 by task swapper/0/0
      ============================================================================
      BUG kmalloc-256 (Not tainted): kasan: bad access detected
      ----------------------------------------------------------------------------
      
      Disabling lock debugging due to kernel taint
      INFO: Allocated in ffs_func_bind+0x52c/0x99c age=1275 cpu=0 pid=1
      alloc_debug_processing+0x128/0x17c
      ___slab_alloc.constprop.58+0x50c/0x610
      __slab_alloc.isra.55.constprop.57+0x24/0x34
      __kmalloc+0xe0/0x250
      ffs_func_bind+0x52c/0x99c
      usb_add_function+0xd8/0x1d4
      configfs_composite_bind+0x48c/0x570
      udc_bind_to_driver+0x6c/0x170
      usb_udc_attach_driver+0xa4/0xd0
      gadget_dev_desc_UDC_store+0xcc/0x118
      configfs_write_file+0x1a0/0x1f8
      __vfs_write+0x64/0x174
      vfs_write+0xe4/0x200
      SyS_write+0x68/0xc8
      el0_svc_naked+0x24/0x28
      INFO: Freed in inode_doinit_with_dentry+0x3f0/0x7c4 age=1275 cpu=7 pid=247
      ...
      Call trace:
      [<ffffff900808aab4>] dump_backtrace+0x0/0x230
      [<ffffff900808acf8>] show_stack+0x14/0x1c
      [<ffffff90084ad420>] dump_stack+0xa0/0xc8
      [<ffffff90082157cc>] print_trailer+0x188/0x198
      [<ffffff9008215948>] object_err+0x3c/0x4c
      [<ffffff900821b5ac>] kasan_report+0x324/0x4dc
      [<ffffff900821aa38>] __asan_load1+0x24/0x50
      [<ffffff90089eb750>] ffs_func_set_alt+0x224/0x3a0
      [<ffffff90089d3760>] composite_setup+0xdcc/0x1ac8
      [<ffffff90089d7394>] android_setup+0x124/0x1a0
      [<ffffff90089acd18>] _setup+0x54/0x74
      [<ffffff90089b6b98>] handle_ep0+0x3288/0x4390
      [<ffffff90089b9b44>] dwc_otg_pcd_handle_out_ep_intr+0x14dc/0x2ae4
      [<ffffff90089be85c>] dwc_otg_pcd_handle_intr+0x1ec/0x298
      [<ffffff90089ad680>] dwc_otg_pcd_irq+0x10/0x20
      [<ffffff9008116328>] handle_irq_event_percpu+0x124/0x3ac
      [<ffffff9008116610>] handle_irq_event+0x60/0xa0
      [<ffffff900811af30>] handle_fasteoi_irq+0x10c/0x1d4
      [<ffffff9008115568>] generic_handle_irq+0x30/0x40
      [<ffffff90081159b4>] __handle_domain_irq+0xac/0xdc
      [<ffffff9008080e9c>] gic_handle_irq+0x64/0xa4
      ...
      Memory state around the buggy address:
        ffffffc0601f6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ffffffc0601f6480: 00 00 00 00 00 00 00 00 00 00 06 fc fc fc fc fc
       >ffffffc0601f6500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                             ^
        ffffffc0601f6580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffffffc0601f6600: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
      ==================================================================
      Signed-off-by: William Wu <william.wu@rock-chips.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
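The guard described in this commit can be sketched as below. This is a simplified model rather than the f_fs code: ep_maxburst() and the speed enum are hypothetical, and the sketch assumes the companion descriptor, when present, directly follows the 7-byte endpoint descriptor, with bMaxBurst at byte offset 2.

```c
#include <stddef.h>
#include <stdint.h>

#define EP_DESC_SIZE 7	/* size of a standard endpoint descriptor */

enum speed { SPEED_FULL, SPEED_HIGH, SPEED_SUPER };

/*
 * Return the burst size for an endpoint. The companion descriptor is only
 * read when the gadget runs at SuperSpeed; at lower speeds the buffer ends
 * right after the endpoint descriptor, so reading past it would be out of
 * bounds.
 */
static unsigned int ep_maxburst(const uint8_t *desc, enum speed speed)
{
	if (speed != SPEED_SUPER)
		return 1;	/* no companion descriptor was allocated */

	const uint8_t *comp = desc + EP_DESC_SIZE;
	return comp[2] + 1u;	/* bMaxBurst is zero-based */
}
```

A HighSpeed caller passes a 7-byte buffer and never touches memory beyond it, which is the access KASAN flagged in the report above.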
    • usb: gadget: gserial: check if console kthread exists · 844cf8a9
      Bogdan Mirea authored
      Check for a bad pointer that may result from a kthread_create failure.
      This check is needed because the gserial setup callback function
      (gs_console_setup()) only frees info->con_buf when kthread_create
      fails, which leaves a bad info->console_thread pointer.
      Without checking the validity of the info->console_thread pointer in
      gserial_console_exit() before calling kthread_stop(), rmmod will
      generate a kernel Oops.
      Signed-off-by: Bogdan Mirea <Bogdan-Stefan_mirea@mentor.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
    • usb: dwc3: gadget: Prevent losing events in event cache · d325a1de
      Thinh Nguyen authored
      The dwc3 driver can overwrite its previous events if its top-half IRQ
      handler (TH) gets invoked again before processing the events in the
      cache. We see this as a hang in the file transfer and the host will
      attempt to reset the device. TH gets the event count and deasserts the
      interrupt line by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. If
      there's a new event coming between reading the event count and interrupt
      deassertion, dwc3 will lose previous pending events. More generally, we
      will see 0 event count, which should not affect anything.
      
      This shouldn't be possible in the current dwc3 implementation. However,
      through testing and reading the PCIe trace, the TH occasionally still
      gets invoked one more time after HW interrupt deassertion. (With PCIe
      legacy interrupts, TH is called repeatedly as long as the interrupt line
      is asserted). We suspect that there is a small detection delay in the
      SW.
      
      To avoid this issue, check the DWC3_EVENT_PENDING flag to determine if
      the events are processed in the bottom-half IRQ handler. If not, return
      IRQ_HANDLED and don't process new events.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
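The pending-flag check can be modelled in a few lines of plain C. This is a sketch under assumptions, not the dwc3 code: struct event_buf, top_half() and bottom_half() are hypothetical, EVENT_PENDING models DWC3_EVENT_PENDING, and register access is reduced to a counter.

```c
enum irqreturn { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD };

#define EVENT_PENDING (1u << 0)	/* models DWC3_EVENT_PENDING */

struct event_buf {
	unsigned int flags;
	unsigned int cached;	/* events copied to the cache, not yet processed */
	unsigned int hw_count;	/* new events reported by the hardware */
};

/* Top half: refuse to touch the cache while the bottom half still owns it. */
static enum irqreturn top_half(struct event_buf *evt)
{
	if (evt->flags & EVENT_PENDING)
		return IRQ_HANDLED;	/* cached events are kept intact */

	if (!evt->hw_count)
		return IRQ_NONE;

	evt->cached = evt->hw_count;
	evt->hw_count = 0;
	evt->flags |= EVENT_PENDING;
	return IRQ_WAKE_THREAD;
}

/* Bottom half: process the cached events, then release the cache. */
static unsigned int bottom_half(struct event_buf *evt)
{
	unsigned int processed = evt->cached;

	evt->cached = 0;
	evt->flags &= ~EVENT_PENDING;
	return processed;
}
```

In this model a spurious second invocation of the top half returns early, so the cached count survives until the bottom half has consumed it; the new events are picked up on the next interrupt.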
    • usb: dwc3: gadget: Fix ISO transfer performance · f1d6826c
      Roger Quadros authored
      Commit 08a36b54 ("usb: dwc3: gadget: simplify __dwc3_gadget_ep_queue()")
      caused a small change in the way ISO transfer is handled in the case
      when XferInProgress event happens on Isoc EP with an active transfer.
      This caused a performance degradation of 50%. For example, using
      g_webcam on the DUT and luvcview on the host, the video frame rate
      dropped from 16fps to 8fps at high-speed.
      
      Make the ISO transfer handling equivalent to that prior to that commit
      to get back the original ISO performance numbers.
      
      Fixes: 08a36b54 ("usb: dwc3: gadget: simplify __dwc3_gadget_ep_queue()")
      Signed-off-by: Roger Quadros <rogerq@ti.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
    • usb: dwc3: pci: add Intel Cannonlake PCI IDs · 68217959
      Heikki Krogerus authored
      Intel Cannonlake PCH has the same DWC3 as Intel
      Sunrisepoint. Add the new IDs to the supported devices.
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
  3. 13 May, 2017 5 commits
    • Linux 4.12-rc1 · 2ea659a9
      Linus Torvalds authored
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · cd636458
      Linus Torvalds authored
      Pull some more input subsystem updates from Dmitry Torokhov:
       "An updated xpad driver with a few more recognized device IDs, and a
        new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
        via SPI bus"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: cros_ec_keyb - remove extraneous 'const'
        Input: add support for PlayStation 1/2 joypads connected via SPI
        Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
        Input: xpad - sync supported devices with xboxdrv
        Input: xpad - sort supported devices by USB ID
    • Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs · b53c4d5e
      Linus Torvalds authored
      Pull UBI/UBIFS updates from Richard Weinberger:
      
       - new config option CONFIG_UBIFS_FS_SECURITY
      
       - minor improvements
      
       - random fixes
      
      * tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs:
        ubi: Add debugfs file for tracking PEB state
        ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
        ubifs: Remove unnecessary assignment
        ubifs: Fix cut and paste error on sb type comparisons
        ubi: fastmap: Fix slab corruption
        ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
        ubi: Make mtd parameter readable
        ubi: Fix section mismatch
    • Merge branch 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · ec059019
      Linus Torvalds authored
      Pull UML fixes from Richard Weinberger:
       "No new stuff, just fixes"
      
      * 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Add missing NR_CPUS include
        um: Fix to call read_initrd after init_bootmem
        um: Include kbuild.h instead of duplicating its macros
        um: Fix PTRACE_POKEUSER on x86_64
        um: Set number of CPUs
        um: Fix _print_addr()
    • Merge branch 'akpm' (patches from Andrew) · 1251704a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "15 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, docs: update memory.stat description with workingset* entries
        mm: vmscan: scan until it finds eligible pages
        mm, thp: copying user pages must schedule on collapse
        dax: fix PMD data corruption when fault races with write
        dax: fix data corruption when fault races with write
        ext4: return to starting transaction in ext4_dax_huge_fault()
        mm: fix data corruption due to stale mmap reads
        dax: prevent invalidation of mapped DAX entries
        Tigran has moved
        mm, vmalloc: fix vmalloc users tracking properly
        mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
        gcov: support GCC 7.1
        mm, vmstat: Remove spurious WARN() during zoneinfo print
        time: delete current_fs_time()
        hwpoison, memcg: forcibly uncharge LRU pages
  4. 12 May, 2017 9 commits
    • mm, docs: update memory.stat description with workingset* entries · b340959e
      Roman Gushchin authored
      Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
      workingset transition") introduced three new entries in memory stat
      file:
      
       - workingset_refault
       - workingset_activate
       - workingset_nodereclaim
      
      This commit adds a corresponding description to the cgroup v2 docs.
      
      Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: scan until it finds eligible pages · 791b48b6
      Minchan Kim authored
      Although there is a ton of free swap and there are many anonymous LRU
      pages in eligible zones, an OOM happened.
      
        balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
        CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
         oom_kill_process+0x21d/0x3f0
         out_of_memory+0xd8/0x390
         __alloc_pages_slowpath+0xbc1/0xc50
         __alloc_pages_nodemask+0x1a5/0x1c0
         pte_alloc_one+0x20/0x50
         __pte_alloc+0x1e/0x110
         __handle_mm_fault+0x919/0x960
         handle_mm_fault+0x77/0x120
         __do_page_fault+0x27a/0x550
         trace_do_page_fault+0x43/0x150
         do_async_page_fault+0x2c/0x90
         async_page_fault+0x28/0x30
        Mem-Info:
        active_anon:424716 inactive_anon:65314 isolated_anon:0
         active_file:52 inactive_file:46 isolated_file:0
         unevictable:0 dirty:27 writeback:0 unstable:0
         slab_reclaimable:3967 slab_unreclaimable:4125
         mapped:133 shmem:43 pagetables:1674 bounce:0
         free:4637 free_pcp:225 free_cma:0
        Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
        DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 992 992 1952
        DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 959
        Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
        DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
        Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
        378 total pagecache pages
        17 pages in swap cache
        Swap cache stats: add 17325, delete 17302, find 0/27
        Free swap  = 978940kB
        Total swap = 1048572kB
        524157 pages RAM
        0 pages HighMem/MovableOnly
        12629 pages reserved
        0 pages cma reserved
        0 pages hwpoisoned
        [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
        [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
        [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd
      
      Investigation showed that page skipping in isolate_lru_pages makes
      reclaim ineffective: it easily returns zero nr_taken, so LRU shrinking
      does effectively nothing and just increases priority aggressively.
      Finally, OOM happens.

      The problem is that get_scan_count determines nr_to_scan from the
      eligible zones, so even when priority drops to zero, it cannot reclaim
      any pages if the LRU contains mostly ineligible pages.

      get_scan_count:

              size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
              size = size >> sc->priority;

      Assume sc->priority is 0 and the LRU list is as follows.

              N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

      (I.e., a few eligible pages are at the head of the LRU but the rest
       are almost all ineligible pages.)

      In that case, size becomes 4, so the VM wants to scan 4 pages, but the
      4 pages at the tail of the LRU are not eligible.  If get_scan_count
      counts skipped pages, it doesn't reclaim any of the pages remaining
      after scanning 4 pages, so it ends in OOM.

      This patch makes isolate_lru_pages try to scan pages until it encounters
      eligible zones' pages.
      
      [akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
      Fixes: 3db65812 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
      Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
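The N/H example from this commit message can be replayed with a toy model. This is a sketch of the idea only, not the vmscan code: isolate() is a hypothetical function, the LRU is a plain array scanned from its tail (highest index), and the count_skipped parameter switches between charging skipped pages against the scan budget and not charging them.

```c
#include <stdbool.h>
#include <stddef.h>

enum zone { ZONE_NORMAL, ZONE_HIGHMEM };

/*
 * Scan the LRU from its tail and return how many pages were taken.
 * Pages in zones above reclaim_idx are ineligible and are skipped.
 * With count_skipped set, skipped pages use up the scan budget;
 * with it cleared, only eligible pages do.
 */
static size_t isolate(const enum zone *lru, size_t len, size_t nr_to_scan,
		      enum zone reclaim_idx, bool count_skipped)
{
	size_t scanned = 0, taken = 0;

	for (size_t i = len; i > 0 && scanned < nr_to_scan; i--) {
		if (lru[i - 1] > reclaim_idx) {
			if (count_skipped)
				scanned++;
			continue;	/* ineligible page: skip it */
		}
		scanned++;
		taken++;
	}
	return taken;
}
```

With 4 N pages at the head, 16 H pages behind them and nr_to_scan of 4, the count_skipped variant takes nothing, while the other variant keeps walking until it reaches the 4 eligible pages.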
    • mm, thp: copying user pages must schedule on collapse · 338a16ba
      David Rientjes authored
      We have encountered need_resched warnings in __collapse_huge_page_copy()
      while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages.
      
      mm->mmap_sem is held for write, but the iteration is well bounded.
      
      Reschedule as needed.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: fix PMD data corruption when fault races with write · 876f2946
      Ross Zwisler authored
      This is based on a patch from Jan Kara that fixed the equivalent race in
      the DAX PTE fault path.
      
      Currently DAX PMD read fault can race with write(2) in the following
      way:
      
      CPU1 - write(2)                 CPU2 - read fault
                                      dax_iomap_pmd_fault()
                                        ->iomap_begin() - sees hole
      
      dax_iomap_rw()
        iomap_apply()
          ->iomap_begin - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - there's nothing to invalidate
      
                                        grab_mapping_entry()
                                        - we add huge zero page to the radix tree
                                          and map it to page tables
      
      The result is that hole page is mapped into page tables (and thus zeros
      are seen in mmap) while file has data written in that place.
      
      Fix the problem by locking exception entry before mapping blocks for the
      fault.  That way we are sure invalidate_inode_pages2_range() call for
      racing write will either block on entry lock waiting for the fault to
      finish (and unmap stale page tables after that) or read fault will see
      already allocated blocks by write(2).
      
      Fixes: 9f141d6e ("dax: Call ->iomap_begin without entry lock during dax fault")
      Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: fix data corruption when fault races with write · 13e451fd
      Jan Kara authored
      Currently DAX read fault can race with write(2) in the following way:
      
      CPU1 - write(2)                 CPU2 - read fault
                                      dax_iomap_pte_fault()
                                        ->iomap_begin() - sees hole
      dax_iomap_rw()
        iomap_apply()
          ->iomap_begin - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - there's nothing to invalidate
                                        grab_mapping_entry()
                                        - we add zero page in the radix tree
                                          and map it to page tables
      
      The result is that hole page is mapped into page tables (and thus zeros
      are seen in mmap) while file has data written in that place.
      
      Fix the problem by locking exception entry before mapping blocks for the
      fault.  That way we are sure invalidate_inode_pages2_range() call for
      racing write will either block on entry lock waiting for the fault to
      finish (and unmap stale page tables after that) or read fault will see
      already allocated blocks by write(2).
      
      Fixes: 9f141d6e
      Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: return to starting transaction in ext4_dax_huge_fault() · fb26a1cb
      Jan Kara authored
      DAX will return to locking exceptional entry before mapping blocks for a
      page fault to fix possible races with concurrent writes.  To avoid lock
      inversion between exceptional entry lock and transaction start, start
      the transaction already in ext4_dax_huge_fault().
      
      Fixes: 9f141d6e
      Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix data corruption due to stale mmap reads · cd656375
      Jan Kara authored
      Currently, we don't invalidate page tables during invalidate_inode_pages2()
      for DAX.  That can result in e.g. a 2MiB zero page being mapped into
      page tables while there are already underlying blocks allocated, and
      thus data seen through mmap differs from data seen by read(2).
      The following sequence reproduces the problem:
      
       - open an mmap over a 2MiB hole
      
       - read from a 2MiB hole, faulting in a 2MiB zero page
      
       - write to the hole with write(3p). The write succeeds but we
         incorrectly leave the 2MiB zero page mapping intact.
      
       - via the mmap, read the data that was just written. Since the zero
         page mapping is still intact we read back zeroes instead of the new
         data.
      
      Fix the problem by unconditionally calling invalidate_inode_pages2_range()
      in dax_iomap_actor() for new block allocations and by properly
      invalidating page tables in invalidate_inode_pages2_range() for DAX
      mappings.
      
      Fixes: c6dcf52c
      Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
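The reproduction sequence listed in this commit message can be sketched as a small POSIX program fragment. This is not the t_mmap_stale test program mentioned elsewhere in the series; mmap_sees_write() is a hypothetical helper, and whether the 2MiB zero page path is actually exercised depends on running it against a file on a DAX mount with a suitably aligned mapping, which this sketch does not enforce. On an ordinary filesystem it simply reports consistent data.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define HOLE_SIZE (2UL * 1024 * 1024)

/*
 * Returns 1 if the mapping shows the data written with pwrite(),
 * 0 if it still shows stale zeroes, and -1 on a setup error.
 */
static int mmap_sees_write(const char *path)
{
	static const char msg[] = "new data";
	int result = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;

	/* Step 1: create a 2MiB hole and map it. */
	if (ftruncate(fd, HOLE_SIZE) == 0) {
		char *map = mmap(NULL, HOLE_SIZE, PROT_READ, MAP_SHARED, fd, 0);

		if (map != MAP_FAILED) {
			/* Step 2: read from the hole to fault in a zero page. */
			volatile char first = map[0];
			(void)first;

			/* Step 3: write to the hole through the file descriptor. */
			if (pwrite(fd, msg, sizeof(msg), 0) == (ssize_t)sizeof(msg))
				/* Step 4: read the data back through the mapping. */
				result = memcmp(map, msg, sizeof(msg)) == 0;

			munmap(map, HOLE_SIZE);
		}
	}

	close(fd);
	unlink(path);
	return result;
}
```

A return value of 0 corresponds to the symptom described above: the write succeeded, but the mapping still reads back zeroes.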
    • dax: prevent invalidation of mapped DAX entries · 4636e70b
      Ross Zwisler authored
      Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
      v4.
      
      This series fixes data corruption that can happen for DAX mounts when
      page faults race with write(2) and as a result page tables get out of
      sync with block mappings in the filesystem and thus data seen through
      mmap is different from data seen through read(2).
      
      The series passes testing with t_mmap_stale test program from Ross and
      also other mmap related tests on DAX filesystem.
      
      This patch (of 4):
      
      dax_invalidate_mapping_entry() currently removes DAX exceptional entries
      only if they are clean and unlocked.  This is done via:
      
        invalidate_mapping_pages()
          invalidate_exceptional_entry()
            dax_invalidate_mapping_entry()
      
      However, for page cache pages removed in invalidate_mapping_pages()
      there is an additional criterion, which is that the page must not be
      mapped.  This is noted in the comments above invalidate_mapping_pages()
      and is checked in invalidate_inode_page().

      For DAX entries this means that we can end up in a situation where a
      DAX exceptional entry, either a huge zero page or a regular DAX entry,
      could end up mapped but without an associated radix tree entry.  This is
      inconsistent with the rest of the DAX code and with what happens in the
      page cache case.
      
      We aren't able to unmap the DAX exceptional entry because according to
      its comments invalidate_mapping_pages() isn't allowed to block, and
      unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
      
      Since we essentially never have unmapped DAX entries to evict from the
      radix tree, just remove dax_invalidate_mapping_entry().
      
      Fixes: c6dcf52c ("mm: Invalidate DAX radix tree entries only if appropriate")
      Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>    [4.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tigran has moved · cea58224
      Andrew Morton authored
      Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>