1. 26 Aug, 2020 29 commits
    • jffs2: fix UAF problem · 96de3dbf
      Zhe Li authored
      [ Upstream commit 798b7347 ]
      
      The KASAN log of the use-after-free (UAF) problem is listed below.
      BUG: KASAN: use-after-free in jffs2_rmdir+0xa4/0x1cc [jffs2] at addr c1f165fc
      Read of size 4 by task rm/8283
      =============================================================================
      BUG kmalloc-32 (Tainted: P    B      O   ): kasan: bad access detected
      -----------------------------------------------------------------------------
      
      INFO: Allocated in 0xbbbbbbbb age=3054364 cpu=0 pid=0
              0xb0bba6ef
              jffs2_write_dirent+0x11c/0x9c8 [jffs2]
              __slab_alloc.isra.21.constprop.25+0x2c/0x44
              __kmalloc+0x1dc/0x370
              jffs2_write_dirent+0x11c/0x9c8 [jffs2]
              jffs2_do_unlink+0x328/0x5fc [jffs2]
              jffs2_rmdir+0x110/0x1cc [jffs2]
              vfs_rmdir+0x180/0x268
              do_rmdir+0x2cc/0x300
              ret_from_syscall+0x0/0x3c
      INFO: Freed in 0x205b age=3054364 cpu=0 pid=0
              0x2e9173
              jffs2_add_fd_to_list+0x138/0x1dc [jffs2]
              jffs2_add_fd_to_list+0x138/0x1dc [jffs2]
              jffs2_garbage_collect_dirent.isra.3+0x21c/0x288 [jffs2]
              jffs2_garbage_collect_live+0x16bc/0x1800 [jffs2]
              jffs2_garbage_collect_pass+0x678/0x11d4 [jffs2]
              jffs2_garbage_collect_thread+0x1e8/0x3b0 [jffs2]
              kthread+0x1a8/0x1b0
              ret_from_kernel_thread+0x5c/0x64
      Call Trace:
      [c17ddd20] [c02452d4] kasan_report.part.0+0x298/0x72c (unreliable)
      [c17ddda0] [d2509680] jffs2_rmdir+0xa4/0x1cc [jffs2]
      [c17dddd0] [c026da04] vfs_rmdir+0x180/0x268
      [c17dde00] [c026f4e4] do_rmdir+0x2cc/0x300
      [c17ddf40] [c001a658] ret_from_syscall+0x0/0x3c
      
      The root cause is that jffs2_rmdir() does not take
      "jffs2_inode_info.sem" before it scans the list
      "jffs2_inode_info.dents", so the garbage-collection thread can free an
      entry (see the "Freed in" trace above) while the scan is still reading it.
      This patch takes "jffs2_inode_info.sem" before scanning
      "jffs2_inode_info.dents" to solve the UAF problem.
      Signed-off-by: Zhe Li <lizhe67@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      96de3dbf
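The locking rule behind the jffs2 fix can be sketched in user space. This is a minimal analogue and not the jffs2 code: `struct dir_info`, `struct dirent_node` and `dir_has_live_entries()` are hypothetical names, and a pthread mutex stands in for "jffs2_inode_info.sem".

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the "dents" list. */
struct dirent_node {
	unsigned int ino;            /* 0 marks a deleted entry */
	struct dirent_node *next;
};

/* Hypothetical stand-in for the inode info: a lock plus the list it guards. */
struct dir_info {
	pthread_mutex_t sem;
	struct dirent_node *dents;
};

/*
 * Walk the list only while holding the lock, so that another thread that
 * frees or replaces entries under the same lock cannot race with the scan.
 */
static bool dir_has_live_entries(struct dir_info *dir)
{
	bool live = false;

	pthread_mutex_lock(&dir->sem);
	for (struct dirent_node *fd = dir->dents; fd; fd = fd->next) {
		if (fd->ino) {
			live = true;
			break;
		}
	}
	pthread_mutex_unlock(&dir->sem);
	return live;
}
```

The sketch only protects readers if every writer of the list takes the same mutex; that requirement is an assumption of the example.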
    • xfs: fix inode quota reservation checks · 1bc31e52
      Darrick J. Wong authored
      [ Upstream commit f959b5d0 ]
      
      xfs_trans_dqresv is the function that we use to make reservations
      against resource quotas.  Each resource contains two counters: the
      q_core counter, which tracks resources allocated on disk; and the dquot
      reservation counter, which tracks how much of that resource has either
      been allocated or reserved by threads that are working on metadata
      updates.
      
      For disk blocks, we compare the proposed reservation counter against the
      hard and soft limits to decide if we're going to fail the operation.
      However, for inodes we inexplicably compare against the q_core counter,
      not the incore reservation count.
      
      Since the q_core counter is always lower than the reservation count and
      we unlock the dquot between reservation and transaction commit, this
      means that multiple threads can reserve the last inode count before we
      hit the hard limit, and when they commit, we'll be well over the hard
      limit.
      
      Fix this by checking against the incore inode reservation counter, since
      we would appear to maintain that correctly (and that's what we report in
      GETQUOTA).
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Collins <allison.henderson@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1bc31e52
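The difference between the two checks in the xfs commit can be illustrated with a simplified model. The struct and both function names below are hypothetical and do not mirror xfs_trans_dqresv(); the sketch only shows why a limit check against the committed count lets concurrent reservations overshoot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one quota resource. */
struct quota_res {
	uint64_t committed;   /* stands in for the on-disk (q_core) count */
	uint64_t reserved;    /* committed plus in-flight reservations */
	uint64_t hard_limit;  /* 0 means "no limit" */
};

/* Flawed variant: in-flight reservations are invisible to the check. */
static bool reserve_vs_committed(struct quota_res *q, uint64_t n)
{
	if (q->hard_limit && q->committed + n > q->hard_limit)
		return false;
	q->reserved += n;
	return true;
}

/* Variant that checks the reservation counter instead. */
static bool reserve_vs_reserved(struct quota_res *q, uint64_t n)
{
	if (q->hard_limit && q->reserved + n > q->hard_limit)
		return false;
	q->reserved += n;
	return true;
}
```

With 9 of 10 inodes in use, two back-to-back calls to the first variant both succeed and leave the reservation count at 11, while the second variant rejects the second caller.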
    • svcrdma: Fix another Receive buffer leak · 88f7857c
      Chuck Lever authored
      [ Upstream commit 64d26422 ]
      
      During a connection tear down, the Receive queue is flushed before
      the device resources are freed. Typically, all the Receives flush
      with IB_WR_FLUSH_ERR.
      
      However, any pending successful Receives flush with IB_WR_SUCCESS,
      and the server automatically posts a fresh Receive to replace the
      completing one. This happens even after the connection has closed
      and the RQ is drained. Receives that are posted after the RQ is
      drained appear never to complete, causing a Receive resource leak.
      The leaked Receive buffer is left DMA-mapped.
      
      To prevent these late-posted recv_ctxt's from leaking, block new
      Receive posting after XPT_CLOSE is set.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      88f7857c
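The idea of refusing new Receives once the transport is closing can be sketched as a guard on the posting path. `struct transport`, its `closed` flag (playing the role of XPT_CLOSE) and `post_recv()` are hypothetical names for illustration, not the svcrdma API.

```c
#include <stdbool.h>

/* Hypothetical transport state; "closed" plays the role of XPT_CLOSE. */
struct transport {
	bool closed;
	int posted_recvs;
};

/* Refuse to post a replacement Receive once the transport is closing. */
static bool post_recv(struct transport *xprt)
{
	if (xprt->closed)
		return false;
	xprt->posted_recvs++;
	return true;
}
```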
    • m68knommu: fix overwriting of bits in ColdFire V3 cache control · 465bc917
      Greg Ungerer authored
      [ Upstream commit bdee0e79 ]
      
      The Cache Control Register (CACR) of the ColdFire V3 has bits that
      control high level caching functions, and also enable/disable the use
      of the alternate stack pointer register (the EUSP bit) to provide
      separate supervisor and user stack pointer registers. The code as
      it is today will blindly clear the EUSP bit on cache actions like
      invalidation. So it is broken for this case - and that will result
      in failed booting (interrupt entry and exit processing will be
      completely hosed).
      
      This only affects ColdFire V3 parts that support the alternate stack
      register (like the 5329 for example) - generally speaking new parts do,
      older parts don't. It has no impact on ColdFire V3 parts with the single
      stack pointer, like the 5307 for example.
      
      Fix the cache bit defines used, so they maintain the EUSP bit when
      carrying out cache actions through the CACR register.
      Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      465bc917
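The kind of bug described in the ColdFire commit, a command value that drops an unrelated mode bit, can be sketched with a simulated register. The bit positions and macro names below are chosen for illustration only and should not be read as the real CACR layout; the sketch also assumes each write replaces the whole register value.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define CACR_EC     (UINT32_C(1) << 31)  /* cache enable */
#define CACR_CINVA  (UINT32_C(1) << 24)  /* invalidate all */
#define CACR_EUSP   (UINT32_C(1) << 5)   /* enable user stack pointer */

/* Mode bits that every write must carry. */
#define CACHE_MODE  (CACR_EC | CACR_EUSP)

/* Broken: the command value omits EUSP, so writing it clears the bit. */
#define CACHE_INVALIDATE_BROKEN  (CACR_EC | CACR_CINVA)

/* Fixed: the command value is built on top of the full mode. */
#define CACHE_INVALIDATE_FIXED   (CACHE_MODE | CACR_CINVA)

/* Simulated register: a write replaces the whole value. */
static uint32_t cacr;

static void write_cacr(uint32_t value)
{
	cacr = value;
}
```

Deriving every command value from one mode definition means a newly added mode bit cannot be forgotten in an individual command.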
    • Input: psmouse - add a newline when printing 'proto' by sysfs · c4af8c25
      Xiongfeng Wang authored
      [ Upstream commit 4aec14de ]
      
      Reading the 'proto' parameter through sysfs prints the value without a
      trailing newline, so the shell prompt follows it on the same line, as
      shown below. Add a newline for easier reading.
      
      root@syzkaller:~# cat /sys/module/psmouse/parameters/proto
      autoroot@syzkaller:~#
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Link: https://lore.kernel.org/r/20200720073846.120724-1-wangxiongfeng2@huawei.com
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c4af8c25
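The psmouse change amounts to terminating the formatted value with a newline. A user-space sketch, with the hypothetical helper `show_proto()` standing in for the parameter's show routine:

```c
#include <stdio.h>
#include <stddef.h>

/* Format a parameter value for display, terminated by a newline. */
static int show_proto(char *buf, size_t size, const char *name)
{
	return snprintf(buf, size, "%s\n", name);
}
```

With this format, `cat` output ends the line itself and the shell prompt starts on a fresh line.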
    • media: vpss: clean up resources in init · 773ae06f
      Evgeny Novikov authored
      [ Upstream commit 9c487b0b ]
      
      If platform_driver_register() fails within vpss_init() resources are not
      cleaned up. The patch fixes this issue by introducing the corresponding
      error handling.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      773ae06f
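Error handling of the kind the vpss commit adds is commonly written as a goto-based unwind. The sketch below shows that general pattern in user space; `sketch_init()`, `register_driver()` and the two `malloc()`-backed resources are hypothetical and do not reproduce vpss_init().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resources acquired by an init routine. */
static void *region;
static void *mapping;

/* Stand-in for the final registration step; may be forced to fail. */
static int register_driver(bool fail)
{
	return fail ? -1 : 0;
}

/* Acquire resources in order; on failure release them in reverse order. */
static int sketch_init(bool fail_register)
{
	int ret;

	region = malloc(16);
	if (!region)
		return -1;

	mapping = malloc(16);
	if (!mapping) {
		ret = -1;
		goto err_region;
	}

	ret = register_driver(fail_register);
	if (ret)
		goto err_mapping;

	return 0;

err_mapping:
	free(mapping);
	mapping = NULL;
err_region:
	free(region);
	region = NULL;
	return ret;
}
```

Each label undoes exactly one acquisition, so a failure at any step falls through the remaining labels and releases everything acquired before it.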
    • rtc: goldfish: Enable interrupt in set_alarm() when necessary · b5a5b21f
      Huacai Chen authored
      [ Upstream commit 22f8d5a1 ]
      
      When using the goldfish rtc, the "hwclock" command fails with "select() to
      /dev/rtc to wait for clock tick timed out". This is because "hwclock"
      needs the set_alarm() hook to enable the interrupt when alrm->enabled is
      true. This operation is missing in the goldfish rtc driver (other rtc
      drivers, such as cmos rtc, enable the interrupt here), so add it.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Link: https://lore.kernel.org/r/1592654683-31314-1-git-send-email-chenhc@lemote.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b5a5b21f
    • media: budget-core: Improve exception handling in budget_register() · b07b9521
      Chuhong Yuan authored
      [ Upstream commit fc045645 ]
      
      budget_register() performs no cleanup when it fails partway through.
      Add the missing undo functions to its error handling to fix this.
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: Sean Young <sean@mess.org>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b07b9521
    • scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM · 7d057ec3
      Bodo Stroesser authored
      [ Upstream commit 3145550a ]
      
      This patch fixes the following crash (see
      https://bugzilla.kernel.org/show_bug.cgi?id=208045)
      
       Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
       CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
              #202004230533
       Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
       pstate: 80400005 (Nzcv daif +PAN -UAO)
       pc : flush_dcache_page+0x18/0x40
       lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
       sp : ffff000015123a80
       x29: ffff000015123a80 x28: 0000000000000000
       x27: 0000000000001000 x26: ffff000023ea5000
       x25: ffffcfa25bbe08b8 x24: 0000000000000078
       x23: ffff7e0000000000 x22: ffff000023ea5001
       x21: ffffcfa24b79c000 x20: 0000000000000fff
       x19: ffff7e00008fa940 x18: 0000000000000000
       x17: 0000000000000000 x16: ffff2d047e709138
       x15: 0000000000000000 x14: 0000000000000000
       x13: 0000000000000000 x12: ffff2d047fbd0a40
       x11: 0000000000000000 x10: 0000000000000030
       x9 : 0000000000000000 x8 : ffffc9a254820a00
       x7 : 00000000000013b0 x6 : 000000000000003f
       x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
       x3 : 0000000000001000 x2 : 0000000000000078
       x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
       Call trace:
        flush_dcache_page+0x18/0x40
        is_ring_space_avail+0x68/0x2f8 [target_core_user]
        queue_cmd_ring+0x1f8/0x680 [target_core_user]
        tcmu_queue_cmd+0xe4/0x158 [target_core_user]
        __target_execute_cmd+0x30/0xf0 [target_core_mod]
        target_execute_cmd+0x294/0x390 [target_core_mod]
        transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
        transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
        iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
        iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
        iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
        iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
        iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
        kthread+0x130/0x138
        ret_from_fork+0x10/0x18
       Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
       ---[ end trace 1e451c73f4266776 ]---
      
      The solution is based on patch:
      
        "scsi: target: tcmu: Optimize use of flush_dcache_page"
      
      which restricts the use of tcmu_flush_dcache_range() to addresses from
      vmalloc'ed areas only.
      
      This patch now replaces the virt_to_page() call in
      tcmu_flush_dcache_range() - which is wrong for vmalloced addrs - by
      vmalloc_to_page().
      
      The patch was tested on ARM with kernels 4.19.118 and 5.7.2.
      
      Link: https://lore.kernel.org/r/20200618131632.32748-3-bstroesser@ts.fujitsu.com
      Tested-by: JiangYu <lnsyyj@hotmail.com>
      Tested-by: Daniel Meyerholt <dxm523@gmail.com>
      Acked-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7d057ec3
    • scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices · 2ba7c21c
      Stanley Chu authored
      [ Upstream commit c0a18ee0 ]
      
      It is confirmed that Micron devices need the DELAY_BEFORE_LPM quirk to
      have a delay before VCC is powered off. Add the Micron vendor ID and this
      quirk for Micron devices.
      
      Link: https://lore.kernel.org/r/20200612012625.6615-2-stanley.chu@mediatek.com
      Reviewed-by: Bean Huo <beanhuo@micron.com>
      Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
      Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2ba7c21c
    • spi: Prevent adding devices below an unregistering controller · 30872ac7
      Lukas Wunner authored
      [ Upstream commit ddf75be4 ]
      
      CONFIG_OF_DYNAMIC and CONFIG_ACPI allow adding SPI devices at runtime
      using a DeviceTree overlay or DSDT patch.  CONFIG_SPI_SLAVE allows the
      same via sysfs.
      
      But there are no precautions to prevent adding a device below a
      controller that's being removed.  Such a device is unusable and may not
      even be able to unbind cleanly as it becomes inaccessible once the
      controller has been torn down.  E.g. it is then impossible to quiesce
      the device's interrupt.
      
      of_spi_notify() and acpi_spi_notify() do hold a ref on the controller,
      but otherwise run lockless against spi_unregister_controller().
      
      Fix by holding the spi_add_lock in spi_unregister_controller() and
      bailing out of spi_add_device() if the controller has been unregistered
      concurrently.
      
      Fixes: ce79d54a ("spi/of: Add OF notifier handler")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v3.19+
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Octavian Purdila <octavian.purdila@intel.com>
      Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
      Link: https://lore.kernel.org/r/a8c3205088a969dc8410eec1eba9aface60f36af.1596451035.git.lukas@wunner.de
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      30872ac7
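The scheme described in the spi commit, one lock shared by the add and unregister paths plus a bail-out check, can be sketched in user space. `struct controller`, `add_device()` and `unregister_controller()` are hypothetical names, the pthread mutex `add_lock` plays the role of spi_add_lock, and the return value -1 is a placeholder error code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical controller state. */
struct controller {
	bool registered;
	int num_devices;
};

/* Plays the role of spi_add_lock in this sketch. */
static pthread_mutex_t add_lock = PTHREAD_MUTEX_INITIALIZER;

/* Adding a device fails once the controller has been unregistered. */
static int add_device(struct controller *ctlr)
{
	int ret = 0;

	pthread_mutex_lock(&add_lock);
	if (!ctlr->registered)
		ret = -1;
	else
		ctlr->num_devices++;
	pthread_mutex_unlock(&add_lock);
	return ret;
}

/* Unregistering takes the same lock, so it cannot interleave with an add. */
static void unregister_controller(struct controller *ctlr)
{
	pthread_mutex_lock(&add_lock);
	ctlr->registered = false;
	ctlr->num_devices = 0;   /* existing devices are torn down here */
	pthread_mutex_unlock(&add_lock);
}
```

Because the check and the increment happen under the same lock that unregistration holds, an add either completes before teardown starts or observes the cleared flag and bails out.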
    • kthread: Do not preempt current task if it is going to call schedule() · 1c263d0e
      Liang Chen authored
      commit 26c7295b upstream.
      
      When we create a kthread with kthread_create_on_cpu(), the child thread
      entry is kthread.c:kthread(), which can be preempted by the parent after
      it calls complete(done) but before it has called schedule(). The parent
      then calls wait_task_inactive(child) while the child is still on the
      runqueue, so the parent calls schedule_hrtimeout() for 1 jiffy. This
      wastes a lot of time, especially on startup.
      
        parent                             child
      kthread_create_on_cpu()
        wait_for_completion(&done) -----> kthread.c:kthread()
                                   |----- complete(done);--wakeup and preempted by parent
       kthread_bind() <------------|  |-> schedule();--dequeue here
        wait_task_inactive(child)     |
         schedule_hrtimeout(1 jiffy) -|
      
      So we want the child to wake up the parent without being preempted by
      it, since the child is going to call schedule() soon. The parent then
      does not need to call schedule_hrtimeout(1 jiffy) because the child is
      already dequeued.
      
      The same issue applies to kthread_park() and kthread_parkme().
      This patch can save 120ms on rk312x startup with CONFIG_HZ=300.
      Signed-off-by: Liang Chen <cl@rock-chips.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com
      Signed-off-by: Chanho Park <chanho61.park@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c263d0e
    • drm/amd/display: fix pow() crashing when given base 0 · e05c6786
      Krunoslav Kovac authored
      commit d2e59d0f upstream.
      
      [Why&How]
      pow(a,x) is implemented as exp(x*log(a)). log(0) will crash.
      So return 0^x = 0 unless x = 0; by convention, 0^0 = 1.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
      Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
      Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e05c6786
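The special-casing described in the pow() commit can be sketched as follows. This is not the driver's implementation: plain `double` arithmetic, the hand-written series in `sketch_log()` and `sketch_exp()`, and all three function names are assumptions made so the block is self-contained. The `assert()` in `sketch_log()` models a log() that cannot handle a zero argument.

```c
#include <assert.h>

/* Natural log via the atanh series; it cannot handle a non-positive
 * argument and aborts instead. */
static double sketch_log(double a)
{
	assert(a > 0.0);
	double y = (a - 1.0) / (a + 1.0);
	double y2 = y * y, term = y, sum = 0.0;

	for (int k = 0; k < 200; k++) {
		sum += term / (2 * k + 1);
		term *= y2;
	}
	return 2.0 * sum;
}

/* exp via its Taylor series; adequate for the small arguments used here. */
static double sketch_exp(double v)
{
	double term = 1.0, sum = 1.0;

	for (int n = 1; n < 60; n++) {
		term *= v / n;
		sum += term;
	}
	return sum;
}

/* pow(a, x) = exp(x * log(a)), with base 0 handled before log() is reached. */
static double sketch_pow(double a, double x)
{
	if (a == 0.0)
		return (x == 0.0) ? 1.0 : 0.0;
	return sketch_exp(x * sketch_log(a));
}
```

The early return keeps the zero base away from `sketch_log()` entirely, which is the point of the fix; the series themselves are only accurate for modest inputs.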
    • scsi: zfcp: Fix use-after-free in request timeout handlers · d25a2b92
      Steffen Maier authored
      commit 2d9a2c5f upstream.
      
      Before v4.15 commit 75492a51 ("s390/scsi: Convert timers to use
      timer_setup()"), we intentionally only passed zfcp_adapter as context
      argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
      adapter recovery, it was unnecessary to sync against races between timeout
      and (late) completion.  Likewise, we only passed zfcp_erp_action as context
      argument to zfcp_erp_timeout_handler(). Since we only wake up an ERP action,
      it was unnecessary to sync against races between timeout and (late)
      completion.
      
      Since that conversion, the timeout handlers get timer_list as context
      argument and do a timer-specific container-of to zfcp_fsf_req, which may
      already have been freed.
      
      Fix it by making sure that any request timeout handlers, that might just
      have started before del_timer(), are completed by using del_timer_sync()
      instead. This ensures the request free happens afterwards.
      
      Space time diagram of potential use-after-free:
      
      Basic idea is to have 2 or more pending requests whose timeouts run out at
      almost the same time.
      
      req 1 timeout     ERP thread        req 2 timeout
      ----------------  ----------------  ---------------------------------------
      zfcp_fsf_request_timeout_handler
      fsf_req = from_timer(fsf_req, t, timer)
      adapter = fsf_req->adapter
      zfcp_qdio_siosl(adapter)
      zfcp_erp_adapter_reopen(adapter,...)
                        zfcp_erp_strategy
                        ...
                        zfcp_fsf_req_dismiss_all
                        list_for_each_entry_safe
                          zfcp_fsf_req_complete 1
                          del_timer 1
                          zfcp_fsf_req_free 1
                          zfcp_fsf_req_complete 2
                                          zfcp_fsf_request_timeout_handler
                          del_timer 2
                                          fsf_req = from_timer(fsf_req, t, timer)
                          zfcp_fsf_req_free 2
                                          adapter = fsf_req->adapter
                                                    ^^^^^^^ already freed
      
      Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com
      Fixes: 75492a51 ("s390/scsi: Convert timers to use timer_setup()")
      Cc: <stable@vger.kernel.org> #4.15+
      Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d25a2b92
    • jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock() · 402ff143
      zhangyi (F) authored
      commit ef3f5830 upstream.
      
      jbd2_write_superblock() holds the buffer lock of the journal superblock
      until that superblock write ends, so add the missing unlock_buffer() in
      the error path that returns before the buffer is submitted.
      
      Fixes: 742b06b5 ("jbd2: check superblock mapped prior to committing")
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      402ff143
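The rule behind the jbd2 fix, that every return path must release a lock the function was entered with, can be sketched with a toy model. The `struct buffer` with its boolean lock flag, the toy `lock_buffer()`/`unlock_buffer()` helpers and `write_superblock()` are simplified stand-ins, not the kernel's buffer_head API, and -1 is a placeholder error code.

```c
#include <stdbool.h>

/* Hypothetical buffer; the flag stands in for the real buffer lock. */
struct buffer {
	bool locked;
	bool mapped;
	int writes;
};

static void lock_buffer(struct buffer *bh)   { bh->locked = true; }
static void unlock_buffer(struct buffer *bh) { bh->locked = false; }

/*
 * Called with the buffer locked. Every return path must release the lock:
 * the success path hands it to the write completion (modelled here as an
 * immediate unlock), and the error path must unlock explicitly.
 */
static int write_superblock(struct buffer *bh)
{
	if (!bh->mapped) {
		unlock_buffer(bh);   /* the unlock that is easy to forget */
		return -1;
	}
	bh->writes++;
	unlock_buffer(bh);           /* stands in for I/O completion */
	return 0;
}
```

Without the unlock in the early return, the next caller that tries to lock the same buffer would wait forever.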
    • ext4: fix checking of directory entry validity for inline directories · a5d3f789
      Jan Kara authored
      commit 7303cb5b upstream.
      
      ext4_search_dir() and ext4_generic_delete_entry() can be called both for
      standard directory blocks and for inline directories stored inside inode
      or inline xattr space. For the second case we didn't call
      ext4_check_dir_entry() with proper constraints, which could result in
      accepting a corrupted directory entry as well as in false positive
      filesystem errors like:
      
      EXT4-fs error (device dm-0): ext4_search_dir:1395: inode #28320400:
      block 113246792: comm dockerd: bad entry in directory: directory entry too
      close to block end - offset=0, inode=28320403, rec_len=32, name_len=8,
      size=4096
      
      Fix the arguments passed to ext4_check_dir_entry().
      
      Fixes: 109ba779 ("ext4: check for directory entries too close to block end")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200731162135.8080-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5d3f789
    • mm, page_alloc: fix core hung in free_pcppages_bulk() · c666936d
      Charan Teja Reddy authored
      commit 88e8ac11 upstream.
      
      The following race is observed when memory blocks of the movable zone are
      repeatedly onlined and offlined, with a delay between two successive
      online operations.
      
      P1						P2
      
      Online the first memory block in
      the movable zone. The pcp struct
      values are initialized to default
      values, i.e., pcp->high = 0 &
      pcp->batch = 1.
      
      					Allocate the pages from the
      					movable zone.
      
      Try to online the second memory
      block in the movable zone; it has
      entered online_pages() but has yet
      to call zone_pcp_update().
      					This process enters the exit
      					path and thus tries to release
      					the order-0 pages to the pcp
      					lists through
      					free_unref_page_commit().
      					As pcp->high = 0, pcp->count = 1,
      					proceed to call the function
      					free_pcppages_bulk().
      Update the pcp values; the new
      pcp values are, say,
      pcp->high = 378, pcp->batch = 63.
      					Read the pcp's batch value using
      					READ_ONCE() and pass the same to
      					free_pcppages_bulk(), pcp values
      					passed here are, batch = 63,
      					count = 1.
      
      					Since the number of pages in the
      					pcp lists is less than ->batch,
      					it gets stuck in the
      					while(list_empty(list)) loop
      					with interrupts disabled, thus
      					a core hang.
      
      Avoid this by ensuring free_pcppages_bulk() is called with proper count of
      pcp list pages.
      
      The mentioned race is somewhat easily reproducible without [1] because
      pcp's are not updated for the first memory block online and thus there is
      a large enough race window for P2 between alloc+free and pcp struct values
      update through onlining of second memory block.
      
      With [1], the race still exists but it is very narrow as we update the pcp
      struct values for the first memory block online itself.
      
      This is not limited to the movable zone, it could also happen in cases
      with the normal zone (e.g., hotplug to a node that only has DMA memory, or
      no other memory yet).
      
      [1]: https://patchwork.kernel.org/patch/11696389/
      
      Fixes: 5f8dcc21 ("page-allocator: split per-cpu list into one-list-per-migrate-type")
      Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Cc: <stable@vger.kernel.org> [2.6+]
      Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c666936d
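One way to make sure the bulk-free routine is never asked for more pages than the list holds is to clamp the request, which is assumed here for illustration. The sketch reduces the per-CPU list to a counter; `struct pcp_list` and `free_bulk()` are hypothetical names.

```c
/* Hypothetical per-CPU page list, reduced to a counter. */
struct pcp_list {
	int count;   /* pages currently on the list */
};

/*
 * Free up to "requested" pages. Clamping the request to the number of
 * pages actually present guarantees the loop terminates even if the
 * caller's idea of the batch size is stale.
 */
static int free_bulk(struct pcp_list *pcp, int requested)
{
	int to_free = requested < pcp->count ? requested : pcp->count;
	int freed = 0;

	while (freed < to_free) {
		pcp->count--;
		freed++;
	}
	return freed;
}
```

With the values from the race above (count = 1, batch = 63), the clamped loop frees one page and returns instead of spinning while waiting for pages that are not there.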
    • mm: include CMA pages in lowmem_reserve at boot · 84b8dc23
      Doug Berger authored
      commit e08d3fdf upstream.
      
      The lowmem_reserve arrays provide a means of applying pressure against
      allocations from lower zones that were targeted at higher zones.  Its
      values are a function of the number of pages managed by higher zones and
      are assigned by a call to the setup_per_zone_lowmem_reserve() function.
      
      The function is initially called at boot time by the function
      init_per_zone_wmark_min() and may be called later by accesses of the
      /proc/sys/vm/lowmem_reserve_ratio sysctl file.
      
      The function init_per_zone_wmark_min() was moved up from a module_init to
      a core_initcall to resolve a sequencing issue with khugepaged.
      Unfortunately this created a sequencing issue with CMA page accounting.
      
      The CMA pages are added to the managed page count of a zone when
      cma_init_reserved_areas() is called at boot also as a core_initcall.  This
      makes it uncertain whether the CMA pages will be added to the managed page
      counts of their zones before or after the call to
      init_per_zone_wmark_min() as it becomes dependent on link order.  With the
      current link order the pages are added to the managed count after the
      lowmem_reserve arrays are initialized at boot.
      
      This means the lowmem_reserve values at boot may be lower than the values
      used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
      ratio values are unchanged.
      
      In many cases the difference is not significant, but for example
      an ARM platform with 1GB of memory and the following memory layout
      
        cma: Reserved 256 MiB at 0x0000000030000000
        Zone ranges:
          DMA      [mem 0x0000000000000000-0x000000002fffffff]
          Normal   empty
          HighMem  [mem 0x0000000030000000-0x000000003fffffff]
      
      would result in 0 lowmem_reserve for the DMA zone.  This would allow
      userspace to deplete the DMA zone easily.
      
      Funnily enough
      
        $ cat /proc/sys/vm/lowmem_reserve_ratio
      
      would fix up the situation because as a side effect it forces a call to
      setup_per_zone_lowmem_reserve().
      
      This commit breaks the link order dependency by invoking
      init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
      have the chance to be properly accounted in their zone(s) and allowing
      the lowmem_reserve arrays to receive consistent values.
      
      Fixes: bc22af74 ("mm: update min_free_kbytes from khugepaged after core initialization")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      84b8dc23
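The effect of the ordering problem on the example layout can be worked through with a simplified model in which a zone's reserve is the number of pages managed by higher zones divided by a ratio. The 4 KiB page size and the ratio of 256 are assumptions for the arithmetic, and `lowmem_reserve()` and `mib_to_pages()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

/* Simplified model: reserve = pages managed by higher zones / ratio. */
static uint64_t lowmem_reserve(uint64_t higher_zone_managed_pages,
			       uint64_t ratio)
{
	if (ratio == 0)
		return 0;
	return higher_zone_managed_pages / ratio;
}

/* Convert a size in MiB to pages. */
static uint64_t mib_to_pages(uint64_t mib)
{
	return mib * 1024u * 1024u / PAGE_SIZE_BYTES;
}
```

In the layout above the whole 256 MiB HighMem zone is CMA. If the reserve is computed before those pages are counted, the model gives `lowmem_reserve(0, 256)`, which is 0. Computed afterwards it gives `lowmem_reserve(mib_to_pages(256), 256)`, which is 65536 / 256 = 256 pages.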
    • kernel/relay.c: fix memleak on destroy relay channel · 3a54b901
      Wei Yongjun authored
      commit 71e84329 upstream.
      
      kmemleak reports a memory leak as follows:
      
        unreferenced object 0x607ee4e5f948 (size 8):
        comm "syz-executor.1", pid 2098, jiffies 4295031601 (age 288.468s)
        hex dump (first 8 bytes):
        00 00 00 00 00 00 00 00 ........
        backtrace:
           relay_open kernel/relay.c:583 [inline]
           relay_open+0xb6/0x970 kernel/relay.c:563
           do_blk_trace_setup+0x4a8/0xb20 kernel/trace/blktrace.c:557
           __blk_trace_setup+0xb6/0x150 kernel/trace/blktrace.c:597
           blk_trace_ioctl+0x146/0x280 kernel/trace/blktrace.c:738
           blkdev_ioctl+0xb2/0x6a0 block/ioctl.c:613
           block_ioctl+0xe5/0x120 fs/block_dev.c:1871
           vfs_ioctl fs/ioctl.c:48 [inline]
           __do_sys_ioctl fs/ioctl.c:753 [inline]
           __se_sys_ioctl fs/ioctl.c:739 [inline]
           __x64_sys_ioctl+0x170/0x1ce fs/ioctl.c:739
           do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      'chan->buf' is allocated in relay_open() by alloc_percpu() but is not
      freed when the relay channel is destroyed.  Fix it by adding
      free_percpu() before returning from relay_destroy_channel().
      
      Fixes: 017c59c0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200817122826.48518-1-weiyongjun1@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • romfs: fix uninitialized memory leak in romfs_dev_read() · 96609837
      Jann Horn authored
      commit bcf85fce upstream.
      
      romfs has a superblock field that limits the size of the filesystem; data
      beyond that limit is never accessed.
      
      romfs_dev_read() fetches a caller-supplied number of bytes from the
      backing device.  It returns 0 on success or an error code on failure;
      its API therefore can't represent short reads: it's all-or-nothing.
      
      However, when romfs_dev_read() detects that the requested operation would
      cross the filesystem size limit, it currently silently truncates the
      requested number of bytes.  For example, when the content of a file with
      size 0x1000 starts one byte before the filesystem size limit,
      ->readpage() will only fill a single byte of the supplied page while
      leaving the rest uninitialized, leaking that uninitialized memory to
      userspace.
      
      Fix it by returning an error code instead of truncating the read when the
      requested read operation would go beyond the end of the filesystem.
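      
      A minimal sketch of such an all-or-nothing range check, written as
      standalone C rather than as the actual patch.  The function name
      check_read_range() and its parameters are hypothetical; only the idea
      of rejecting a read that would cross the limit comes from the text
      above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical all-or-nothing range check: returns 0 if [pos, pos + len)
 * lies fully inside a device of `limit` bytes, and -EIO otherwise. */
int check_read_range(size_t pos, size_t len, size_t limit)
{
    /* Compare len against limit - pos so that pos + len is never
     * computed and therefore cannot overflow. */
    if (pos >= limit || len > limit - pos)
        return -EIO;
    return 0;
}
```

      With a limit of 0x1000, a 0x1000-byte read starting at offset 0xfff is
      rejected outright instead of being shortened to one byte.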
      
      Fixes: da4458bd ("NOMMU: Make it possible for RomFS to use MTD devices directly")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: sysfs: use NOFS for device creation · 76c38196
      Josef Bacik authored
      [ Upstream commit a47bd78d ]
      
      Dave hit this splat during testing btrfs/078:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc6-default+ #1191 Not tainted
        ------------------------------------------------------
        kswapd0/75 is trying to acquire lock:
        ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
      
        but task is already holding lock:
        ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (fs_reclaim){+.+.}-{0:0}:
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 fs_reclaim_acquire.part.0+0x25/0x30
      	 __kmalloc_track_caller+0x49/0x330
      	 kstrdup+0x2e/0x60
      	 __kernfs_new_node.constprop.0+0x44/0x250
      	 kernfs_new_node+0x25/0x50
      	 kernfs_create_link+0x34/0xa0
      	 sysfs_do_create_link_sd+0x5e/0xd0
      	 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs]
      	 btrfs_init_new_device+0x44c/0x12b0 [btrfs]
      	 btrfs_ioctl+0xc3c/0x25c0 [btrfs]
      	 ksys_ioctl+0x68/0xa0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0xe0
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 __mutex_lock+0xa0/0xaf0
      	 btrfs_chunk_alloc+0x137/0x3e0 [btrfs]
      	 find_free_extent+0xb44/0xfb0 [btrfs]
      	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
      	 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
      	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
      	 __btrfs_cow_block+0x143/0x7a0 [btrfs]
      	 btrfs_cow_block+0x15f/0x310 [btrfs]
      	 push_leaf_right+0x150/0x240 [btrfs]
      	 split_leaf+0x3cd/0x6d0 [btrfs]
      	 btrfs_search_slot+0xd14/0xf70 [btrfs]
      	 btrfs_insert_empty_items+0x64/0xc0 [btrfs]
      	 __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs]
      	 btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs]
      	 btrfs_work_helper+0x2f9/0x650 [btrfs]
      	 process_one_work+0x22c/0x600
      	 worker_thread+0x50/0x3b0
      	 kthread+0x137/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
      	 check_prev_add+0x98/0xa20
      	 validate_chain+0xa8c/0x2a00
      	 __lock_acquire+0x56f/0xaa0
      	 lock_acquire+0xa3/0x440
      	 __mutex_lock+0xa0/0xaf0
      	 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
      	 btrfs_evict_inode+0x3bf/0x560 [btrfs]
      	 evict+0xd6/0x1c0
      	 dispose_list+0x48/0x70
      	 prune_icache_sb+0x54/0x80
      	 super_cache_scan+0x121/0x1a0
      	 do_shrink_slab+0x175/0x420
      	 shrink_slab+0xb1/0x2e0
      	 shrink_node+0x192/0x600
      	 balance_pgdat+0x31f/0x750
      	 kswapd+0x206/0x510
      	 kthread+0x137/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Chain exists of:
          &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(fs_reclaim);
      				 lock(&fs_info->chunk_mutex);
      				 lock(fs_reclaim);
          lock(&delayed_node->mutex);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/75:
         #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
         #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0
         #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50
      
        stack backtrace:
        CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ #1191
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x16f/0x190
         check_prev_add+0x98/0xa20
         validate_chain+0xa8c/0x2a00
         __lock_acquire+0x56f/0xaa0
         lock_acquire+0xa3/0x440
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         __mutex_lock+0xa0/0xaf0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         ? __lock_acquire+0x56f/0xaa0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         ? lock_acquire+0xa3/0x440
         ? btrfs_evict_inode+0x138/0x560 [btrfs]
         ? btrfs_evict_inode+0x2fe/0x560 [btrfs]
         ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
         btrfs_evict_inode+0x3bf/0x560 [btrfs]
         evict+0xd6/0x1c0
         dispose_list+0x48/0x70
         prune_icache_sb+0x54/0x80
         super_cache_scan+0x121/0x1a0
         do_shrink_slab+0x175/0x420
         shrink_slab+0xb1/0x2e0
         shrink_node+0x192/0x600
         balance_pgdat+0x31f/0x750
         kswapd+0x206/0x510
         ? _raw_spin_unlock_irqrestore+0x3e/0x50
         ? finish_wait+0x90/0x90
         ? balance_pgdat+0x750/0x750
         kthread+0x137/0x150
         ? kthread_stop+0x2a0/0x2a0
         ret_from_fork+0x1f/0x30
      
      This is because we're holding the chunk_mutex while adding this device
      and adding its sysfs entries.  We actually hold different locks in
      different places when calling this function (the dev_replace semaphore
      during device replace, for instance), so instead of moving this call
      around, simply wrap its operations in NOFS.
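      
      The scoped-NOFS idea can be modelled in userspace C as follows.  This is
      a sketch only: nofs_save(), nofs_restore(), alloc_may_enter_fs_reclaim()
      and add_device_entries() are hypothetical stand-ins.  In the kernel the
      scope is opened and closed with memalloc_nofs_save() and
      memalloc_nofs_restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a scoped "no filesystem reclaim" flag.
 * Nothing below is kernel code. */
static bool nofs_active;

/* Enter the NOFS scope and return the previous state for later restore. */
unsigned int nofs_save(void)
{
    unsigned int old = nofs_active;

    nofs_active = true;
    return old;
}

/* Leave the scope by restoring the state returned from nofs_save(). */
void nofs_restore(unsigned int old)
{
    nofs_active = (old != 0);
}

/* Stand-in allocator query: would an allocation made right now be
 * allowed to recurse into filesystem reclaim? */
bool alloc_may_enter_fs_reclaim(void)
{
    return !nofs_active;
}

/* Stand-in for the sysfs setup step.  Every allocation made between
 * save and restore is treated as NOFS, whatever locks the caller holds.
 * Returns whether the inner allocation could have entered reclaim. */
bool add_device_entries(void)
{
    unsigned int flags = nofs_save();
    bool entered_reclaim = alloc_may_enter_fs_reclaim();

    nofs_restore(flags);
    return entered_reclaim;
}
```

      Saving and restoring the previous state, rather than clearing the flag
      unconditionally, keeps nested scopes correct when a caller is already
      running in NOFS context.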
      
      CC: stable@vger.kernel.org # 4.14+
      Reported-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: inode: fix NULL pointer dereference if inode doesn't need compression · 35c15768
      Qu Wenruo authored
      [ Upstream commit 1e6e238c ]
      
      [BUG]
      There is a bug report of a NULL pointer dereference in
      compress_file_range():
      
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
        NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
        LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
        Call Trace:
        [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
        [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
        [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
        [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
        [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
        [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
        [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
        ---[ end trace f16954aa20d822f6 ]---
      
      [CAUSE]
      For the following execution route of compress_file_range(), it's
      possible to hit NULL pointer dereference:
      
       compress_file_range()
       |- pages = NULL;
       |- start = async_chunk->start = 0;
       |- end = async_chunk->end = 4095;
       |- nr_pages = 1;
       |- inode_need_compress() == false; <<< Possible, see later explanation
       |  Now, we have nr_pages = 1, pages = NULL
       |- cont:
       |- 		ret = cow_file_range_inline();
       |- 		if (ret <= 0) {
       |- 		for (i = 0; i < nr_pages; i++) {
       |- 			WARN_ON(pages[i]->mapping);	<<< Crash
      
      To enter the above execution branch, we need the following race:
      
          Thread 1 (chattr)     |            Thread 2 (writeback)
      --------------------------+------------------------------
                                | btrfs_run_delalloc_range
                                | |- inode_need_compress = true
                                | |- cow_file_range_async()
      btrfs_ioctl_set_flag()    |
      |- binode_flags |=        |
         BTRFS_INODE_NOCOMPRESS |
                                | compress_file_range()
                                | |- inode_need_compress = false
                                | |- nr_pages = 1 while pages = NULL
                                | |  Then hit the crash
      
      [FIX]
      This patch fixes it by checking @pages before accessing it.
      This patch is only designed as a hot fix that is easy to backport.
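      
      The guard can be illustrated with a small userspace sketch.  The helper
      free_page_array() is hypothetical and uses malloc()/free() in place of
      page handling; it only shows the "check the array before indexing it"
      pattern, not the actual btrfs change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cleanup helper: releases up to nr_pages entries, but only
 * if the array itself was allocated.  Returns how many entries it freed. */
size_t free_page_array(void **pages, size_t nr_pages)
{
    size_t freed = 0;

    /* nr_pages may be non-zero while pages is still NULL, so the array
     * pointer has to be checked before any pages[i] access. */
    if (!pages)
        return 0;

    for (size_t i = 0; i < nr_pages; i++) {
        if (pages[i]) {
            free(pages[i]);
            freed++;
        }
    }
    free(pages);
    return freed;
}
```

      Calling free_page_array(NULL, 1) returns 0 instead of dereferencing a
      NULL array, which is the case the race above produces.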
      
      A more elegant fix may make btrfs only check inode_need_compress() once to
      avoid such a race, but that would be another story.
      
      Reported-by: Luciano Chavez <chavez@us.ibm.com>
      Fixes: 4d3a800e ("btrfs: merge nr_pages input and output parameter in compress_pages")
      CC: stable@vger.kernel.org # 4.14.x: cecc8d90: btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range · 2a3d84f1
      Nikolay Borisov authored
      [ Upstream commit cecc8d90 ]
      
      This label is only executed if compress_file_range fails to create an
      inline extent. So move its code into the semantically related inline
      extent handling branch. No functional changes.
      
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: don't show full path of bind mounts in subvol= · f50a0aba
      Josef Bacik authored
      [ Upstream commit 3ef3959b ]
      
      Chris Murphy reported a problem where rpm ostree will bind mount a bunch
      of things for whatever voodoo it's doing.  But when it does this
      /proc/mounts shows something like
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0
      
      Despite subvolid=256 being subvol=/foo.  This is because we're just
      spitting out the dentry of the mount point, which in the case of bind
      mounts is the source path for the mountpoint.  Instead we should spit
      out the path to the actual subvol.  Fix this by looking up the name for
      the subvolid we have mounted.  With this fix the same test looks like
      this
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
      
      Reported-by: Chris Murphy <chris@colorremedies.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: export helpers for subvolume name/id resolution · dd39b6f6
      Marcos Paulo de Souza authored
      [ Upstream commit c0c907a4 ]
      
      The functions will be used outside of export.c and super.c to allow
      resolving a subvolume name from a given id, e.g. for the
      subvolume-deletion-by-id ioctl.
      
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ split from the next patch ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() · 2ef7ebb1
      Hugh Dickins authored
      [ Upstream commit f3f99d63 ]
      
      syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
      __khugepaged_enter(): yes, when one thread is about to dump core, has set
      core_state, and is waiting for others, another might do something calling
      __khugepaged_enter(), which now crashes because I lumped the core_state
      test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
      think it's best to lump them together, so just in this exceptional case,
      check mm->mm_users directly instead of khugepaged_test_exit().
      
      Fixes: bbe98f9c ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • khugepaged: khugepaged_test_exit() check mmget_still_valid() · 17c08ee0
      Hugh Dickins authored
      [ Upstream commit bbe98f9c ]
      
      Move collapse_huge_page()'s mmget_still_valid() check into
      khugepaged_test_exit() itself.  collapse_huge_page() is used for anon THP
      only, and earned its mmget_still_valid() check because it inserts a huge
      pmd entry in place of the page table's pmd entry; whereas
      collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
      merely clears the page table's pmd entry.  But core dumping without mmap
      lock must have been as open to mistaking a racily cleared pmd entry for a
      page table at physical page 0, as exit_mmap() was.  And we certainly have
      no interest in mapping as a THP once dumping core.
      
      Fixes: 59ea6d06 ("coredump: fix race condition between collapse_huge_page() and core dumping")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf probe: Fix memory leakage when the probe point is not found · 6cb22ed4
      Masami Hiramatsu authored
      [ Upstream commit 12d572e7 ]
      
      Fix the memory leak in debuginfo__find_trace_events() when the probe
      point is not found in the debuginfo. If there is no probe point found in
      the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.
      
      Thus the caller of debuginfo__find_probes() must check tf.ntevs and
      release the allocated memory for the array of struct probe_trace_event.
      
      The current code releases the memory only if debuginfo__find_probes()
      hits an error, but does not check tf.ntevs. As a result, the memory
      allocated for *tevs is not released if tf.ntevs == 0.
      
      This fixes the memory leakage by checking tf.ntevs == 0 in addition to
      ret < 0.
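      
      The caller-side pattern can be sketched in standalone C as below.  The
      names find_events() and collect_events() and the fixed array size are
      hypothetical; the sketch only shows freeing the array on both the
      error path and the "zero matches" path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical finder: returns a negative value on error and 0 otherwise,
 * storing the number of matches in *nfound (which may be 0). */
int find_events(int want, int *nfound)
{
    if (want < 0)
        return -1;
    *nfound = want;
    return 0;
}

/* Returns the number of events found (> 0) and hands the array to the
 * caller, or returns 0 / a negative error with *out set to NULL. */
int collect_events(int want, int **out)
{
    int nfound = 0;
    int *buf = calloc(16, sizeof(*buf));
    int ret;

    *out = NULL;
    if (!buf)
        return -1;

    ret = find_events(want, &nfound);
    /* "Success with nothing found" must release the array as well,
     * otherwise it leaks exactly as described above. */
    if (ret < 0 || nfound == 0) {
        free(buf);
        return ret < 0 ? ret : 0;
    }

    *out = buf;
    return nfound;
}
```

      The caller owns *out only when the return value is positive, so no
      separate cleanup is needed for the empty and error cases.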
      
      Fixes: ff741783 ("perf probe: Introduce debuginfo to encapsulate dwarf information")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset() · b93a3871
      Chris Wilson authored
      [ Upstream commit 119c53d2 ]
      
      drm_gem_dumb_map_offset() now exists and does everything
      vgem_gem_dumb_map() does and *ought* to do.
      
      In particular, vgem_gem_dumb_map() was trying to reject mmapping an
      imported dmabuf by checking the existence of obj->filp. Unfortunately,
      we always allocated an obj->filp, even if unused for an imported dmabuf.
      Instead, the drm_gem_dumb_map_offset(), since commit 90378e58
      ("drm/gem: drm_gem_dumb_map_offset(): reject dma-buf"), uses the
      obj->import_attach to reject such invalid mmaps.
      
      This prevents vgem from allowing userspace to mmap the dumb handle and
      attempt to incorrectly fault in remote pages belonging to another
      device, where there may not even be a struct page.
      
      v2: Use the default drm_gem_dumb_map_offset() callback
      
      Fixes: af33a919 ("drm/vgem: Enable dmabuf import interfaces")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: <stable@vger.kernel.org> # v4.13+
      Link: https://patchwork.freedesktop.org/patch/msgid/20200708154911.21236-1-chris@chris-wilson.co.uk
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 21 Aug, 2020 11 commits