1. 27 Mar, 2019 24 commits
    • Peter Zijlstra's avatar
      lib/int_sqrt: optimize small argument · e6008a05
      Peter Zijlstra authored
      commit 3f329570 upstream.
      
      The current int_sqrt() computation is sub-optimal for the case of small
      @x.  Which is the interesting case when we're going to do cumulative
      distribution functions on idle times, which we assume to be a random
      variable, where the target residency of the deepest idle state gives an
      upper bound on the variable (5e6ns on recent Intel chips).
      
      In the case of small @x, the compute loop:
      
      	while (m != 0) {
      		b = y + m;
      		y >>= 1;
      
      		if (x >= b) {
      			x -= b;
      			y += m;
      		}
      		m >>= 2;
      	}
      
      can be reduced to:
      
      	while (m > x)
      		m >>= 2;
      
      Because y==0, b==m and until x>=m y will remain 0.
      
      And while this is computationally equivalent, it runs much faster
      because there's less code, in particular less branches.
      
            cycles:                 branches:              branch-misses:
      
      OLD:
      
      hot:   45.109444 +- 0.044117  44.333392 +- 0.002254  0.018723 +- 0.000593
      cold: 187.737379 +- 0.156678  44.333407 +- 0.002254  6.272844 +- 0.004305
      
      PRE:
      
      hot:   67.937492 +- 0.064124  66.999535 +- 0.000488  0.066720 +- 0.001113
      cold: 232.004379 +- 0.332811  66.999527 +- 0.000488  6.914634 +- 0.006568
      
      POST:
      
      hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
      cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219
      
      Averages computed over all values <128k using a LFSR to generate order.
      Cold numbers have a LFSR based branch trace buffer 'confuser' ran between
      each int_sqrt() invocation.
      
      Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org
      Fixes: 30493cc9 ("lib/int_sqrt.c: optimize square root algorithm")
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Suggested-by: default avatarAnshul Garg <aksgarg1989@gmail.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michael Davidson <md@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6008a05
    • Lanqing Liu's avatar
      serial: sprd: clear timeout interrupt only rather than all interrupts · 503b4cac
      Lanqing Liu authored
      commit 43507825 upstream.
      
      On Spreadtrum's serial device, nearly all of interrupts would be cleared
      by hardware except timeout interrupt.  This patch removed the operation
      of clearing all interrupt in irq handler, instead added an if statement
      to check if the timeout interrupt is supposed to be cleared.
      
      Wrongly clearing timeout interrupt would lead to uart data stay in rx
      fifo, that means the driver cannot read them out anymore.
      Signed-off-by: default avatarLanqing Liu <lanqing.liu@spreadtrum.com>
      Signed-off-by: default avatarChunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      503b4cac
    • Qiao Zhou's avatar
      arm64: traps: disable irq in die() · fc421499
      Qiao Zhou authored
      commit 6f44a0ba upstream.
      
      In current die(), the irq is disabled for __die() handle, not
      including the possible panic() handling. Since the log in __die()
      can take several hundreds ms, new irq might come and interrupt
      current die().
      
      If the process calling die() holds some critical resource, and some
      other process scheduled later also needs it, then it would deadlock.
      The first panic will not be executed.
      
      So here disable irq for the whole flow of die().
      Signed-off-by: default avatarQiao Zhou <qiaozhou@asrmicro.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc421499
    • Al Viro's avatar
      Hang/soft lockup in d_invalidate with simultaneous calls · 1987172d
      Al Viro authored
      commit 81be24d2 upstream.
      
      It's not hard to trigger a bunch of d_invalidate() on the same
      dentry in parallel.  They end up fighting each other - any
      dentry picked for removal by one will be skipped by the rest
      and we'll go for the next iteration through the entire
      subtree, even if everything is being skipped.  Morevoer, we
      immediately go back to scanning the subtree.  The only thing
      we really need is to dissolve all mounts in the subtree and
      as soon as we've nothing left to do, we can just unhash the
      dentry and bugger off.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1987172d
    • Wei Qiao's avatar
      serial: sprd: adjust TIMEOUT to a big value · 356b5e16
      Wei Qiao authored
      commit e1dc9b08 upstream.
      
      SPRD_TIMEOUT was 256, which is too small to wait until the status
      switched to workable in a while loop, so that the earlycon could
      not work correctly.
      Signed-off-by: default avatarWei Qiao <wei.qiao@spreadtrum.com>
      Signed-off-by: default avatarChunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      356b5e16
    • Eric Dumazet's avatar
      tcp/dccp: drop SYN packets if accept queue is full · dfe4f69f
      Eric Dumazet authored
      commit 5ea8ea2c upstream.
      
      Per listen(fd, backlog) rules, there is really no point accepting a SYN,
      sending a SYNACK, and dropping the following ACK packet if accept queue
      is full, because application is not draining accept queue fast enough.
      
      This behavior is fooling TCP clients that believe they established a
      flow, while there is nothing at server side. They might then send about
      10 MSS (if using IW10) that will be dropped anyway while server is under
      stress.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfe4f69f
    • Hui Wang's avatar
      ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec · 9e9e3a46
      Hui Wang authored
      commit b5a236c1 upstream.
      
      Recently we found the audio jack detection stop working after suspend
      on many machines with Realtek codec. Sometimes the audio selection
      dialogue didn't show up after users plugged headhphone/headset into
      the headset jack, sometimes after uses plugged headphone/headset, then
      click the sound icon on the upper-right corner of gnome-desktop, it
      also showed the speaker rather than the headphone.
      
      The root cause is that before suspend, the codec already call the
      runtime_suspend since this codec is not used by any apps, then in
      resume, it will not call runtime_resume for this codec. But for some
      realtek codec (so far, alc236, alc255 and alc891) with the specific
      BIOS, if it doesn't run runtime_resume after suspend, all codec
      functions including jack detection stop working anymore.
      
      This problem existed for a long time, but it was not exposed, that is
      because when problem happens, if users play sound or open
      sound-setting to check audio device, this will trigger calling to
      runtime_resume (via snd_hda_power_up), then the codec starts working
      again before users notice this problem.
      
      Since we don't know how many codec and BIOS combinations have this
      problem, to fix it, let the driver call runtime_resume for all codecs
      in pm_resume, maybe for some codecs, this is not needed, but it is
      harmless. After a codec is runtime resumed, if it is not used by any
      apps, it will be runtime suspended soon and furthermore we don't run
      suspend frequently, this change will not add much power consumption.
      
      Fixes: cc72da7d ("ALSA: hda - Use standard runtime PM for codec power-save control")
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e9e3a46
    • Takashi Iwai's avatar
      ALSA: hda - Record the current power state before suspend/resume calls · 5ee86945
      Takashi Iwai authored
      commit 98081ca6 upstream.
      
      Currently we deal with single codec and suspend codec callbacks for
      all S3, S4 and runtime PM handling.  But it turned out that we want
      distinguish the call patterns sometimes, e.g. for applying some init
      sequence only at probing and restoring from hibernate.
      
      This patch slightly modifies the common PM callbacks for HD-audio
      codec and stores the currently processed PM event in power_state of
      the codec's device.power field, which is currently unused.  The codec
      callback can take a look at this event value and judges which purpose
      it's being called.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ee86945
    • Waiman Long's avatar
      locking/lockdep: Add debug_locks check in __lock_downgrade() · 670d934a
      Waiman Long authored
      commit 71492580 upstream.
      
      Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
      warning right after a previous lockdep warning. It is likely that the
      previous warning turned off lock debugging causing the lockdep to have
      inconsistency states leading to the lock downgrade warning.
      
      Fix that by add a check for debug_locks at the beginning of
      __lock_downgrade().
      Debugged-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      670d934a
    • Myungho Jung's avatar
      Bluetooth: Fix decrementing reference count twice in releasing socket · bd140b03
      Myungho Jung authored
      commit e20a2e9c upstream.
      
      When releasing socket, it is possible to enter hci_sock_release() and
      hci_sock_dev_event(HCI_DEV_UNREG) at the same time in different thread.
      The reference count of hdev should be decremented only once from one of
      them but if storing hdev to local variable in hci_sock_release() before
      detached from socket and setting to NULL in hci_sock_dev_event(),
      hci_dev_put(hdev) is unexpectedly called twice. This is resolved by
      referencing hdev from socket after bt_sock_unlink() in
      hci_sock_release().
      
      Reported-by: syzbot+fdc00003f4efff43bc5b@syzkaller.appspotmail.com
      Signed-off-by: default avatarMyungho Jung <mhjungk@gmail.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd140b03
    • Hans Verkuil's avatar
      media: v4l2-ctrls.c/uvc: zero v4l2_event · d54a3963
      Hans Verkuil authored
      commit f45f3f75 upstream.
      
      Control events can leak kernel memory since they do not fully zero the
      event. The same code is present in both v4l2-ctrls.c and uvc_ctrl.c, so
      fix both.
      
      It appears that all other event code is properly zeroing the structure,
      it's these two places.
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Reported-by: syzbot+4f021cf3697781dbd9fb@syzkaller.appspotmail.com
      Reviewed-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d54a3963
    • zhangyi (F)'s avatar
      ext4: brelse all indirect buffer in ext4_ind_remove_space() · 7002d0e5
      zhangyi (F) authored
      commit 674a2b27 upstream.
      
      All indirect buffers get by ext4_find_shared() should be released no
      mater the branch should be freed or not. But now, we forget to release
      the lower depth indirect buffers when removing space from the same
      higher depth indirect block. It will lead to buffer leak and futher
      more, it may lead to quota information corruption when using old quota,
      consider the following case.
      
       - Create and mount an empty ext4 filesystem without extent and quota
         features,
       - quotacheck and enable the user & group quota,
       - Create some files and write some data to them, and then punch hole
         to some files of them, it may trigger the buffer leak problem
         mentioned above.
       - Disable quota and run quotacheck again, it will create two new
         aquota files and write the checked quota information to them, which
         probably may reuse the freed indirect block(the buffer and page
         cache was not freed) as data block.
       - Enable quota again, it will invoke
         vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
         buffers and pagecache. Unfortunately, because of the buffer of quota
         data block is still referenced, quota code cannot read the up to date
         quota info from the device and lead to quota information corruption.
      
      This problem can be reproduced by xfstests generic/231 on ext3 file
      system or ext4 file system without extent and quota features.
      
      This patch fix this problem by releasing the missing indirect buffers,
      in ext4_ind_remove_space().
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7002d0e5
    • Lukas Czerner's avatar
      ext4: fix data corruption caused by unaligned direct AIO · 8651fa1e
      Lukas Czerner authored
      commit 372a03e0 upstream.
      
      Ext4 needs to serialize unaligned direct AIO because the zeroing of
      partial blocks of two competing unaligned AIOs can result in data
      corruption.
      
      However it decides not to serialize if the potentially unaligned aio is
      past i_size with the rationale that no pending writes are possible past
      i_size. Unfortunately if the i_size is not block aligned and the second
      unaligned write lands past i_size, but still into the same block, it has
      the potential of corrupting the previous unaligned write to the same
      block.
      
      This is (very simplified) reproducer from Frank
      
          // 41472 = (10 * 4096) + 512
          // 37376 = 41472 - 4096
      
          ftruncate(fd, 41472);
          io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
          io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
      
          io_submit(io_ctx, 1, &iocbs[1]);
          io_submit(io_ctx, 1, &iocbs[2]);
      
          io_getevents(io_ctx, 2, 2, events, NULL);
      
      Without this patch the 512B range from 40960 up to the start of the
      second unaligned write (41472) is going to be zeroed overwriting the data
      written by the first write. This is a data corruption.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      
      With this patch the data corruption is avoided because we will recognize
      the unaligned_aio and wait for the unwritten extent conversion.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      *
      0000b200
      Reported-by: default avatarFrank Sorenson <fsorenso@redhat.com>
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8651fa1e
    • Jiufei Xue's avatar
      ext4: fix NULL pointer dereference while journal is aborted · d9f0ce85
      Jiufei Xue authored
      commit fa30dde3 upstream.
      
      We see the following NULL pointer dereference while running xfstests
      generic/475:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
      RIP: 0010:ext4_do_update_inode+0x4ec/0x760
      ...
      Call Trace:
      ? jbd2_journal_get_write_access+0x42/0x50
      ? __ext4_journal_get_write_access+0x2c/0x70
      ? ext4_truncate+0x186/0x3f0
      ext4_mark_iloc_dirty+0x61/0x80
      ext4_mark_inode_dirty+0x62/0x1b0
      ext4_truncate+0x186/0x3f0
      ? unmap_mapping_pages+0x56/0x100
      ext4_setattr+0x817/0x8b0
      notify_change+0x1df/0x430
      do_truncate+0x5e/0x90
      ? generic_permission+0x12b/0x1a0
      
      This is triggered because the NULL pointer handle->h_transaction was
      dereferenced in function ext4_update_inode_fsync_trans().
      I found that the h_transaction was set to NULL in jbd2__journal_restart
      but failed to attached to a new transaction while the journal is aborted.
      
      Fix this by checking the handle before updating the inode.
      
      Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
      Signed-off-by: default avatarJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9f0ce85
    • Josh Poimboeuf's avatar
      objtool: Move objtool_file struct off the stack · c818b2fa
      Josh Poimboeuf authored
      commit 0c671812 upstream.
      
      Objtool uses over 512k of stack, thanks to the hash table embedded in
      the objtool_file struct.  This causes an unnecessarily large stack
      allocation and breaks users with low stack limits.
      
      Move the struct off the stack.
      
      Fixes: 042ba73f ("objtool: Add several performance improvements")
      Reported-by: default avatarVassili Karpov <moosotc@gmail.com>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/df92dcbc4b84b02ffa252f46876df125fb56e2d7.1552954176.git.jpoimboe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c818b2fa
    • Chen Jie's avatar
      futex: Ensure that futex address is aligned in handle_futex_death() · 726c28f3
      Chen Jie authored
      commit 5a07168d upstream.
      
      The futex code requires that the user space addresses of futexes are 32bit
      aligned. sys_futex() checks this in futex_get_keys() but the robust list
      code has no alignment check in place.
      
      As a consequence the kernel crashes on architectures with strict alignment
      requirements in handle_futex_death() when trying to cmpxchg() on an
      unaligned futex address which was retrieved from the robust list.
      
      [ tglx: Rewrote changelog, proper sizeof() based alignement check and add
        	comment ]
      
      Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
      Signed-off-by: default avatarChen Jie <chenjie6@huawei.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: <dvhart@infradead.org>
      Cc: <peterz@infradead.org>
      Cc: <zengweilin@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      726c28f3
    • Archer Yan's avatar
      MIPS: Fix kernel crash for R6 in jump label branch function · b84089b2
      Archer Yan authored
      commit 47c25036 upstream.
      
      Insert Branch instruction instead of NOP to make sure assembler don't
      patch code in forbidden slot. In jump label function, it might
      be possible to patch Control Transfer Instructions(CTIs) into
      forbidden slot, which will generate Reserved Instruction exception
      in MIPS release 6.
      Signed-off-by: default avatarArcher Yan <ayan@wavecomp.com>
      Reviewed-by: default avatarPaul Burton <paul.burton@mips.com>
      [paul.burton@mips.com:
        - Add MIPS prefix to subject.
        - Mark for stable from v4.0, which introduced r6 support, onwards.]
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b84089b2
    • Yasha Cherikovsky's avatar
      MIPS: Ensure ELF appended dtb is relocated · c7ac334f
      Yasha Cherikovsky authored
      commit 3f0a53bc upstream.
      
      This fixes booting with the combination of CONFIG_RELOCATABLE=y
      and CONFIG_MIPS_ELF_APPENDED_DTB=y.
      
      Sections that appear after the relocation table are not relocated
      on system boot (except .bss, which has special handling).
      
      With CONFIG_MIPS_ELF_APPENDED_DTB, the dtb is part of the
      vmlinux ELF, so it must be relocated together with everything else.
      
      Fixes: 069fd766 ("MIPS: Reserve space for relocation table")
      Signed-off-by: default avatarYasha Cherikovsky <yasha.che3@gmail.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7ac334f
    • Yifeng Li's avatar
      mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction. · f6b55e77
      Yifeng Li authored
      commit 5f5f67da upstream.
      
      Timekeeping IRQs from CS5536 MFGPT are routed to i8259, which then
      triggers the "cascade" IRQ on MIPS CPU. Without IRQF_NO_SUSPEND in
      cascade_irqaction, MFGPT interrupts will be masked in suspend mode,
      and the machine would be unable to resume once suspended.
      
      Previously, MIPS IRQs were not disabled properly, so the original
      code appeared to work. Commit a3e6c1ef ("MIPS: IRQ: Fix disable_irq on
      CPU IRQs") uncovers the bug. To fix it, add IRQF_NO_SUSPEND to
      cascade_irqaction.
      
      This commit is functionally identical to 0add9c2f ("MIPS:
      Loongson-3: Add IRQF_NO_SUSPEND to Cascade irqaction"), but it forgot
      to apply the same fix to Loongson2.
      Signed-off-by: default avatarYifeng Li <tomli@tomli.me>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6b55e77
    • Jan Kara's avatar
      udf: Fix crash on IO error during truncate · 92477062
      Jan Kara authored
      commit d3ca4651 upstream.
      
      When truncate(2) hits IO error when reading indirect extent block the
      code just bugs with:
      
      kernel BUG at linux-4.15.0/fs/udf/truncate.c:249!
      ...
      
      Fix the problem by bailing out cleanly in case of IO error.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarjean-luc malet <jeanluc.malet@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92477062
    • Ilya Dryomov's avatar
      libceph: wait for latest osdmap in ceph_monc_blacklist_add() · 9c32ada4
      Ilya Dryomov authored
      commit bb229bbb upstream.
      
      Because map updates are distributed lazily, an OSD may not know about
      the new blacklist for quite some time after "osd blacklist add" command
      is completed.  This makes it possible for a blacklisted but still alive
      client to overwrite a post-blacklist update, resulting in data
      corruption.
      
      Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
      the post-blacklist epoch for all post-blacklist requests ensures that
      all such requests "wait" for the blacklist to come into force on their
      respective OSDs.
      
      Cc: stable@vger.kernel.org
      Fixes: 6305a3b4 ("libceph: support for blacklisting clients")
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJason Dillaman <dillaman@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c32ada4
    • Stanislaw Gruszka's avatar
      iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE · cfa2d25f
      Stanislaw Gruszka authored
      commit 4e50ce03 upstream.
      
      Take into account that sg->offset can be bigger than PAGE_SIZE when
      setting segment sg->dma_address. Otherwise sg->dma_address will point
      at diffrent page, what makes DMA not possible with erros like this:
      
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]
      
      Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
      what what can cause crashes like this:
      
      Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
      Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
      Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
      Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
      Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
      Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E)
      Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
      Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
      Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
      Feb 28 19:27:45 kernel: Call Trace:
      Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
      Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
      Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
      Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
      Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
      Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
      Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
      Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
      Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
      Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
      Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
      Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
      Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
      Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
      Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
      Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: default avatarJan Viktorin <jan.viktorin@gmail.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Fixes: 80187fd3 ('iommu/amd: Optimize map_sg and unmap_sg')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfa2d25f
    • Thomas Zimmermann's avatar
      drm/vmwgfx: Don't double-free the mode stored in par->set_mode · eea23ca2
      Thomas Zimmermann authored
      commit c2d31155 upstream.
      
      When calling vmw_fb_set_par(), the mode stored in par->set_mode gets free'd
      twice. The first free is in vmw_fb_kms_detach(), the second is near the
      end of vmw_fb_set_par() under the name of 'old_mode'. The mode-setting code
      only works correctly if the mode doesn't actually change. Removing
      'old_mode' in favor of using par->set_mode directly fixes the problem.
      
      Cc: <stable@vger.kernel.org>
      Fixes: a278724a ("drm/vmwgfx: Implement fbdev on kms v2")
      Signed-off-by: default avatarThomas Zimmermann <tzimmermann@suse.de>
      Reviewed-by: default avatarDeepak Rawat <drawat@vmware.com>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eea23ca2
    • Arnd Bergmann's avatar
      mmc: pxamci: fix enum type confusion · c179e6de
      Arnd Bergmann authored
      commit e60a582b upstream.
      
      clang points out several instances of mismatched types in this drivers,
      all coming from a single declaration:
      
      drivers/mmc/host/pxamci.c:193:15: error: implicit conversion from enumeration type 'enum dma_transfer_direction' to
            different enumeration type 'enum dma_data_direction' [-Werror,-Wenum-conversion]
                      direction = DMA_DEV_TO_MEM;
                                ~ ^~~~~~~~~~~~~~
      drivers/mmc/host/pxamci.c:212:62: error: implicit conversion from enumeration type 'enum dma_data_direction' to
            different enumeration type 'enum dma_transfer_direction' [-Werror,-Wenum-conversion]
              tx = dmaengine_prep_slave_sg(chan, data->sg, host->dma_len, direction,
      
      The behavior is correct, so this must be a simply typo from
      dma_data_direction and dma_transfer_direction being similarly named
      types with a similar purpose.
      
      Fixes: 6464b714 ("mmc: pxamci: switch over to dmaengine use")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c179e6de
  2. 23 Mar, 2019 16 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.165 · 1c453afc
      Greg Kroah-Hartman authored
      1c453afc
    • Wanpeng Li's avatar
      KVM: X86: Fix residual mmio emulation request to userspace · 5e29da06
      Wanpeng Li authored
      commit bbeac283 upstream.
      
      Reported by syzkaller:
      
      The kvm-intel.unrestricted_guest=0
      
         WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         Call Trace:
          ? put_pid+0x3a/0x50
          ? rcu_read_lock_sched_held+0x79/0x80
          ? kmem_cache_free+0x2f2/0x350
          kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? __fget+0xfc/0x210
          do_vfs_ioctl+0xa4/0x6a0
          ? __fget+0x11d/0x210
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
          ? __this_cpu_preempt_check+0x13/0x20
      
      The syszkaller folks reported a residual mmio emulation request to userspace
      due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and
      incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true
      and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs
      several threads to launch the same vCPU, the thread which lauch this vCPU after
      the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will
      trigger the warning.
      
         #define _GNU_SOURCE
         #include <pthread.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/wait.h>
         #include <sys/types.h>
         #include <sys/stat.h>
         #include <sys/mman.h>
         #include <fcntl.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         #include <stdio.h>
      
         int kvmcpu;
         struct kvm_run *run;
      
         void* thr(void* arg)
         {
           int res;
           res = ioctl(kvmcpu, KVM_RUN, 0);
           printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
           return 0;
         }
      
         void test()
         {
           int i, kvm, kvmvm;
           pthread_t th[4];
      
           kvm = open("/dev/kvm", O_RDWR);
           kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
           kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
           run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
           srand(getpid());
           for (i = 0; i < 4; i++) {
             pthread_create(&th[i], 0, thr, 0);
             usleep(rand() % 10000);
           }
           for (i = 0; i < 4; i++)
             pthread_join(th[i], 0);
         }
      
         int main()
         {
           for (;;) {
             int pid = fork();
             if (pid < 0)
               exit(1);
             if (pid == 0) {
               test();
               exit(0);
             }
             int status;
             while (waitpid(pid, &status, __WALL) != pid) {}
           }
           return 0;
         }
      
      This patch fixes it by resetting the vcpu->mmio_needed once we receive
      the triple fault to avoid the residue.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Zubin Mithra <zsm@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e29da06
    • Sean Christopherson's avatar
      KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 7b3c6c48
      Sean Christopherson authored
      commit 34333cc6 upstream.
      
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b3c6c48
    • Sean Christopherson's avatar
      KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9748354a
      Sean Christopherson authored
      commit 946c522b upstream.
      
      The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instructions address size are undefined:
      
          In all cases, bits of this field beyond the instructionâ€s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorreclty
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9748354a
    • Gustavo A. R. Silva's avatar
      drm/radeon/evergreen_cs: fix missing break in switch statement · 45fe916e
      Gustavo A. R. Silva authored
      commit cc5034a5 upstream.
      
      Add missing break statement in order to prevent the code from falling
      through to case CB_TARGET_MASK.
      
      This bug was found thanks to the ongoing efforts to enable
      -Wimplicit-fallthrough.
      
      Fixes: dd220a00 ("drm/radeon/kms: add support for streamout v7")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45fe916e
    • Sakari Ailus's avatar
      media: uvcvideo: Avoid NULL pointer dereference at the end of streaming · 7e1b5809
      Sakari Ailus authored
      commit 9dd0627d upstream.
      
      The UVC video driver converts the timestamp from hardware specific unit
      to one known by the kernel at the time when the buffer is dequeued. This
      is fine in general, but the streamoff operation consists of the
      following steps (among other things):
      
      1. uvc_video_clock_cleanup --- the hardware clock sample array is
         released and the pointer to the array is set to NULL,
      
      2. buffers in active state are returned to the user and
      
      3. buf_finish callback is called on buffers that are prepared.
         buf_finish includes calling uvc_video_clock_update that accesses the
         hardware clock sample array.
      
      The above is serialised by a queue specific mutex. Address the problem
      by skipping the clock conversion if the hardware clock sample array is
      already released.
      
      Fixes: 9c0863b1 ("[media] vb2: call buf_finish from __queue_cancel")
      Reported-by: default avatarChiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
      Tested-by: default avatarChiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e1b5809
    • Zhang, Jun's avatar
      rcu: Do RCU GP kthread self-wakeup from softirq and interrupt · 3b2bbd1b
      Zhang, Jun authored
      commit 1d1f898d upstream.
      
      The rcu_gp_kthread_wake() function is invoked when it might be necessary
      to wake the RCU grace-period kthread.  Because self-wakeups are normally
      a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
      this kthread, it naturally refuses to do the wakeup.
      
      Unfortunately, natural though it might be, this heuristic fails when
      rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
      that interrupted the grace-period kthread just after the final check of
      the wait-event condition but just before the schedule() call.  In this
      case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
      is within the RCU grace-period kthread's context.  Failing to provide
      this wakeup can result in grace periods failing to start, which in turn
      results in out-of-memory conditions.
      
      This race window is quite narrow, but it actually did happen during real
      testing.  It would of course need to be fixed even if it was strictly
      theoretical in nature.
      
      This patch does not Cc stable because it does not apply cleanly to
      earlier kernel versions.
      
      Fixes: 48a7639c ("rcu: Make callers awaken grace-period kthread")
      Reported-by: default avatar"He, Bo" <bo.he@intel.com>
      Co-developed-by: default avatar"Zhang, Jun" <jun.zhang@intel.com>
      Co-developed-by: default avatar"He, Bo" <bo.he@intel.com>
      Co-developed-by: default avatar"xiao, jin" <jin.xiao@intel.com>
      Co-developed-by: default avatarBai, Jie A <jie.a.bai@intel.com>
      Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
      Signed-off: "He, Bo" <bo.he@intel.com>
      Signed-off: "xiao, jin" <jin.xiao@intel.com>
      Signed-off: Bai, Jie A <jie.a.bai@intel.com>
      Signed-off-by: default avatar"Zhang, Jun" <jun.zhang@intel.com>
      [ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
        !in_serving_softirq() to avoid redundant wakeups and to also handle the
        interrupt-handler scenario as well as the softirq-handler scenario that
        actually occurred in testing. ]
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      3b2bbd1b
    • Aditya Pakki's avatar
      md: Fix failed allocation of md_register_thread · f61b68e1
      Aditya Pakki authored
      commit e406f12d upstream.
      
      mddev->sync_thread can be set to NULL on kzalloc failure downstream.
      The patch checks for such a scenario and frees allocated resources.
      
      Committer node:
      
      Added similar fix to raid5.c, as suggested by Guoqing.
      
      Cc: stable@vger.kernel.org # v3.16+
      Acked-by: default avatarGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: default avatarAditya Pakki <pakki001@umn.edu>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61b68e1
    • Adrian Hunter's avatar
      perf intel-pt: Fix divide by zero when TSC is not available · 5ed7a8f6
      Adrian Hunter authored
      commit 07633387 upstream.
      
      When TSC is not available, "timeless" decoding is used but a divide by
      zero occurs if perf_time_to_tsc() is called.
      
      Ensure the divisor is not zero.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ed7a8f6
    • Adrian Hunter's avatar
      perf intel-pt: Fix overlap calculation for padding · 4f7c16b5
      Adrian Hunter authored
      commit 5a99d99e upstream.
      
      Auxtrace records might have up to 7 bytes of padding appended. Adjust
      the overlap accordingly.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f7c16b5
    • Adrian Hunter's avatar
      perf auxtrace: Define auxtrace record alignment · 300ef83e
      Adrian Hunter authored
      commit c3fcadf0 upstream.
      
      Define auxtrace record alignment so that it can be referenced elsewhere.
      
      Note this is preparation for patch "perf intel-pt: Fix overlap calculation
      for padding"
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      300ef83e
    • Adrian Hunter's avatar
      perf intel-pt: Fix CYC timestamp calculation after OVF · d07d5160
      Adrian Hunter authored
      commit 03997612 upstream.
      
      CYC packet timestamp calculation depends upon CBR which was being
      cleared upon overflow (OVF). That can cause errors due to failing to
      synchronize with sideband events. Even if a CBR change has been lost,
      the old CBR is still a better estimate than zero. So remove the clearing
      of CBR.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d07d5160
    • Daniel Axtens's avatar
      bcache: never writeback a discard operation · 7fb9a25c
      Daniel Axtens authored
      commit 9951379b upstream.
      
      Some users see panics like the following when performing fstrim on a
      bcached volume:
      
      [  529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  530.183928] #PF error: [normal kernel read fault]
      [  530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
      [  530.750887] Oops: 0000 [#1] SMP PTI
      [  530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
      [  531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
      [  531.693137] RIP: 0010:blk_queue_split+0x148/0x620
      [  531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48
      8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
      [  532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246
      [  533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
      [  533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
      [  533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
      [  534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
      [  534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
      [  534.833319] FS:  00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000
      [  535.224098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
      [  535.851759] Call Trace:
      [  535.970308]  ? mempool_alloc_slab+0x15/0x20
      [  536.174152]  ? bch_data_insert+0x42/0xd0 [bcache]
      [  536.403399]  blk_mq_make_request+0x97/0x4f0
      [  536.607036]  generic_make_request+0x1e2/0x410
      [  536.819164]  submit_bio+0x73/0x150
      [  536.980168]  ? submit_bio+0x73/0x150
      [  537.149731]  ? bio_associate_blkg_from_css+0x3b/0x60
      [  537.391595]  ? _cond_resched+0x1a/0x50
      [  537.573774]  submit_bio_wait+0x59/0x90
      [  537.756105]  blkdev_issue_discard+0x80/0xd0
      [  537.959590]  ext4_trim_fs+0x4a9/0x9e0
      [  538.137636]  ? ext4_trim_fs+0x4a9/0x9e0
      [  538.324087]  ext4_ioctl+0xea4/0x1530
      [  538.497712]  ? _copy_to_user+0x2a/0x40
      [  538.679632]  do_vfs_ioctl+0xa6/0x600
      [  538.853127]  ? __do_sys_newfstat+0x44/0x70
      [  539.051951]  ksys_ioctl+0x6d/0x80
      [  539.212785]  __x64_sys_ioctl+0x1a/0x20
      [  539.394918]  do_syscall_64+0x5a/0x110
      [  539.568674]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We have observed it where both:
      1) LVM/devmapper is involved (bcache backing device is LVM volume) and
      2) writeback cache is involved (bcache cache_mode is writeback)
      
      On one machine, we can reliably reproduce it with:
      
       # echo writeback > /sys/block/bcache0/bcache/cache_mode
         (not sure whether above line is required)
       # mount /dev/bcache0 /test
       # for i in {0..10}; do
      	file="$(mktemp /test/zero.XXX)"
      	dd if=/dev/zero of="$file" bs=1M count=256
      	sync
      	rm $file
          done
        # fstrim -v /test
      
      Observing this with tracepoints on, we see the following writes:
      
      fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4260112 + 196352 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4456464 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4718608 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5324816 + 180224 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5505040 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5767184 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 6373392 + 180224 hit 1 bypass 0
      <crash>
      
      Note the final one has different hit/bypass flags.
      
      This is because in should_writeback(), we were hitting a case where
      the partial stripe condition was returning true and so
      should_writeback() was returning true early.
      
      If that hadn't been the case, it would have hit the would_skip test, and
      as would_skip == s->iop.bypass == true, should_writeback() would have
      returned false.
      
      Looking at the git history from 'commit 72c27061 ("bcache: Write out
      full stripes")', it looks like the idea was to optimise for raid5/6:
      
             * If a stripe is already dirty, force writes to that stripe to
      	 writeback mode - to help build up full stripes of dirty data
      
      To fix this issue, make sure that should_writeback() on a discard op
      never returns true.
      
      More details of debugging:
      https://www.spinics.net/lists/linux-bcache/msg06996.html
      
      Previous reports:
       - https://bugzilla.kernel.org/show_bug.cgi?id=201051
       - https://bugzilla.kernel.org/show_bug.cgi?id=196103
       - https://www.spinics.net/lists/linux-bcache/msg06885.html
      
      (Coly Li: minor modification to follow maximum 75 chars per line rule)
      
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 72c27061 ("bcache: Write out full stripes")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fb9a25c
    • Viresh Kumar's avatar
      PM / wakeup: Rework wakeup source timer cancellation · 6f76eeca
      Viresh Kumar authored
      commit 1fad17fb upstream.
      
      If wakeup_source_add() is called right after wakeup_source_remove()
      for the same wakeup source, timer_setup() may be called for a
      potentially scheduled timer which is incorrect.
      
      To avoid that, move the wakeup source timer cancellation from
      wakeup_source_drop() to wakeup_source_remove().
      
      Moreover, make wakeup_source_remove() clear the timer function after
      canceling the timer to let wakeup_source_not_registered() treat
      unregistered wakeup sources in the same way as the ones that have
      never been registered.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      [ rjw: Subject, changelog, merged two patches together ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f76eeca
    • Yihao Wu's avatar
      nfsd: fix wrong check in write_v4_end_grace() · 33c164d5
      Yihao Wu authored
      commit dd838821 upstream.
      
      Commit 62a063b8 "nfsd4: fix crash on writing v4_end_grace before
      nfsd startup" is trying to fix a NULL dereference issue, but it
      mistakenly checks if the nfsd server is started. So fix it.
      
      Fixes: 62a063b8 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: default avatarYihao Wu <wuyihao@linux.alibaba.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33c164d5
    • NeilBrown's avatar
      nfsd: fix memory corruption caused by readdir · e4ea22f9
      NeilBrown authored
      commit b602345d upstream.
      
      If the result of an NFSv3 readdir{,plus} request results in the
      "offset" on one entry having to be split across 2 pages, and is sized
      so that the next directory entry doesn't fit in the requested size,
      then memory corruption can happen.
      
      When encode_entry() is called after encoding the last entry that fits,
      it notices that ->offset and ->offset1 are set, and so stores the
      offset value in the two pages as required.  It clears ->offset1 but
      *does not* clear ->offset.
      
      Normally this omission doesn't matter as encode_entry_baggage() will
      be called, and will set ->offset to a suitable value (not on a page
      boundary).
      But in the case where cd->buflen < elen and nfserr_toosmall is
      returned, ->offset is not reset.
      
      This means that nfsd3proc_readdirplus will see ->offset with a value 4
      bytes before the end of a page, and ->offset1 set to NULL.
      It will try to write 8bytes to ->offset.
      If we are lucky, the next page will be read-only, and the system will
        BUG: unable to handle kernel paging request at...
      
      If we are unlucky, some innocent page will have the first 4 bytes
      corrupted.
      
      nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
      writes 8 bytes to the offset wherever it is.
      
      Fix this by clearing ->offset after it is used, and copying the
      ->offset handling code from nfsd3_proc_readdirplus into
      nfsd3_proc_readdir.
      
      (Note that the commit hash in the Fixes tag is from the 'history'
       tree - this bug predates git).
      
      Fixes: 0b1d57cf ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
      Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
      Cc: stable@vger.kernel.org (v2.6.12+)
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4ea22f9