1. 26 Sep, 2019 4 commits
    • Tejun Heo's avatar
      iocost: improve nr_lagging handling · 7cd806a9
      Tejun Heo authored
      Some IOs may span multiple periods.  As latencies are collected on
      completion, the inbetween periods won't register them and may
      incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
      avoid those situations.  Currently, whenever there are IOs which are
      spanning from the previous period, busy_level is reset to 0 if
      negative thus suppressing vrate increase.
      
      This has the following two problems.
      
      * When latency target percentiles aren't set, vrate adjustment should
        only be governed by queue depth depletion; however, the current code
        keeps nr_lagging active which pulls in latency results and can keep
        down vrate unexpectedly.
      
      * When lagging condition is detected, it resets the entire negative
        busy_level.  This turned out to be way too aggressive on some
        devices which sometimes experience extended latencies on a small
        subset of commands.  In addition, a lagging IO will be accounted as
        latency target miss on completion anyway and resetting busy_level
        amplifies its impact unnecessarily.
      
      This patch fixes the above two problems by disabling nr_lagging
      counting when latency target percentiles aren't set and blocking vrate
      increases when there are lagging IOs while leaving busy_level as-is.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7cd806a9
    • Tejun Heo's avatar
      iocost: better trace vrate changes · 25d41e4a
      Tejun Heo authored
      vrate_adj tracepoint traces vrate changes; however, it does so only
      when busy_level is non-zero.  busy_level turning to zero can sometimes
      be as interesting an event.  This patch also enables vrate_adj
      tracepoint on other vrate related events - busy_level changes and
      non-zero nr_lagging.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      25d41e4a
    • Ming Lei's avatar
      block: don't release queue's sysfs lock during switching elevator · b89f625e
      Ming Lei authored
      cecf5d87 ("block: split .sysfs_lock into two locks") starts to
      release & acquire sysfs_lock before registering/un-registering elevator
      queue during switching elevator for avoiding potential deadlock from
      showing & storing 'queue/iosched' attributes and removing elevator's
      kobject.
      
      Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
      required in .show & .store of queue/iosched's attributes, and just
      elevator's sysfs lock is acquired in elv_iosched_store() and
      elv_iosched_show(). So it is safe to hold queue's sysfs lock when
      registering/un-registering elevator queue.
      
      The biggest issue is that commit cecf5d87 assumes that concurrent
      write on 'queue/scheduler' can't happen. However, this assumption isn't
      true, because kernfs_fop_write() only guarantees that concurrent write
      aren't called on the same open file, but the write could be from
      different open on the file. So we can't release & re-acquire queue's
      sysfs lock during switching elevator, otherwise use-after-free on
      elevator could be triggered.
      
      Fixes the issue by not releasing queue's sysfs lock during switching
      elevator.
      
      Fixes: cecf5d87 ("block: split .sysfs_lock into two locks")
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b89f625e
    • Ming Lei's avatar
      blk-mq: move lockdep_assert_held() into elevator_exit · 284b94be
      Ming Lei authored
      Commit c48dac13 ("block: don't hold q->sysfs_lock in elevator_init_mq")
      removes q->sysfs_lock from elevator_init_mq(), but forgot to deal with
      lockdep_assert_held() called in blk_mq_sched_free_requests() which is
      run in failure path of elevator_init_mq().
      
      blk_mq_sched_free_requests() is called in the following 3 functions:
      
      	elevator_init_mq()
      	elevator_exit()
      	blk_cleanup_queue()
      
      In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly
      by 'mutex_lock(&q->sysfs_lock)'.
      
      So moving the lockdep_assert_held() from blk_mq_sched_free_requests()
      into elevator_exit() for fixing the report by syzbot.
      
      Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com
      Fixed: c48dac13 ("block: don't hold q->sysfs_lock in elevator_init_mq")
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      284b94be
  2. 25 Sep, 2019 4 commits
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client · f41def39
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "The highlights are:
      
         - automatic recovery of a blacklisted filesystem session (Zheng Yan).
           This is disabled by default and can be enabled by mounting with the
           new "recover_session=clean" option.
      
         - serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is
           taken to avoid serializing O_DIRECT reads and writes with each
           other, this is based on the exclusion scheme from NFS.
      
         - handle large osdmaps better in the face of fragmented memory
           (myself)
      
         - don't limit what security.* xattrs can be get or set (Jeff Layton).
           We were overly restrictive here, unnecessarily preventing things
           like file capability sets stored in security.capability from
           working.
      
         - allow copy_file_range() within the same inode and across different
           filesystems within the same cluster (Luis Henriques)"
      
      * tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits)
        ceph: call ceph_mdsc_destroy from destroy_fs_client
        libceph: use ceph_kvmalloc() for osdmap arrays
        libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()
        ceph: allow object copies across different filesystems in the same cluster
        ceph: include ceph_debug.h in cache.c
        ceph: move static keyword to the front of declarations
        rbd: pull rbd_img_request_create() dout out into the callers
        ceph: reconnect connection if session hang in opening state
        libceph: drop unused con parameter of calc_target()
        ceph: use release_pages() directly
        rbd: fix response length parameter for encoded strings
        ceph: allow arbitrary security.* xattrs
        ceph: only set CEPH_I_SEC_INITED if we got a MAC label
        ceph: turn ceph_security_invalidate_secctx into static inline
        ceph: add buffered/direct exclusionary locking for reads and writes
        libceph: handle OSD op ceph_pagelist_append() errors
        ceph: don't return a value from void function
        ceph: don't freeze during write page faults
        ceph: update the mtime when truncating up
        ceph: fix indentation in __get_snap_name()
        ...
      f41def39
    • Linus Torvalds's avatar
      Merge tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 7b1373dd
      Linus Torvalds authored
      Pull fuse updates from Miklos Szeredi:
      
       - Continue separating the transport (user/kernel communication) and the
         filesystem layers of fuse. Getting rid of most layering violations
         will allow for easier cleanup and optimization later on.
      
       - Prepare for the addition of the virtio-fs filesystem. The actual
         filesystem will be introduced by a separate pull request.
      
       - Convert to new mount API.
      
       - Various fixes, optimizations and cleanups.
      
      * tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits)
        fuse: Make fuse_args_to_req static
        fuse: fix memleak in cuse_channel_open
        fuse: fix beyond-end-of-page access in fuse_parse_cache()
        fuse: unexport fuse_put_request
        fuse: kmemcg account fs data
        fuse: on 64-bit store time in d_fsdata directly
        fuse: fix missing unlock_page in fuse_writepage()
        fuse: reserve byteswapped init opcodes
        fuse: allow skipping control interface and forced unmount
        fuse: dissociate DESTROY from fuseblk
        fuse: delete dentry if timeout is zero
        fuse: separate fuse device allocation and installation in fuse_conn
        fuse: add fuse_iqueue_ops callbacks
        fuse: extract fuse_fill_super_common()
        fuse: export fuse_dequeue_forget() function
        fuse: export fuse_get_unique()
        fuse: export fuse_send_init_request()
        fuse: export fuse_len_args()
        fuse: export fuse_end_request()
        fuse: fix request limit
        ...
      7b1373dd
    • Linus Torvalds's avatar
      Merge tag 'tpmdd-next-20190925' of git://git.infradead.org/users/jjs/linux-tpmdd · 301310c6
      Linus Torvalds authored
      Pull tpm fixes from Jarkko Sakkinen.
      
      * tag 'tpmdd-next-20190925' of git://git.infradead.org/users/jjs/linux-tpmdd:
        tpm: Wrap the buffer from the caller to tpm_buf in tpm_send()
        MAINTAINERS: keys: Update path to trusted.h
        KEYS: trusted: correctly initialize digests and fix locking issue
        selftests/tpm2: Add log and *.pyc to .gitignore
        selftests/tpm2: Add the missing TEST_FILES assignment
      301310c6
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 4ef5b13a
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "After last week's failed pull request attempt, I scuttled everything
        in the branch except for the directio endio api changes, which were
        trivial. Everything else will simply have to wait for the next cycle.
      
        Summary:
      
         - Report both io errors and short io results to the directio endio
           handler.
      
         - Allow directio callers to pass an ops structure to iomap_dio_rw"
      
      * tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: move the iomap_dio_rw ->end_io callback into a structure
        iomap: split size and error for iomap_dio_rw ->end_io
      4ef5b13a
  3. 24 Sep, 2019 32 commits