1. 23 Sep, 2020 15 commits
  2. 15 Sep, 2020 5 commits
  3. 14 Sep, 2020 1 commit
    • Tejun Heo's avatar
      iocost: fix infinite loop bug in adjust_inuse_and_calc_cost() · aa67db24
      Tejun Heo authored
      adjust_inuse_and_calc_cost() is responsible for reducing the amount of
      donated weights dynamically in period as the budget runs low. Because we
      don't want to do full donation calculation in period, we keep latching up
      inuse by INUSE_ADJ_STEP_PCT of the active weight of the cgroup until the
      resulting hweight_inuse is satisfactory.
      
      Unfortunately, the adj_step calculation was reading the active weight before
      acquiring ioc->lock. Because the current thread could have lost race to
      activate the iocg to another thread before entering this function, it may
      read the active weight as zero before acquiring ioc->lock. When this
      happens, the adj_step is calculated as zero and the incremental adjustment
      loop becomes an infinite one.
      
      Fix it by fetching the active weight after acquiring ioc->lock.
      
      Fixes: b0853ab4 ("blk-iocost: revamp in-period donation snapbacks")
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      aa67db24
  4. 11 Sep, 2020 6 commits
  5. 10 Sep, 2020 13 commits
    • Xianting Tian's avatar
      blkcg: add plugging support for punt bio · 192f1c6b
      Xianting Tian authored
      The test and the explaination of the patch as bellow.
      
      Before test we added more debug code in blkg_async_bio_workfn():
      	int count = 0
      	if (bios.head && bios.head->bi_next) {
      		need_plug = true;
      		blk_start_plug(&plug);
      	}
      	while ((bio = bio_list_pop(&bios))) {
      		/*io_punt is a sysctl user interface to control the print*/
      		if(io_punt) {
      			printk("[%s:%d] bio start,size:%llu,%d count=%d plug?%d\n",
      				current->comm, current->pid, bio->bi_iter.bi_sector,
      				(bio->bi_iter.bi_size)>>9, count++, need_plug);
      		}
      		submit_bio(bio);
      	}
      	if (need_plug)
      		blk_finish_plug(&plug);
      
      Steps that need to be set to trigger *PUNT* io before testing:
      	mount -t btrfs -o compress=lzo /dev/sda6 /btrfs
      	mount -t cgroup2 nodev /cgroup2
      	mkdir /cgroup2/cg3
      	echo "+io" > /cgroup2/cgroup.subtree_control
      	echo "8:0 wbps=1048576000" > /cgroup2/cg3/io.max #1000M/s
      	echo $$ > /cgroup2/cg3/cgroup.procs
      
      Then use dd command to test btrfs PUNT io in current shell:
      	dd if=/dev/zero of=/btrfs/file bs=64K count=100000
      
      Test hardware environment as below:
      	[root@localhost btrfs]# lscpu
      	Architecture:          x86_64
      	CPU op-mode(s):        32-bit, 64-bit
      	Byte Order:            Little Endian
      	CPU(s):                32
      	On-line CPU(s) list:   0-31
      	Thread(s) per core:    2
      	Core(s) per socket:    8
      	Socket(s):             2
      	NUMA node(s):          2
      	Vendor ID:             GenuineIntel
      
      With above debug code, test command and test environment, I did the
      tests under 3 different system loads, which are triggered by stress:
      1, Run 64 threads by command "stress -c 64 &"
      	[53615.975974] [kworker/u66:18:1490] bio start,size:45583056,8 count=0 plug?1
      	[53615.975980] [kworker/u66:18:1490] bio start,size:45583064,8 count=1 plug?1
      	[53615.975984] [kworker/u66:18:1490] bio start,size:45583072,8 count=2 plug?1
      	[53615.975987] [kworker/u66:18:1490] bio start,size:45583080,8 count=3 plug?1
      	[53615.975990] [kworker/u66:18:1490] bio start,size:45583088,8 count=4 plug?1
      	[53615.975993] [kworker/u66:18:1490] bio start,size:45583096,8 count=5 plug?1
      	... ...
      	[53615.977041] [kworker/u66:18:1490] bio start,size:45585480,8 count=303 plug?1
      	[53615.977044] [kworker/u66:18:1490] bio start,size:45585488,8 count=304 plug?1
      	[53615.977047] [kworker/u66:18:1490] bio start,size:45585496,8 count=305 plug?1
      	[53615.977050] [kworker/u66:18:1490] bio start,size:45585504,8 count=306 plug?1
      	[53615.977053] [kworker/u66:18:1490] bio start,size:45585512,8 count=307 plug?1
      	[53615.977056] [kworker/u66:18:1490] bio start,size:45585520,8 count=308 plug?1
      	[53615.977058] [kworker/u66:18:1490] bio start,size:45585528,8 count=309 plug?1
      
      2, Run 32 threads by command "stress -c 32 &"
      	[50586.290521] [kworker/u66:6:32351] bio start,size:45806496,8 count=0 plug?1
      	[50586.290526] [kworker/u66:6:32351] bio start,size:45806504,8 count=1 plug?1
      	[50586.290529] [kworker/u66:6:32351] bio start,size:45806512,8 count=2 plug?1
      	[50586.290531] [kworker/u66:6:32351] bio start,size:45806520,8 count=3 plug?1
      	[50586.290533] [kworker/u66:6:32351] bio start,size:45806528,8 count=4 plug?1
      	[50586.290535] [kworker/u66:6:32351] bio start,size:45806536,8 count=5 plug?1
      	... ...
      	[50586.299640] [kworker/u66:5:32350] bio start,size:45808576,8 count=252 plug?1
      	[50586.299643] [kworker/u66:5:32350] bio start,size:45808584,8 count=253 plug?1
      	[50586.299646] [kworker/u66:5:32350] bio start,size:45808592,8 count=254 plug?1
      	[50586.299649] [kworker/u66:5:32350] bio start,size:45808600,8 count=255 plug?1
      	[50586.299652] [kworker/u66:5:32350] bio start,size:45808608,8 count=256 plug?1
      	[50586.299663] [kworker/u66:5:32350] bio start,size:45808616,8 count=257 plug?1
      	[50586.299665] [kworker/u66:5:32350] bio start,size:45808624,8 count=258 plug?1
      	[50586.299668] [kworker/u66:5:32350] bio start,size:45808632,8 count=259 plug?1
      
      3, Don't run thread by stress
      	[50861.355246] [kworker/u66:19:32376] bio start,size:13544504,8 count=0 plug?0
      	[50861.355288] [kworker/u66:19:32376] bio start,size:13544512,8 count=0 plug?0
      	[50861.355322] [kworker/u66:19:32376] bio start,size:13544520,8 count=0 plug?0
      	[50861.355353] [kworker/u66:19:32376] bio start,size:13544528,8 count=0 plug?0
      	[50861.355392] [kworker/u66:19:32376] bio start,size:13544536,8 count=0 plug?0
      	[50861.355431] [kworker/u66:19:32376] bio start,size:13544544,8 count=0 plug?0
      	[50861.355468] [kworker/u66:19:32376] bio start,size:13544552,8 count=0 plug?0
      	[50861.355499] [kworker/u66:19:32376] bio start,size:13544560,8 count=0 plug?0
      	[50861.355532] [kworker/u66:19:32376] bio start,size:13544568,8 count=0 plug?0
      	[50861.355575] [kworker/u66:19:32376] bio start,size:13544576,8 count=0 plug?0
      	[50861.355618] [kworker/u66:19:32376] bio start,size:13544584,8 count=0 plug?0
      	[50861.355659] [kworker/u66:19:32376] bio start,size:13544592,8 count=0 plug?0
      	[50861.355740] [kworker/u66:0:32346] bio start,size:13544600,8 count=0 plug?1
      	[50861.355748] [kworker/u66:0:32346] bio start,size:13544608,8 count=1 plug?1
      	[50861.355962] [kworker/u66:2:32347] bio start,size:13544616,8 count=0 plug?0
      	[50861.356272] [kworker/u66:7:31962] bio start,size:13544624,8 count=0 plug?0
      	[50861.356446] [kworker/u66:7:31962] bio start,size:13544632,8 count=0 plug?0
      	[50861.356567] [kworker/u66:7:31962] bio start,size:13544640,8 count=0 plug?0
      	[50861.356707] [kworker/u66:19:32376] bio start,size:13544648,8 count=0 plug?0
      	[50861.356748] [kworker/u66:15:32355] bio start,size:13544656,8 count=0 plug?0
      	[50861.356825] [kworker/u66:17:31970] bio start,size:13544664,8 count=0 plug?0
      
      Analysis of above 3 test results with different system load:
      >From above test, we can see more and more continuous bios can be plugged
      with system load increasing. When run "stress -c 64 &", 310 continuous
      bios are plugged; When run "stress -c 32 &", 260 continuous bios are
      plugged; When don't run stress, at most only 2 continuous bios are
      plugged, in most cases, bio_list only contains one single bio.
      
      How to explain above phenomenon:
      We know, in submit_bio(), if the bio is a REQ_CGROUP_PUNT io, it will
      queue a work to workqueue blkcg_punt_bio_wq. But when the workqueue is
      scheduled, it depends on the system load.  When system load is low, the
      workqueue will be quickly scheduled, and the bio in bio_list will be
      quickly processed in blkg_async_bio_workfn(), so there is less chance
      that the same io submit thread can add multiple continuous bios to
      bio_list before workqueue is scheduled to run. The analysis aligned with
      above test "3".
      When system load is high, there is some delay before the workqueue can
      be scheduled to run, the higher the system load the greater the delay.
      So there is more chance that the same io submit thread can add multiple
      continuous bios to bio_list. Then when the workqueue is scheduled to run,
      there are more continuous bios in bio_list, which will be processed in
      blkg_async_bio_workfn(). The analysis aligned with above test "1" and "2".
      
      According to test, we can get io performance improved with the patch,
      especially when system load is higher. Another optimazition is to use
      the plug only when bio_list contains at least 2 bios.
      Signed-off-by: default avatarXianting Tian <tian.xianting@h3c.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      192f1c6b
    • Christoph Hellwig's avatar
      block: remove check_disk_change · b92b5307
      Christoph Hellwig authored
      Remove the now unused check_disk_change helper.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b92b5307
    • Christoph Hellwig's avatar
      sr: simplify sr_block_revalidate_disk · 38a2b557
      Christoph Hellwig authored
      Both callers have a valid CD struture available, so rely on that instead
      of getting another reference.  Also move the function to avoid a forward
      declaration.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      38a2b557
    • Christoph Hellwig's avatar
      sr: use bdev_check_media_change · afd35c4f
      Christoph Hellwig authored
      Switch to use bdev_check_media_change instead of check_disk_change and
      call sr_block_revalidate_disk manually.  Also add an explicit call to
      sr_block_revalidate_disk just before disk_add() to ensure we always
      read check for a ready unit and read the TOC and then stop wiring up
      ->revalidate_disk.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      afd35c4f
    • Christoph Hellwig's avatar
      sd: use bdev_check_media_change · 471bd0af
      Christoph Hellwig authored
      Switch to use bdev_check_media_change instead of check_disk_change and
      call sd_revalidate_disk manually.  As sd also calls sd_revalidate_disk
      manually during probe and open, the extra call into ->revalidate_disk
      from bdev_disk_changed is not required either, so stop wiring up the
      method.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      471bd0af
    • Christoph Hellwig's avatar
      md: use bdev_check_media_change · 818077d6
      Christoph Hellwig authored
      The md driver does not have a ->revalidate_disk method, so it can just
      use bdev_check_media_change without any additional changes.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Acked-by: default avatarSong Liu <song@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      818077d6
    • Christoph Hellwig's avatar
      ide-gd: stop using the disk events mechanism · fec2cf60
      Christoph Hellwig authored
      ide-gd is only using the disk events mechanism to be able to force an
      invalidation and partition scan on opening removable media.  Just open
      code the logic without invoving the block layer.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fec2cf60
    • Christoph Hellwig's avatar
      ide-cd: remove idecd_revalidate_disk · a367e440
      Christoph Hellwig authored
      Just merge the trivial function into its only caller.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a367e440
    • Christoph Hellwig's avatar
      ide-cd: use bdev_check_media_changed · a22be69d
      Christoph Hellwig authored
      Switch to use bdev_check_media_changed instead of check_disk_change and
      call idecd_revalidate_disk manually.  Given that idecd_revalidate_disk
      only re-reads the TOC, and we already do the same at probe time, the
      extra call into ->revalidate_disk from bdev_disk_changed is not required
      either, so stop wiring up the method.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a22be69d
    • Christoph Hellwig's avatar
      gdrom: use bdev_check_media_change · faf04138
      Christoph Hellwig authored
      The Sega GD-ROM driver does not have a ->revalidate_disk method, so it
      can just use bdev_check_media_change without any additional changes.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      faf04138
    • Christoph Hellwig's avatar
      paride/pcd: use bdev_check_media_change · 1570d14f
      Christoph Hellwig authored
      The pcd driver does not have a ->revalidate_disk method, so it can just
      use bdev_check_media_change without any additional changes.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1570d14f
    • Christoph Hellwig's avatar
      xsysace: simplify media change handling · 77f93bfd
      Christoph Hellwig authored
      Pass a struct ace_device to ace_revalidate_disk, move the media changed
      check into the one caller that needs it, and give the routine a better
      name.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      77f93bfd
    • Christoph Hellwig's avatar
      xsysace: use bdev_check_media_change · f094225b
      Christoph Hellwig authored
      Switch to use bdev_check_media_change instead of check_disk_change and
      call ace_revalidate_disk manually.  Given that ace_revalidate_disk only
      deals with media change events, the extra call into ->revalidate_disk
      from bdev_disk_changed is not required either, so stop wiring up the
      method.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f094225b