- 07 Feb, 2018 40 commits
-
Jens Axboe authored
* for-linus: block, bfq: add requeue-request hook bcache: fix for data collapse after re-attaching an attached device bcache: return attach error when no cache set exist bcache: set writeback_rate_update_seconds in range [1, 60] seconds bcache: fix for allocator and register thread race bcache: set error_limit correctly bcache: properly set task state in bch_writeback_thread() bcache: fix high CPU occupancy during journal bcache: add journal statistic block: Add should_fail_bio() for bpf error injection blk-wbt: account flush requests correctly
-
Jens Axboe authored
* master: (1190 commits) ASoC: stm32: add of dependency for stm32 drivers ASoC: mt8173-rt5650: fix child-node lookup ASoC: dapm: fix debugfs read using path->connected platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro seq_file: Introduce DEFINE_SHOW_ATTRIBUTE() helper macro Documentation/sysctl/user.txt: fix typo MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns MAINTAINERS: update various PALM patterns MAINTAINERS: update "ARM/OXNAS platform support" patterns MAINTAINERS: update Cortina/Gemini patterns MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern MAINTAINERS: remove ANDROID ION pattern mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors mm: docs: fix parameter names mismatch mm: docs: fixup punctuation pipe: read buffer limits atomically pipe: simplify round_pipe_size() pipe: reject F_SETPIPE_SZ with size over UINT_MAX ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Linus Torvalds authored
Pull modules updates from Jessica Yu: "Minor code cleanups and MAINTAINERS update" * tag 'modules-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: modpost: Remove trailing semicolon ftrace/module: Move ftrace_release_mod() to ddebug_cleanup label MAINTAINERS: Remove from module & paravirt maintenance
-
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Linus Torvalds authored
Pull inode->i_version cleanup from Jeff Layton: "Goffredo went ahead and sent a patch to rename this function, and reverse its sense, as we discussed last week. The patch is very straightforward and I figure it's probably best to go ahead and merge this to get the API as settled as possible" * tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}
-
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Linus Torvalds authored
Pull UDF and ext2 fixlets from Jan Kara: "A UDF fix and an ext2 cleanup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: drop unneeded newline udf: Sanitize nanoseconds for time stamps
-
Paolo Valente authored
Commit 'a6a252e6 ("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device be re-inserted into the active I/O scheduler for that device. As a consequence, I/O schedulers may get the same request inserted again, even several times, without a finish_request invoked on that request before each re-insertion.

This fact is the cause of the failure reported in [1]. For an I/O scheduler, every re-insertion of the same re-prepared request is equivalent to the insertion of a new request. For schedulers like mq-deadline or kyber, this fact causes no harm. In contrast, it confuses a stateful scheduler like BFQ, which keeps state for an I/O request until the finish_request hook is invoked on the request. In particular, BFQ may get stuck, waiting forever for the number of request dispatches, of the same request, to be balanced by an equal number of request completions (while there will be one completion for that request). In this state, BFQ may refuse to serve I/O requests from other bfq_queues. The hang reported in [1] then follows.

However, the above re-prepared requests undergo a requeue, thus the requeue_request hook of the active elevator is invoked for these requests, if set. This commit then addresses the above issue by properly implementing the hook requeue_request in BFQ.

[1] https://marc.info/?l=linux-block&m=151211117608676

Reported-by: Ivan Kozik <ivan@ludios.org>
Reported-by: Alban Browaeys <alban.browaeys@gmail.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Serena Ziviani <ziviani.serena@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
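The dispatch/completion imbalance described in this message can be illustrated with a toy counter model. This is only a sketch under simplifying assumptions, not BFQ's code: `toy_sched` and the `toy_*` functions are hypothetical names, and the scheduler's per-request state is reduced to a single count of dispatches that still await balancing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified scheduler state: the number of
 * dispatches the scheduler still expects to see balanced. */
struct toy_sched {
    int in_flight;
};

/* A request is inserted and dispatched: the scheduler starts tracking it. */
static void toy_dispatch(struct toy_sched *s)
{
    s->in_flight++;
}

/* finish_request analogue: the single completion of a request. */
static void toy_finish(struct toy_sched *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* requeue_request analogue: drop the state kept for the request
 * before the same request is inserted again. */
static void toy_requeue(struct toy_sched *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* The toy scheduler serves other queues only once everything is balanced. */
static bool toy_can_serve_others(const struct toy_sched *s)
{
    return s->in_flight == 0;
}
```

In this model, two dispatches of the same request followed by one completion leave `in_flight` at 1 forever, while calling `toy_requeue()` between the two dispatches brings the count back to 0 after the completion.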
-
Linus Torvalds authored
Merge tag 'regulator-fix-v4.16-suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Fix suspend to idle. Testing on mainline after the initial regulator pull request went in identified a regression for suspend to idle due to it calling the suspend operations with states that it wasn't realized could happen, this patch fixes the problem" * tag 'regulator-fix-v4.16-suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Fix suspend to idle
-
git://github.com/bzolnier/linux
Linus Torvalds authored
Pull fbdev updates from Bartlomiej Zolnierkiewicz: "There is nothing really major here: - fix display-timings lookup in the Device Tree in atmel_lcdfb driver (Johan Hovold) - fix video mode and line_length to be set correctly in vfb driver (Pieter "PoroCYon" Sluys) - fix returning nonsensical values to the user-space on GIO_FONTX ioctl when using dummy console (Nicolas Pitre) - add missing license tag to mmpfb driver (Arnd Bergmann) - convert radeonfb and pxa3xx_gcu drivers to use ktime_get[_ts64]() instead of the deprecated do_gettimeofday() (Arnd Bergmann) - switch udlfb driver from using the pr_*() logging functions to the dev_*() ones + related cleanups (Ladislav Michl) - use __raw I/O accessors also on arm64 (Ji Zhang) - fix Kconfig help text for intelfb driver (Randy Dunlap) - do not duplicate features data in omapfb driver (Ladislav Michl) - misc cleanups (Colin Ian King, Markus Elfring, Rasmus Villemoes, Vasyl Gomonovych, Himanshu Jha, Michael Trimarchi)" * tag 'fbdev-v4.16' of git://github.com/bzolnier/linux: (25 commits) video: udlfb: Switch from the pr_*() to the dev_*() logging functions video: udlfb: Constify read only data video: fbdev/mmp: add MODULE_LICENSE console/dummy: leave .con_font_get set to NULL fbdev: mxsfb: use framebuffer_alloc in the correct way video: udlfb: Do not name private data 'dev' video: udlfb: Remove noisy warnings video: udlfb: Remove redundant gdev variable video: udlfb: Remove unnecessary local variable fbdev: auo_k190x: Use zeroing memory allocator instead of allocator/memset vfb: fix video mode and line_length being set when loaded fbdev: arm64 use __raw I/O memory api omapfb: dss: Do not duplicate features data video: fbdev: omap2: Use PTR_ERR_OR_ZERO() fbdev: au1200fb: delete duplicate header contents fbdev: pxa3xx: use ktime_get_ts64 for time stamps fbdev: radeon: use ktime_get() for HZ calibration video: smscufx: Improve a size determination in two functions video: udlfb: Delete an unnecessary return statement in two functions video: udlfb: Improve a size determination in dlfb_alloc_urb_list() ...
-
git://git.infradead.org/linux-platform-drivers-x86
Linus Torvalds authored
Pull more x86 platform-drivers updates from Andy Shevchenko: "The DEFINE_SHOW_ATTRIBUTE() macro was defined privately in three locations and is useful for new and old users to avoid a lot of code duplication. Move the macro to seq_file.h. Along with above, clean up three drivers to use that macro. This, due to dependencies, was sent separately since affected changes weren't upstream originally yet. The rationale of doing this now is to allow use of new macro in v4.17 cycle in a conflictless manner" * tag 'platform-drivers-x86-v4.16-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro seq_file: Introduce DEFINE_SHOW_ATTRIBUTE() helper macro
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound
Linus Torvalds authored
Pull more ASoC updates from Mark Brown: "With the merge window having been delayed for another week here's another batch of updates that came in during that week. There's a few important fixes in here, mainly a fix for I/O on a number of devices caused by some of the component rework and a fix for a potential issue if more than one component in a link provides compressed operations. The I/O fixes are particularly important as the problem causes a power regression on a number of OMAP platforms" * tag 'asoc-v4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound: (22 commits) ASoC: stm32: add of dependency for stm32 drivers ASoC: mt8173-rt5650: fix child-node lookup ASoC: dapm: fix debugfs read using path->connected ASoC: compress: Fixup error messages ASoC: compress: Remove some extraneous blank lines ASoC: compress: Correct handling of copy callback ASoC: Intel: kbl: Enable mclk and ssp sclk early ASoC: Intel: Skylake: Add extended I2S config blob support in Clock driver ASoC: Intel: Skylake: Add ssp clock driver ASoC: Fix twl4030 and 6040 regression by adding back read and write ASoC: sun8i-codec: Add ADC support for a33 ASoC: rockchip: Use dummy_dai for rt5514 dsp dailink ASoC: soc-pcm: rename .pmdown_time to .use_pmdown_time for Component ASoC: ak4613: call dummy write for PW_MGMT1/3 when Playback ASoC: soc-pcm: don't call flush_delayed_work() many times in soc_pcm_private_free() ASoC: soc-core: snd_soc_rtdcom_lookup() cares component driver name ASoC: sam9x5_wm8731: Drop 'ASoC' prefix from error messages ASoC: sam9g20_wm8731: use dev_*() logging functions ASoC: max98373 Changed SPDX header in C++ comments style ASoC: dmic: Fix check of return value from read of 'num-channels' ...
-
git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds authored
Pull watchdog updates from Wim Van Sebroeck: - new watchdog device drivers for Realtek RTD1295 and Spreadtrum SC9860 platform - add support for the following devices: jz4780 SoC, AST25xx series SoC and r8a77970 SoC - convert to watchdog framework: i6300esb_wdt, xen_wdt and sp5100_tco - several fixes for watchdog core - remove at32ap700x and obsolete documentation - gpio: Convert to use GPIO descriptors - rename gemini into FTWDT010 as this IP block is generic from Faraday Technology - various clean-ups and small bugfixes - add Guenter Roeck as co-maintainer - change maintainers e-mail address * tag 'linux-watchdog-4.16-rc1' of git://www.linux-watchdog.org/linux-watchdog: (74 commits) documentation: watchdog: remove documentation of w83697hf_wdt/w83697ug_wdt documentation: watchdog: remove documentation for ixp2000 documentation: watchdog: remove documentation of at32ap700x_wdt watchdog: remove at32ap700x_wdt watchdog: sp5100_tco: Add support for recent FCH versions watchdog: sp5100-tco: Abort if watchdog is disabled by hardware watchdog: sp5100_tco: Use bit operations watchdog: sp5100_tco: Convert to use watchdog subsystem watchdog: sp5100_tco: Clean up function and variable names watchdog: sp5100_tco: Use dev_ print functions where possible watchdog: sp5100_tco: Match PCI device early watchdog: sp5100_tco: Clean up sp5100_tco_setupdevice watchdog: sp5100_tco: Use standard error codes watchdog: sp5100_tco: Use request_muxed_region where possible watchdog: sp5100_tco: Fix watchdog disable bit watchdog: sp5100_tco: Always use SP5100_IO_PM_{INDEX_REG,DATA_REG} watchdog: core: make sure the watchdog_worker is not deferred watchdog: mt7621: switch to using managed devm_watchdog_register_device() watchdog: mt7621: set WDOG_HW_RUNNING bit when appropriate watchdog: imx2_wdt: restore previous timeout after suspend+resume ...
-
Tang Junhui authored
The back-end device sdm is already attached to a cache set with ID f67ebe1f-f8bc-4d73-bfe5-9dc88607f119. Attaching it to another cache set then fails with an error:

  [root]# cd /sys/block/sdm/bcache
  [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
  -bash: echo: write error: Invalid argument

After that, run a command to modify the label of the bcache device:

  [root]# echo data_disk1 > label

Then reboot the system. When the system powers on, the back-end device cannot attach to its cache set, and a message shows in the log:

  Feb 5 12:05:52 ceph152 kernel: [922385.508498] bcache: bch_cached_dev_attach() couldn't find uuid for sdm in set

In sysfs_attach(), dc->sb.set_uuid was assigned the value input through sysfs, regardless of whether bch_cached_dev_attach() succeeded. For example, if the back-end device is already attached to a cache set, bch_cached_dev_attach() fails, but dc->sb.set_uuid has already been changed. Modifying the label of the bcache device then calls bch_write_bdev_super(), which writes dc->sb.set_uuid to the super block, so a wrong cache set ID is recorded in the super block. After the system reboots, the cache set cannot find the uuid of the back-end device, so the bcache device does not exist and cannot be used any more.

In this patch, we no longer assign the cache set ID to dc->sb.set_uuid in sysfs_attach() directly. Instead, we pass it into bch_cached_dev_attach(), and assign it to dc->sb.set_uuid only after the back-end device has attached to the cache set successfully.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
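The "update the stored UUID only after a successful attach" idea from this message can be sketched in plain C. This is a minimal illustration, not the bcache code: `toy_backing_dev`, `toy_cached_dev_attach()` and `TOY_UUID_LEN` are hypothetical names, and the only failure modelled is "already attached".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_UUID_LEN 16

/* Hypothetical stand-in for the backing device; not the real struct. */
struct toy_backing_dev {
    bool attached;
    unsigned char set_uuid[TOY_UUID_LEN]; /* what a later superblock write would persist */
};

/*
 * Attach analogue: the caller passes the requested UUID in, and the
 * device's own copy is updated only once every check has passed.
 */
static int toy_cached_dev_attach(struct toy_backing_dev *dc,
                                 const unsigned char *set_uuid)
{
    assert(dc != NULL && set_uuid != NULL);

    if (dc->attached)
        return -EINVAL; /* fail without touching dc->set_uuid */

    memcpy(dc->set_uuid, set_uuid, TOY_UUID_LEN);
    dc->attached = true;
    return 0;
}
```

Because the failing path returns before the `memcpy()`, a later write of the device state cannot persist a UUID from a rejected attach request.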
-
Tang Junhui authored
When I attach a back-end device to a cache set that is not registered yet, the back-end device is not attached, yet no error is returned:

  [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach
  [root]#

In sysfs_attach(), the return value "v" is initialized to "size" at the beginning. If no cache set exists in bch_cache_sets, "v" is never changed and is returned to sysfs, which regards it as success since "size" is a positive number.

This patch fixes the issue by initializing "v" to "-ENOENT".

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
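The effect of the initial value of "v" can be shown with a small stand-alone model. This is a sketch under stated assumptions, not the sysfs_attach() source: `toy_cache_set` and `toy_sysfs_attach()` are hypothetical, and "attaching" is reduced to finding a registered set with a matching UUID string.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered cache set; only the UUID matters here. */
struct toy_cache_set {
    const char *uuid;
};

/*
 * Store-handler analogue: a positive return means "all bytes consumed",
 * a negative one is an error. Starting from -ENOENT makes an empty or
 * non-matching list report an error instead of silently succeeding.
 */
static long toy_sysfs_attach(const struct toy_cache_set *sets, size_t nsets,
                             const char *uuid, size_t size)
{
    long v = -ENOENT; /* the old behaviour corresponds to starting from v = size */

    assert(uuid != NULL);

    for (size_t i = 0; i < nsets; i++) {
        if (strcmp(sets[i].uuid, uuid) == 0) {
            v = (long)size; /* success: report the whole write as consumed */
            break;
        }
    }
    return v;
}
```

With no registered sets the loop body never runs, so the caller sees `-ENOENT` rather than the positive byte count.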
-
Coly Li authored
dc->writeback_rate_update_seconds can be set via sysfs, and its value can be anywhere in [1, ULONG_MAX]. It does not make sense to set such a large value; 60 seconds is long enough, considering that the default of 5 seconds has worked well for a long time.

Because dc->writeback_rate_update is a special delayed work, it re-arms itself inside the delayed work routine update_writeback_rate(). When stopping it with cancel_delayed_work_sync(), there should be a timeout to wait and make sure the re-armed delayed work is stopped too. A small maximum value of dc->writeback_rate_update_seconds also helps in choosing a reasonably small timeout.

This patch limits the sysfs interface to set dc->writeback_rate_update_seconds in the range [1, 60] seconds, and replaces the hand-coded numbers with macros.

Changelog:
v2: fix a rebase typo in v4, which is pointed out by Michael Lyle.
v1: initial version.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
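The range limit can be sketched as a simple clamp with named constants. The macro and function names below are hypothetical (the message does not show the names the patch uses); only the values 1, 5 and 60 come from the text.

```c
#include <assert.h>

/* Hypothetical names; only the values are taken from the commit message. */
#define TOY_RATE_UPDATE_SECS_MIN      1UL
#define TOY_RATE_UPDATE_SECS_DEFAULT  5UL
#define TOY_RATE_UPDATE_SECS_MAX      60UL

static_assert(TOY_RATE_UPDATE_SECS_MIN <= TOY_RATE_UPDATE_SECS_DEFAULT &&
              TOY_RATE_UPDATE_SECS_DEFAULT <= TOY_RATE_UPDATE_SECS_MAX,
              "default must lie inside the allowed range");

/* Clamp a value written by the user into [MIN, MAX] seconds. */
static unsigned long toy_clamp_update_seconds(unsigned long requested)
{
    if (requested < TOY_RATE_UPDATE_SECS_MIN)
        return TOY_RATE_UPDATE_SECS_MIN;
    if (requested > TOY_RATE_UPDATE_SECS_MAX)
        return TOY_RATE_UPDATE_SECS_MAX;
    return requested;
}
```

Bounding the interval this way also bounds how long a caller may have to wait for a self-re-arming work item to run once more before it is fully stopped.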
-
Tang Junhui authored
After a long run of random small I/O writes, I rebooted the machine, and after the machine powered on, I found bcache got stuck. The stacks are:

  [root@ceph153 ~]# cat /proc/2510/task/*/stack
  [<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache]
  [<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache]
  [<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache]
  [<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache]
  [<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
  [<ffffffff810a631f>] kthread+0xcf/0xe0
  [<ffffffff8164c318>] ret_from_fork+0x58/0x90
  [<ffffffffffffffff>] 0xffffffffffffffff

  [root@ceph153 ~]# cat /proc/2038/task/*/stack
  [<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
  [<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache]
  [<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache]
  [<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache]
  [<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache]
  [<ffffffff812f702f>] kobj_attr_store+0xf/0x20
  [<ffffffff8125b216>] sysfs_write_file+0xc6/0x140
  [<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0
  [<ffffffff811e069f>] SyS_write+0x7f/0xe0
  [<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1

The stacks show that the register thread and the allocator thread got stuck while registering the cache device. I rebooted the machine several times, and the issue always exists on this machine. I debugged the code and found the call trace below:

  register_bcache()
  ==>run_cache_set()
  ==>bch_journal_replay()
  ==>bch_btree_insert()
  ==>__bch_btree_map_nodes()
  ==>btree_insert_fn()
  ==>btree_split() //node need split
  ==>btree_check_reserve()

In btree_check_reserve(), it checks whether there are enough buckets of RESERVE_BTREE type. Since the allocator thread has not started working yet, no buckets of RESERVE_BTREE type have been allocated, so the register thread waits on c->btree_cache_wait and goes to sleep.

Then the allocator thread is initialized. Its call trace is below:

  bch_allocator_thread()
  ==>bch_prio_write()
  ==>bch_journal_meta()
  ==>bch_journal()
  ==>journal_wait_for_write()

In journal_wait_for_write(), it checks whether the journal is full via journal_full(), but the long run of random small I/O writes has exhausted the journal buckets (journal.blocks_free=0). In order to release journal buckets, the allocator calls btree_flush_write() to flush keys to btree nodes, and waits on c->journal.wait until the btree node writes finish or some journal bucket space becomes available; then the allocator thread goes to sleep. But in btree_flush_write(), since bch_journal_replay() has not finished, no btree node has a journal (the condition "if (btree_current_write(b)->journal)" is never satisfied), so there is no btree node to flush, no journal bucket is released, and the allocator sleeps forever.

Through the above analysis, we can see that:
1) The register thread waits for the allocator thread to allocate buckets of RESERVE_BTREE type;
2) The allocator thread waits for the register thread to replay the journal, so that it can flush btree nodes and get a journal bucket.
So they both get stuck waiting for each other.

Hua Rui provided a patch for me that allocates some buckets of RESERVE_BTREE type in advance, so that the register thread can get a bucket when a btree node splits, without waiting for the allocator thread. I tested it and it helped: the register thread ran a step further, but finally still got stuck. The reason is that only 8 buckets of RESERVE_BTREE type were allocated, and in bch_journal_replay(), after 2 btree node splits, only 4 buckets of RESERVE_BTREE type were left. Then btree_check_reserve() is not satisfied anymore, so it goes to sleep again, and at the same time the allocator thread has not flushed enough btree nodes to release a journal bucket, so they both got stuck again.

So we need to allocate more buckets of RESERVE_BTREE type in advance, but how many are enough? By experience and testing, I think it should be as many as the journal buckets. I then modified the code as in this patch, tested it on the machine, and it works.

This patch is based on Hua Rui’s patch, and allocates more buckets of RESERVE_BTREE type in advance to avoid the register thread and the allocator thread waiting for each other.

[patch v2] ca->sb.njournal_buckets would be 0 the first time after cache creation, when no journal exists, so just 8 btree buckets is OK.

Signed-off-by: Hua Rui <huarui.dev@gmail.com>
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
Struct cache uses io_errors for two purposes:
- Error decay: when the cache set error_decay is set, io_errors is used to generate a small delay when an I/O error happens.
- I/O errors counter: in order to generate a big enough value for error decay, the I/O errors counter value is stored left-shifted by 20 bits (a.k.a. IO_ERROR_SHIFT).

In function bch_count_io_errors(), if the I/O errors counter reaches the cache set error limit, bch_cache_set_error() will be called to retire the whole cache set. But the current code is problematic when checking the error limit; see the following piece of code from bch_count_io_errors():

   90 if (error) {
   91         char buf[BDEVNAME_SIZE];
   92         unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
   93                                             &ca->io_errors);
   94         errors >>= IO_ERROR_SHIFT;
   95
   96         if (errors < ca->set->error_limit)
   97                 pr_err("%s: IO error on %s, recovering",
   98                        bdevname(ca->bdev, buf), m);
   99         else
  100                 bch_cache_set_error(ca->set,
  101                                     "%s: too many IO errors %s",
  102                                     bdevname(ca->bdev, buf), m);
  103 }

At line 94, errors is right-shifted by IO_ERROR_SHIFT bits, so it is now the real error counter to compare at line 96. But ca->set->error_limit is initialized with an amplified value in bch_cache_set_alloc():

  1545 c->error_limit = 8 << IO_ERROR_SHIFT;

This means that by default, in bch_count_io_errors(), bch_cache_set_error() won't be called to retire the problematic cache device before 8<<20 errors have happened. If the average request size is 64KB, bcache won't handle the failed device until 512GB of data has been requested. This is too large to be an I/O threshold, so I believe the correct error limit should be much smaller.

This patch sets the default cache set error limit to 8, so that in bch_count_io_errors(), when the errors counter reaches 8 (if it is the default value), bch_cache_set_error() will be called to retire the whole cache set. This patch also removes the bit shifting when storing or showing the io_error_limit value via the sysfs interface.

Nowadays most SSDs handle internal flash failures automatically by LBA address re-indirect mapping. If an I/O error can be observed by upper layer code, it is a notable error, because the SSD cannot re-indirect map the problematic LBA address to an available flash block. This situation indicates that the whole SSD will fail very soon. Therefore setting 8 as the default I/O error limit value makes sense; it is enough for most cache devices.

Changelog:
v2: add reviewed-by from Hannes.
v1: initial version for review.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
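The mismatch between the shifted counter and the amplified limit, and the 512GB figure, can be checked with a few lines of arithmetic. This is an illustrative sketch only: the `toy_*` names are hypothetical, and the byte estimate assumes that each counted error corresponds to one request of the average size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IO_ERROR_SHIFT 20
static_assert(TOY_IO_ERROR_SHIFT < 32, "shifted counter must fit in 32 bits");

/* Mirrors the comparison at lines 94-96: shift the stored counter
 * back down to a plain error count, then compare with the limit. */
static bool toy_limit_reached(uint32_t shifted_counter, uint32_t error_limit)
{
    uint32_t errors = shifted_counter >> TOY_IO_ERROR_SHIFT;

    return errors >= error_limit;
}

/* Data requested before the limit trips, assuming one request of the
 * average size per counted error. */
static uint64_t toy_bytes_before_retire(uint64_t error_limit,
                                        uint64_t avg_request_bytes)
{
    return error_limit * avg_request_bytes;
}
```

After 8 errors the shifted counter holds 8 << 20: compared against a limit of 8 the check trips, while against the amplified limit 8 << 20 it does not. Likewise (8 << 20) requests of 64 KiB come to 2^23 * 2^16 = 2^39 bytes, i.e. 512 GiB, whereas a limit of 8 corresponds to 512 KiB.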
-
Coly Li authored
Kernel thread routine bch_writeback_thread() has the following code block:

  447         down_write(&dc->writeback_lock);
  448~450     if (check conditions) {
  451                 up_write(&dc->writeback_lock);
  452                 set_current_state(TASK_INTERRUPTIBLE);
  453
  454                 if (kthread_should_stop())
  455                         return 0;
  456
  457                 schedule();
  458                 continue;
  459         }

If the condition check is true, the task state is set to TASK_INTERRUPTIBLE and schedule() is called to wait for others to wake it up.

There are 2 issues in the current code:
1, The task state is set to TASK_INTERRUPTIBLE after the condition checks. If another process changes the condition and calls wake_up_process(dc->writeback_thread), then at line 452 the task state is set back to TASK_INTERRUPTIBLE, and the writeback kernel thread loses a chance to be woken up.
2, At line 454, if kthread_should_stop() is true, the writeback kernel thread returns to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and calls do_exit(). It is not good to enter do_exit() with task state TASK_INTERRUPTIBLE: in the following code path might_sleep() is called and a warning message is reported by __might_sleep(): "WARNING: do not call blocking ops when !TASK_RUNNING; state=1 set at [xxxx]".

For the first issue, the task state should be set before the condition checks. Indeed, because dc->writeback_lock is required when modifying all the conditions, calling set_current_state() inside the code block where dc->writeback_lock is held is safe. But this is quite implicit, so I still move set_current_state() before all the condition checks.

For the second issue, frankly speaking it does not hurt when a kernel thread exits with TASK_INTERRUPTIBLE state, but this warning message scares users and makes them feel there might be something risky with bcache that could hurt their data. Setting the task state to TASK_RUNNING before returning fixes this problem.

In alloc.c:allocator_wait(), there is a similar issue, which is also fixed in this patch.

Changelog:
v3: merge two similar fixes into one patch
v2: fix the race issue in v1 patch.
v1: initial buggy fix.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tang Junhui authored
After a long run of small write I/O, we found that CPU occupancy was very high and I/O performance had dropped by about half:

  [root@ceph151 internal]# top
  top - 15:51:05 up 1 day,2:43, 4 users, load average: 16.89, 15.15, 16.53
  Tasks: 2063 total, 4 running, 2059 sleeping, 0 stopped, 0 zombie
  %Cpu(s):4.3 us, 17.1 sy 0.0 ni, 66.1 id, 12.0 wa, 0.0 hi, 0.5 si, 0.0 st
  KiB Mem : 65450044 total, 24586420 free, 38909008 used, 1954616 buff/cache
  KiB Swap: 65667068 total, 65667068 free, 0 used. 25136812 avail Mem

    PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
   2023 root  20   0       0      0     0 S 55.1  0.0   0:04.42 kworker/11:191
  14126 root  20   0       0      0     0 S 42.9  0.0   0:08.72 kworker/10:3
   9292 root  20   0       0      0     0 S 30.4  0.0   1:10.99 kworker/6:1
   8553 ceph  20   0 4242492 1.805g 18804 S 30.0  2.9 410:07.04 ceph-osd
  12287 root  20   0       0      0     0 S 26.7  0.0   0:28.13 kworker/7:85
  31019 root  20   0       0      0     0 S 26.1  0.0   1:30.79 kworker/22:1
   1787 root  20   0       0      0     0 R 25.7  0.0   5:18.45 kworker/8:7
  32169 root  20   0       0      0     0 S 14.5  0.0   1:01.92 kworker/23:1
  21476 root  20   0       0      0     0 S 13.9  0.0   0:05.09 kworker/1:54
   2204 root  20   0       0      0     0 S 12.5  0.0   1:25.17 kworker/9:10
  16994 root  20   0       0      0     0 S 12.2  0.0   0:06.27 kworker/5:106
  15714 root  20   0       0      0     0 R 10.9  0.0   0:01.85 kworker/19:2
   9661 ceph  20   0 4246876 1.731g 18800 S 10.6  2.8 403:00.80 ceph-osd
  11460 ceph  20   0 4164692 2.206g 18876 S 10.6  3.5 360:27.19 ceph-osd
   9960 root  20   0       0      0     0 S 10.2  0.0   0:02.75 kworker/2:139
  11699 ceph  20   0 4169244 1.920g 18920 S 10.2  3.1 355:23.67 ceph-osd
   6843 ceph  20   0 4197632 1.810g 18900 S  9.6  2.9 380:08.30 ceph-osd

The kernel workers consumed a lot of CPU, and I found they were running journal work. The journal was reclaiming resources and flushing btree nodes with surprising frequency.

Through further analysis, we found that in btree_flush_write(), we try to get the btree node with the smallest fifo index to flush by traversing all the btree nodes in c->bucket_hash. After we get it, since no lock protects it, this btree node may already have been written to the cache device by other works, and if this occurs, we retry the traversal of c->bucket_hash and get another btree node. When the problem occurred, the retry count was very high, and we consumed a lot of CPU looking for an appropriate btree node.

In this patch, we record the 128 btree nodes with the smallest fifo index in a heap, and pop them one by one when we need to flush a btree node. This greatly reduces the time the loop takes to find an appropriate btree node, and also reduces CPU occupancy.

[note by mpl: this triggers a checkpatch error because of adjacent, pre-existing style violations]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
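The general technique of keeping only the k smallest keys seen during one traversal can be sketched with a bounded max-heap. This is a stand-alone illustration, not the heap helpers bcache uses: `toy_heap` and the `toy_*` functions are hypothetical, plain unsigned integers stand in for the fifo index of a btree node, and the capacity of 128 is the only value taken from the message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HEAP_MAX 128

/* Bounded max-heap: the root is the largest of the kept (smallest) keys. */
struct toy_heap {
    size_t used;
    size_t cap; /* 1..TOY_HEAP_MAX */
    unsigned keys[TOY_HEAP_MAX];
};

static void toy_swap(unsigned *a, unsigned *b)
{
    unsigned t = *a;
    *a = *b;
    *b = t;
}

static void toy_sift_up(struct toy_heap *h, size_t i)
{
    while (i > 0) {
        size_t parent = (i - 1) / 2;

        if (h->keys[parent] >= h->keys[i])
            break;
        toy_swap(&h->keys[parent], &h->keys[i]);
        i = parent;
    }
}

static void toy_sift_down(struct toy_heap *h, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, big = i;

        if (l < h->used && h->keys[l] > h->keys[big])
            big = l;
        if (r < h->used && h->keys[r] > h->keys[big])
            big = r;
        if (big == i)
            break;
        toy_swap(&h->keys[i], &h->keys[big]);
        i = big;
    }
}

/* Offer one key during the traversal; keep it only if it is among
 * the cap smallest keys seen so far. */
static void toy_offer(struct toy_heap *h, unsigned key)
{
    assert(h->cap > 0 && h->cap <= TOY_HEAP_MAX);

    if (h->used < h->cap) {
        h->keys[h->used] = key;
        toy_sift_up(h, h->used++);
    } else if (key < h->keys[0]) {
        h->keys[0] = key; /* evict the current largest kept key */
        toy_sift_down(h, 0);
    }
}

/* In-place heapsort: afterwards keys[0..used) is ascending, so the
 * candidates can be consumed smallest-first. Do not call toy_offer()
 * again after sorting. */
static void toy_sort_ascending(struct toy_heap *h)
{
    size_t n = h->used;

    while (h->used > 1) {
        h->used--;
        toy_swap(&h->keys[0], &h->keys[h->used]);
        toy_sift_down(h, 0);
    }
    h->used = n;
}
```

One traversal with `toy_offer()` costs O(n log k), after which up to k candidates are available without rescanning; a candidate that turns out to be stale can simply be skipped in favour of the next one.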
-
Tang Junhui authored
Sometimes the journal takes up a lot of CPU, and we need statistics to know what the journal is doing. So this patch provides some journal statistics:
1) reclaim: how many times the journal tries to reclaim resources, usually because the journal buckets and/or the pins are exhausted.
2) flush_write: how many times the journal tries to flush a btree node to the cache device, usually because the journal buckets are exhausted.
3) retry_flush_write: how many times the journal retries flushing the next btree node, usually because the previous btree node has been flushed by another thread.
We show these statistics via the sysfs interface. Through these statistics we can see the full status of the journal module when CPU usage is too high.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Linus Torvalds authored
Merge tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V updates from Palmer Dabbelt: "This contains the fixes we'd like to target for the 4.16 merge window. It's not as much as I was originally hoping to do but between glibc, the chip, and FOSDEM there just wasn't enough time to get everything put together. As such, this merge window is essentially just going to be small changes. This includes mostly cleanups: - A build fix failure to the audit test cases. RISC-V doesn't have renameat because the generic syscall ABI moved to renameat2 by the time of our port. The syscall audit test cases don't understand this, so I added a trivial fix. This went through mailing list review during the 4.15 merge window, but nobody has picked it up so I think it's best to just do this here. - The removal of our command-line argument processing code. The "mem_end" stuff was broken and the rest duplicated generic device tree code. The generic code was already being called. - Some unused/redundant code has been removed, including __ARCH_HAVE_MMU, current_pgdir, and the initialization of init_mm.pgd. - SUM is disabled upon taking a trap, which means that user memory is protected during traps taking inside copy_{to,from}_user(). - The sptbr CSR has been renamed to satp in C code. We haven't changed the assembly code in order to maintain compatibility with binutils 2.29, which doesn't understand the new name. Additionally, we're adding some new features: - Basic ftrace support, thanks to Alan Kao! - Support for ZONE_DMA32. This is necessary for all the normal reasons, but also to deal with a deficiency in the Xilinx PCIe controller we're using on our FPGA-based systems. While the ZONE_DMA32 addition should be sufficient for most uses, it doesn't complete the fix for the Xilinx controller. - TLB shootdowns now only target the harts where they're necessary, instead of applying to all harts in the system. 
These patches have all been sitting on our linux-next branch for a while now. Due to time constraints this is all I feel comfortable submitting during the 4.16 merge window, hopefully we'll do better next time!" [ Note to self: "harts" is RISC-V speak for "hardware threads". I had to look that up. - Linus ] * tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: riscv: inline set_pgdir into its only caller riscv: rename sptbr to satp riscv: don't read back satp in paging_init riscv: remove the unused current_pgdir function riscv: add ZONE_DMA32 RISC-V: Limit the scope of TLB shootdowns riscv: disable SUM in the exception handler riscv: remove redundant unlikely() riscv: remove unused __ARCH_HAVE_MMU define riscv/ftrace: Add basic support RISC-V: Remove mem_end command line processing RISC-V: Remove duplicate command-line parsing logic audit: Avoid build failures on systems without renameat
-
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Linus Torvalds authored
Pull MIPS fixes from James Hogan: "A couple of MIPS fixes for 4.16-rc1, including an important regression in 4.15 and a rather more longstanding corner case build fix. These are separate from the main pull request as one of the bugs fixed was only recently introduced in v4.15-rc8. - Fix CPS regression on older binutils due to MIPS_ISA_LEVEL_RAW fix (4.15) - Fix allmodconfig + CONFIG_MACH_TX49XX=y builds due to incorrect use of IS_ENABLED() (2.6.28)" * tag 'mips_fixes_4.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS MIPS: CPS: Fix MIPS_ISA_LEVEL_RAW fallout
-
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Linus Torvalds authored
Pull MIPS updates from James Hogan: "These are the main MIPS changes for 4.16. Rough overview: (1) Basic support for the Ingenic JZ4770 based GCW Zero open-source handheld video game console (2) Support for the Ranchu board (used by Android emulator) (3) Various cleanups and misc improvements More detailed summary: Fixes: - Fix generic platform's USB_*HCI_BIG_ENDIAN selects (4.9) - Fix vmlinuz default build when ZBOOT selected - Fix clean up of vmlinuz targets - Fix command line duplication (in preparation for Ingenic JZ4770) Miscellaneous: - Allow Processor ID reads to be optimised away by the compiler (improves performance when running in guest) - Push ARCH_MIGHT_HAVE_PC_SERIO/PARPORT down to platform level to disable on generic platform with Ranchu board support - Add helpers for assembler macro instructions for older assemblers - Use assembler macro instructions to support VZ, XPA & MSA operations on older assemblers, removing C wrapper duplication - Various improvements to VZ & XPA assembly wrappers - Add drivers/platform/mips/ to MIPS MAINTAINERS entry Minor cleanups: - Misc FPU emulation cleanups (removal of unnecessary include, moving macros to common header, checkpatch and sparse fixes) - Remove duplicate assignment of core in play_dead() - Remove duplication in watchpoint handling - Remove mips_dma_mapping_error() stub - Use NULL instead of 0 in prepare_ftrace_return() - Use proper kernel-doc Return keyword for __compute_return_epc_for_insn() - Remove duplicate semicolon in csum_fold() Platform support: Broadcom: - Enable ZBOOT on BCM47xx Generic platform: - Add Ranchu board support, used by Android emulator - Fix machine compatible string matching for Ranchu - Support GIC in EIC mode Ingenic platforms: - Add DT, defconfig and other support for JZ4770 SoC and GCW Zero - Support dynamic machine types (i.e.
JZ4740 / JZ4770 / JZ4780) - Add Ingenic JZ4770 CGU clocks - General Ingenic clk changes to prepare for JZ4770 SoC support - Use common command line handling code - Add DT vendor prefix to GCW (Game Consoles Worldwide) Loongson: - Add MAINTAINERS entry for Loongson2 and Loongson3 platforms - Drop 32-bit support for Loongson 2E/2F devices - Fix build failures due to multiple use of 'MEM_RESERVED'" * tag 'mips_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (53 commits) MIPS: Malta: Sanitize mouse and keyboard configuration. MIPS: Update defconfigs after previous patch. MIPS: Push ARCH_MIGHT_HAVE_PC_SERIO down to platform level MIPS: Push ARCH_MIGHT_HAVE_PC_PARPORT down to platform level MIPS: SMP-CPS: Remove duplicate assignment of core in play_dead MIPS: Generic: Support GIC in EIC mode MIPS: generic: Fix Makefile alignment MIPS: generic: Fix ranchu_of_match[] termination MIPS: generic: Fix machine compatible matching MIPS: Loongson fix name confict - MEM_RESERVED MIPS: bcm47xx: enable ZBOOT support MIPS: Fix trailing semicolon MIPS: Watch: Avoid duplication of bits in mips_read_watch_registers MIPS: Watch: Avoid duplication of bits in mips_install_watch_registers. MIPS: MSA: Update helpers to use new asm macros MIPS: XPA: Standardise readx/writex accessors MIPS: XPA: Allow use of $0 (zero) to MTHC0 MIPS: XPA: Use XPA instructions in assembly MIPS: VZ: Pass GC0 register names in $n format MIPS: VZ: Update helpers to use new asm macros ...
-
git://git.lwn.net/linux
Linus Torvalds authored
Pull more documentation updates from Jonathan Corbet: "A few late-arriving fixes, along with Konstantin's PGP document that had no reason to wait another cycle" * tag 'docs-4.16-2' of git://git.lwn.net/linux: Documentation/process: tweak pgp maintainer guide Documentation/admin-guide: fixes for thunderbolt.rst Documentation: mips: Update AU1xxx_IDE Kconfig dependencies Fix broken link in Documentation/process/kernel-docs.rst Documentation/process: kernel maintainer PGP guide
-
Mark Brown authored
Merge remote-tracking branches 'asoc/topic/sam9x5_wm8731', 'asoc/topic/sgtl5000' and 'asoc/topic/sun8i-codec' into asoc-next
-
Mark Brown authored
Merge remote-tracking branches 'asoc/topic/max98373', 'asoc/topic/mtk', 'asoc/topic/pcm', 'asoc/topic/rockchip' and 'asoc/topic/sam9g20_wm8731' into asoc-next
-
Mark Brown authored
Merge remote-tracking branches 'asoc/topic/ak4613', 'asoc/topic/core', 'asoc/topic/dmic' and 'asoc/topic/intel' into asoc-next
-
Mark Brown authored
-
Mark Brown authored
-
Mark Brown authored
Merge remote-tracking branches 'asoc/fix/compress', 'asoc/fix/core', 'asoc/fix/dapm', 'asoc/fix/mtk' and 'asoc/fix/stm' into asoc-next
-
Olivier Moysan authored
Add an OF dependency for the STM32 ASoC drivers. The DFSDM OF dependency is already inherited from the STM32_DFSDM_ADC dependency. Signed-off-by: olivier moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>
-
Johan Hovold authored
This driver used the wrong OF-helper when looking up the optional capture-codec child node during probe. Instead of searching just children of the sound node, a tree-wide depth-first search starting at the unrelated platform node was done. Not only could this end up matching an unrelated node or no node at all; the platform node could also be prematurely freed since of_find_node_by_name() drops a reference to its first argument. This particular pattern has been observed leading to crashes after probe deferrals in other drivers. Fix this by dropping the broken call to of_find_node_by_name() and keeping only the second, correct lookup using of_get_child_by_name() while taking care not to bail out if the optional node is missing. Note that this also addresses two capture-codec node-reference leaks (one for each of the original helper calls). Compile tested only. Fixes: d349caeb ("ASoC: mediatek: Add second I2S on mt8173-rt5650 machine driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
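The difference between the two lookups can be sketched with a small Python model. This is not the kernel's device-tree API: the Node class, the reference counter and the two functions are hypothetical stand-ins that only imitate the behaviour described above (a tree-wide search that drops a reference on its starting node, versus a search limited to direct children).

```python
# Toy model of the two lookup styles; not the kernel OF implementation.


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.refcount = 1  # one reference held by the tree itself


def walk(root):
    """Yield every node of the tree in depth-first order."""
    yield root
    for child in root.children:
        yield from walk(child)


def find_node_by_name(root, start, name):
    """Tree-wide search: scan all nodes after `start` in depth-first
    order, dropping one reference on `start` as a side effect."""
    nodes = list(walk(root))
    start.refcount -= 1
    for node in nodes[nodes.index(start) + 1:]:
        if node.name == name:
            node.refcount += 1
            return node
    return None


def get_child_by_name(parent, name):
    """Child-only search: look at direct children of `parent`."""
    for child in parent.children:
        if child.name == name:
            child.refcount += 1
            return child
    return None


wanted = Node("capture-codec")
unrelated = Node("capture-codec")
platform = Node("platform")
sound = Node("sound", [wanted])
root = Node("root", [platform, Node("other-card", [unrelated]), sound])

wrong = find_node_by_name(root, platform, "capture-codec")
right = get_child_by_name(sound, "capture-codec")
missing = get_child_by_name(platform, "capture-codec")  # optional node absent
```

In this model the tree-wide search returns the unrelated node and leaves the platform node with no references, while the child-only search returns the intended node or None when the optional child is missing.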
-
KaiChieh Chuang authored
This fixes a bug in dapm_widget_power_read_file(), where it may pass the source and sink widgets to p->connected() in the opposite order. For example, given static int connected_check(source, sink); and the route {"w_sink", NULL, "w_source", connected_check}, dapm_widget_power_read_file() will query p->connected() in the following cases: p->connected("w_source", "w_sink") and p->connected("w_sink", "w_source"). We should avoid the last case, since it is the wrong order (source/sink) relative to the declaration in snd_soc_dapm_route. Signed-off-by: KaiChieh Chuang <kaichieh.chuang@mediatek.com> Signed-off-by: Mark Brown <broonie@kernel.org>
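The ordering problem can be reproduced in a few lines of Python. This is a toy model and not the ASoC code: the Path tuple, the recording callback and both query functions are hypothetical, and the real fix may differ in detail.

```python
# Toy model of the source/sink ordering issue; not the ASoC implementation.
from collections import namedtuple

# A path mirrors a route declared as {sink, control, source, connected}.
Path = namedtuple("Path", ["source", "sink", "connected"])

calls = []


def connected_check(source, sink):
    """Hypothetical callback that records the order it was called with."""
    calls.append((source, sink))
    return True


def query_from_widget(widget, path):
    """Buggy pattern: the widget being read is always passed first."""
    other = path.sink if widget == path.source else path.source
    return path.connected(widget, other)


def query_as_declared(widget, path):
    """Fixed pattern: always pass (source, sink) as the route declared."""
    return path.connected(path.source, path.sink)


path = Path(source="w_source", sink="w_sink", connected=connected_check)

for w in ("w_source", "w_sink"):
    query_from_widget(w, path)
buggy_calls = list(calls)

calls.clear()
for w in ("w_source", "w_sink"):
    query_as_declared(w, path)
fixed_calls = list(calls)
```

With the buggy pattern the callback sees ("w_sink", "w_source") when the sink widget is read; with the fixed pattern it only ever sees ("w_source", "w_sink").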
-
Andy Shevchenko authored
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
-
Andy Shevchenko authored
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
-
Andy Shevchenko authored
...instead of open coding file operations followed by custom ->open() callbacks per each attribute. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
-
Andy Shevchenko authored
The DEFINE_SHOW_ATTRIBUTE() helper macro would be useful for current users, of which there are many, and for newcomers, to decrease code duplication. Acked-by: Lee Jones <lee.jones@linaro.org> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
-
Linus Torvalds authored
Merge misc updates from Andrew Morton: - kasan updates - procfs - lib/bitmap updates - other lib/ updates - checkpatch tweaks - rapidio - ubsan - pipe fixes and cleanups - lots of other misc bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) Documentation/sysctl/user.txt: fix typo MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns MAINTAINERS: update various PALM patterns MAINTAINERS: update "ARM/OXNAS platform support" patterns MAINTAINERS: update Cortina/Gemini patterns MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern MAINTAINERS: remove ANDROID ION pattern mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors mm: docs: fix parameter names mismatch mm: docs: fixup punctuation pipe: read buffer limits atomically pipe: simplify round_pipe_size() pipe: reject F_SETPIPE_SZ with size over UINT_MAX pipe: fix off-by-one error when checking buffer limits pipe: actually allow root to exceed the pipe buffer limits pipe, sysctl: remove pipe_proc_fn() pipe, sysctl: drop 'min' parameter from pipe-max-size converter kasan: rework Kconfig settings crash_dump: is_kdump_kernel can be boolean kernel/mutex: mutex_is_locked can be boolean ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull scheduler updates from Ingo Molnar: - membarrier updates (Mathieu Desnoyers) - SMP balancing optimizations (Mel Gorman) - stats update optimizations (Peter Zijlstra) - RT scheduler race fixes (Steven Rostedt) - misc fixes and updates * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS sched/fair: Do not migrate if the prev_cpu is idle sched/fair: Restructure wake_affine*() to return a CPU id sched/fair: Remove unnecessary parameters from wake_affine_idle() sched/rt: Make update_curr_rt() more accurate sched/rt: Up the root domain ref count when passing it around via IPIs sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() sched/core: Optimize update_stats_*() sched/core: Optimize ttwu_stat() membarrier/selftest: Test private expedited sync core command membarrier/arm64: Provide core serializing command membarrier/x86: Provide core serializing command membarrier: Provide core serializing command, *_SYNC_CORE lockin/x86: Implement sync_core_before_usermode() locking: Introduce sync_core_before_usermode() membarrier/selftest: Test global expedited command membarrier: Provide GLOBAL_EXPEDITED command membarrier: Document scheduler barrier requirements powerpc, membarrier: Skip memory barrier in switch_mm() membarrier/selftest: Test private expedited command
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull perf fixes from Ingo Molnar: "Tooling fixes, plus add missing interval sampling to certain x86 PEBS events" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Add trace/beauty/generated/ into .gitignore perf trace: Fix call-graph output x86/events/intel/ds: Add PERF_SAMPLE_PERIOD into PEBS_FREERUNNING_FLAGS perf record: Fix period option handling perf evsel: Fix period/freq terms setup tools headers: Synchoronize x86 features UAPI headers tools headers: Synchronize uapi/linux/sched.h tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h tooling headers: Synchronize updated s390 kvm UAPI headers tools headers: Synchronize sound/asound.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull locking fixlets from Ingo Molnar: "An endianness fix and a jump labels branch hint update" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/qrwlock: include asm/byteorder.h as needed jump_label: Add branch hints to static_branch_{un,}likely()
-