- 28 Oct, 2016 27 commits
-
-
Martin Schwidefsky authored
On STP sync events the TOD clock will jump in time, either forward or backward. The TOD clocksource claims to be continuous but in case of an STP sync with a negative offset it is not. Subtract the offset injected by the STP sync check from the result of the TOD clocksource to make it continuous again. Add code to drift the offset towards zero with a fixed rate, steering 1 second in ~9 hours. Suggested-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Martin Schwidefsky authored
The last_update_clock time stamp in the lowcore should be adjusted by the TOD clock delta that is created by the clock synchronization. Otherwise the calculation of the steal time will be incorrect. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Martin Schwidefsky authored
Merge clock_sync_cpu into stp_sync_clock and split out the update of the global and per-CPU clock fields into clock_sync_global and clock_sync_local. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Jan Höppner authored
block->request_queue is used many times in dasd_setup_queue. Define a separate variable to increase readability a bit and to make it better reusable. Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Jan Höppner authored
Currently the block queue value max_segments is set to -1L, which is then implicitly casted to unsigned short in blk_queue_max_segments. This results in 65535 (64k) max_segments. Even though the resulting value is correct, setting it implicitly using -1L is rather confusing. Set the value explicitly using the USHRT_MAX macro instead. Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Michael Holzheu authored
Since commit d86bd1be ("mm/slub: support left redzone") it is no longer guaranteed that kmalloc(PAGE_SIZE) returns page aligned memory. After the above commit we get an error for diag224 because aligned memory is required. This leads to the following user visible error: # mount none -t s390_hypfs /sys/hypervisor/ mount: unknown filesystem type 's390_hypfs' # dmesg | grep hypfs hypfs.cccfb8: The hardware system does not provide all functions required by hypfs hypfs.7a79f0: Initialization of hypfs failed with rc=-61 Fix this problem and use get_free_page() instead of kmalloc() to get correctly aligned memory. Cc: stable@vger.kernel.org # v3.6+ Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk cris/arch-v32: cryptocop: print a hex number after a 0x prefix ipack: print a hex number after a 0x prefix block: DAC960: print a hex number after a 0x prefix fs: exofs: print a hex number after a 0x prefix lib/genalloc.c: start search from start of chunk mm: memcontrol: do not recurse in direct reclaim CREDITS: update credit information for Martin Kepplinger proc: fix NULL dereference when reading /proc/<pid>/auxv mm: kmemleak: ensure that the task stack is not freed during scanning lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB latent_entropy: raise CONFIG_FRAME_WARN by default kconfig.h: remove config_enabled() macro ipc: account for kmem usage on mqueue and msg mm/slab: improve performance of gathering slabinfo stats mm: page_alloc: use KERN_CONT where appropriate mm/list_lru.c: avoid error-path NULL pointer deref h8300: fix syscall restarting kcov: properly check if we are in an interrupt mm/slab: fix kmemcg cache creation delayed issue
-
Dimitri Sivanich authored
Would like to have this be a decimal number. Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.comSigned-off-by: Dimitri Sivanich <sivanich@sgi.com> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Uwe Kleine-König authored
It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.deSigned-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Uwe Kleine-König authored
It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.deSigned-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Uwe Kleine-König authored
It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.deSigned-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Uwe Kleine-König authored
It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.deSigned-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@primarydata.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Daniel Mentz authored
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.comSigned-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
On 4.0, we saw a stack corruption from a page fault entering direct memory cgroup reclaim, calling into btrfs_releasepage(), which then tried to allocate an extent and recursed back into a kmem charge ad nauseam: [...] btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 memcg_charge_kmem+0x40/0x80 new_slab+0x2d9/0x5a0 __slab_alloc+0x2fd/0x44f kmem_cache_alloc+0x193/0x1e0 alloc_extent_state+0x21/0xc0 __clear_extent_bit+0x2b5/0x400 try_release_extent_mapping+0x1a3/0x220 __btrfs_releasepage+0x31/0x70 btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 mem_cgroup_try_charge+0x65/0x1c0 handle_mm_fault+0x117f/0x1510 __do_page_fault+0x177/0x420 do_page_fault+0xc/0x10 page_fault+0x22/0x30 On later kernels, kmem charging is opt-in rather than opt-out, and that particular kmem allocation in btrfs_releasepage() is no longer being charged and won't recurse and overrun the stack anymore. But it's not impossible for an accounted allocation to happen from the memcg direct reclaim context, and we needed to reproduce this crash many times before we even got a useful stack trace out of it. Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to avoid recursing into any other form of direct reclaim. Then let recursive charges from PF_MEMALLOC contexts bypass the cgroup limit. Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.orgSigned-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Martin Kepplinger authored
Content and employer changed. Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.comSigned-off-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Leon Yu authored
Reading auxv of any kernel thread results in NULL pointer dereferencing in auxv_read() where mm can be NULL. Fix that by checking for NULL mm and bailing out early. This is also the original behavior changed by recent commit c5317167 ("proc: switch auxv to use of __mem_open()"). # cat /proc/2/auxv Unable to handle kernel NULL pointer dereference at virtual address 000000a8 Internal error: Oops: 17 [#1] PREEMPT SMP ARM CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1 Hardware name: BCM2709 task: ea3b0b00 task.stack: e99b2000 PC is at auxv_read+0x24/0x4c LR is at do_readv_writev+0x2fc/0x37c Process cat (pid: 113, stack limit = 0xe99b2210) Call chain: auxv_read do_readv_writev vfs_readv default_file_splice_read splice_direct_to_actor do_splice_direct do_sendfile SyS_sendfile64 ret_fast_syscall Fixes: c5317167 ("proc: switch auxv to use of __mem_open()") Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.comSigned-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Janis Danisevskis <jdanis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
Commit 68f24b08 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed during kmemleak_scan() execution, leading to either a NULL pointer fault (if task->stack is NULL) or kmemleak accessing already freed memory. This patch uses the new try_get_task_stack() API to ensure that the task stack is not freed during kmemleak stack scanning. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901. Fixes: 68f24b08 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: CAI Qian <caiqian@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dmitry Vyukov authored
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls. Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on second level). Size of each stack is (num_frames + 3) * sizeof(long). Which gives us ~84K stacks. This capacity was chosen empirically and it is enough to run kernel normally. However, when lots of configs are enabled and a fuzzer tries to maximize code coverage, it easily hits the limit within tens of minutes. I've tested for long a time with number of top level entries bumped 4x (4096). And I think I've seen overflow only once. But I don't have all configs enabled and code coverage has not reached maximum yet. So bump it 8x to 8192. Since we have two-level table, memory cost of this is very moderate -- currently the top-level table is 8KB, with this patch it is 64KB, which is negligible under KASAN. Here is some approx math. 128MB allows us to memorize ~670K stacks (assuming stack is ~200b). I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free| kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of alloc/free call sites are reachable with only one stack. But some utility functions can have large fanout. Assuming average fanout is 5x, total number of alloc/free stacks is ~300K. Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.comSigned-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
When building with the latent_entropy plugin, set the default CONFIG_FRAME_WARN to 2048, since some __init functions have many basic blocks that, when instrumented by the latent_entropy plugin, grow beyond 1024 byte stack size on 32-bit builds. Link: http://lkml.kernel.org/r/20161018211216.GA39687@beastSigned-off-by: Kees Cook <keescook@chromium.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
The use of config_enabled() is ambiguous. For config options, IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer. Sometimes config_enabled() has been used for non-config options because it is useful to check whether the given symbol is defined or not. I have been tackling on deprecating config_enabled(), and now is the time to finish this work. Some new users have appeared for v4.9-rc1, but it is trivial to replace them: - arch/x86/mm/kaslr.c replace config_enabled() with IS_ENABLED() because CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean. - include/asm-generic/export.h replace config_enabled() with __is_defined(). Then, config_enabled() can be removed now. Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config options, and __is_defined() for non-config symbols. Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Aristeu Rozanski authored
When kmem accounting switched from account by default to only account if flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out. The production use case at hand is that mqueues should be customizable via sysctls in Docker containers in a Kubernetes cluster. This can only be safely allowed to the users of the cluster (without the risk that they can cause resource shortage on a node, influencing other users' containers) if all resources they control are bounded, i.e. accounted for. Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.comSigned-off-by: Aristeu Rozanski <arozansk@redhat.com> Reported-by: Stefan Schimanski <sttts@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Schimanski <sttts@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Aruna Ramakrishna authored
On large systems, when some slab caches grow to millions of objects (and many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2 seconds. During this time, interrupts are disabled while walking the slab lists (slabs_full, slabs_partial, and slabs_free) for each node, and this sometimes causes timeouts in other drivers (for instance, Infiniband). This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for total number of allocated slabs per node, per cache. This counter is updated when a slab is created or destroyed. This enables us to skip traversing the slabs_full list while gathering slabinfo statistics, and since slabs_full tends to be the biggest list when the cache is large, it results in a dramatic performance improvement. Getting slabinfo statistics now only requires walking the slabs_free and slabs_partial lists, and those lists are usually much smaller than slabs_full. We tested this after growing the dentry cache to 70GB, and the performance improved from 2s to 5ms. Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.comSigned-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joe Perches authored
Recent changes to printk require KERN_CONT uses to continue logging messages. So add KERN_CONT where necessary. [akpm@linux-foundation.org: coding-style fixes] Fixes: 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.comReported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Polakov authored
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821: After some analysis it seems to be that the problem is in alloc_super(). In case list_lru_init_memcg() fails it goes into destroy_super(), which calls list_lru_destroy(). And in list_lru_init() we see that in case memcg_init_list_lru() fails, lru->node is freed, but not set NULL, which then leads list_lru_destroy() to believe it is initialized and call memcg_destroy_list_lru(). memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus, which is NULL. [akpm@linux-foundation.org: add comment] Signed-off-by: Alexander Polakov <apolyakov@beget.ru> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Rutland authored
Back in commit f56141e3 ("all arches, signal: move restart_block to struct task_struct"), all architectures and core code were changed to use task_struct::restart_block. However, when h8300 support was subsequently restored in v4.2, it was not updated to account for this, and maintains thread_info::restart_block, which is not kept in sync. This patch drops the redundant restart_block from thread_info, and moves h8300 to the common one in task_struct, ensuring that syscall restarting always works as expected. Fixes: f56141e3 ("all arches, signal: move restart_block to struct task_struct") Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.comSigned-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrey Konovalov authored
in_interrupt() returns a nonzero value when we are either in an interrupt or have bh disabled via local_bh_disable(). Since we are interested in only ignoring coverage from actual interrupts, do a proper check instead of just calling in_interrupt(). As a result of this change, kcov will start to collect coverage from within local_bh_disable()/local_bh_enable() sections. Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.comSigned-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: James Morse <james.morse@arm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
There is a bug report that SLAB makes extreme load average due to over 2000 kworker thread. https://bugzilla.kernel.org/show_bug.cgi?id=172981 This issue is caused by kmemcg feature that try to create new set of kmem_caches for each memcg. Recently, kmem_cache creation is slowed by synchronize_sched() and futher kmem_cache creation is also delayed since kmem_cache creation is synchronized by a global slab_mutex lock. So, the number of kworker that try to create kmem_cache increases quietly. synchronize_sched() is for lockless access to node's shared array but it's not needed when a new kmem_cache is created. So, this patch rules out that case. Fixes: 801faf0d ("mm/slab: lockless decision to grow cache") Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.comReported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 Oct, 2016 13 commits
-
-
Linus Torvalds authored
No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime, but for build testing there is no reason not to allow them together. This hopefully means better build coverage and fewer embarrasing silly problems like the one fixed by commit 9db4f36e ("mm: remove unused variable in memory hotplug") in the future. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
When I removed the per-zone bitlock hashed waitqueues in commit 9dcb8b68 ("mm: remove per-zone hashtable of bitlock waitqueues"), I removed all the magic hotplug memory initialization of said waitqueues too. But when I actually _tested_ the resulting build, I stupidly assumed that "allmodconfig" would enable memory hotplug. And it doesn't, because it enables KASAN instead, which then disables hotplug memory support. As a result, my build test of the per-zone waitqueues was totally broken, and I didn't notice that the compiler warns about the now unused iterator variable 'i'. I guess I should be happy that that seems to be the worst breakage from my clearly horribly failed test coverage. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "I2C has some driver bugfixes, module autoload fixes, and driver enablement on some architectures" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: defer probe if bus recovery GPIOs are not ready i2c: designware: Avoid aborted transfers with fast reacting I2C slaves i2c: i801: Fix I2C Block Read on 8-Series/C220 and later i2c: xgene: Avoid dma_buffer overrun i2c: digicolor: Fix module autoload i2c: xlr: Fix module autoload for OF registration i2c: xlp9xx: Fix module autoload i2c: jz4780: Fix module autoload i2c: allow configuration of imx driver for ColdFire architecture i2c: mark device nodes only in case of successful instantiation i2c: rk3x: Give the tuning value 0 during rk3x_i2c_v0_calc_timings i2c: hix5hd2: allow build with ARCH_HISI
-
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linuxLinus Torvalds authored
Pull thermal updates from Zhang Rui: "The latest Thermal Management updates for v4.9-rc3: - Fix a regression introduced by commit b721ca0d(thermal/powerclamp: remove cpu whitelist), that powerclamp driver checks cpu support in a wrong way. From: Eric Ernst. - Fix a problem that intel_pch_thermal driver misses passive trip point when the PCH thermal device has an ACPI companion device associated. From: Srinivas Pandruvada. - Add missing support for Haswell PCH thermal sensor. From: Srinivas Pandruvada" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal/powerclamp: correct cpu support check thermal: intel_pch_thermal: Enable Haswell PCH thermal: intel_pch_thermal: Add an ACPI passive trip
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Martin Schwidefsky: "A few more s390 patches for 4.9: - a fix for an overflow in the dasd driver reported by UBSAN - fix a regression and add hotplug memory to the zone movable again - add ignore defines for the pkey system calls - fix the ouput of the merged stack tracer - replace printk with pr_cont in arch/s390 where appropriate - remove the arch specific return_address function again - ignore reserved channel paths at boot time - add a missing hugetlb_bad_size call to the arch backend" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: fix zone calculation in arch_add_memory() s390/dumpstack: use pr_cont within show_stack and die s390/dumpstack: get rid of return_address again s390/disassambler: use pr_cont where appropriate s390/dumpstack: use pr_cont where appropriate s390/dumpstack: restore reliable indicator for call traces s390/mm: use hugetlb_bad_size() s390/cio: don't register chpids in reserved state s390: ignore pkey system calls s390/dasd: avoid undefined behaviour
-
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linuxLinus Torvalds authored
Pull module maintainership updates from Rusty Russell: "(Quoting from the MAINTAINERS commit:) Being a Linux kernel maintainer has been my proudest professional accomplishment, spanning the last 19 years. But now we have a surfeit of excellent hackers, and I can hand this over without regret. I'll still be around as co-maintainer for another cycle, but Jessica is now the one to convince if you want your patches applied. She rocks, and is far more timely than me too!" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MAINTAINERS: Begin module maintainer transition
-
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linuxLinus Torvalds authored
Pull oreangefs updates from Mike Marshall: "A couple of orangefs cleanups sent in by other developers: - use d_fsdata instead of d_time (Miklos Szeredi) - use file_inode(file) instead of file->f_path.dentry->d_inode (Amir Goldstein)" * tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: don't use d_time orangefs: user file_inode() where it is due
-
Linus Torvalds authored
Merge tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This update contains fixes for most of the outstanding regressions introduced with the 4.9-rc1 XFS merge. There is also a fix for an iomap bug, too. This is a quite a bit larger than I'd prefer for a -rc3, but most of the change comes from cleaning up the new reflink copy on write code; it's much simpler and easier to understand now. These changes fixed several bugs in the new code, and it wasn't clear that there was an easier/simpler way to fix them. The rest of the fixes are the usual size you'd expect at this stage. I've left the commits to soak in linux-next for a some extra time because of the size before asking you to pull, no new problems with them have been reported so I think it's all OK. Summary: - iomap page offset masking fix for page faults - add IOMAP_REPORT to distinguish between read and fiemap map requests - cleanups to new shared data extent code - fix mount active status on failed log recovery - fix broken dquots in a buffer calculation - fix locking order issues and merge xfs_reflink_remap_range and xfs_file_share_range - rework unmapping of CoW extents and remove now unused functions - clean state when CoW is done" * tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits) xfs: clear cowblocks tag when cow fork is emptied xfs: fix up inode cowblocks tracking tracepoints fs: Do to trim high file position bits in iomap_page_mkwrite_actor xfs: remove xfs_bunmapi_cow xfs: optimize xfs_reflink_end_cow xfs: optimize xfs_reflink_cancel_cow_blocks xfs: refactor xfs_bunmapi_cow xfs: optimize writes to reflink files xfs: don't bother looking at the refcount tree for reads xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared xfs: add xfs_trim_extent iomap: add IOMAP_REPORT xfs: merge xfs_reflink_remap_range and xfs_file_share_range xfs: remove xfs_file_wait_for_io xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range xfs: fix the same_inode check in xfs_file_share_range xfs: remove the same fs check from xfs_file_share_range libxfs: v3 inodes are only valid on crc-enabled filesystems libxfs: clean up _calc_dquots_per_chunk xfs: unset MS_ACTIVE if mount fails ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Two small fixes: one is a fatal section mismatch (reference to init after it's discarded) and the other two are iscsi locking fixes" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: NCR5380: no longer mark irq probing as __init scsi: be2iscsi: Replace _bh with _irqsave/irqrestore scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds authored
Pull libata fixes from Tejun Heo: "The AHCI MSI handling change in rc1 was a bit broken and caused disk probing failures on some machines. These three patches should fix the issues" David Howells comments: "My test machine fell foul of this using a PCIe M.2-attached SSD card. The patches fix it for me" * 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: fix the single MSI-X case in ahci_init_one ahci: fix nvec check ahci: only try to use multi-MSI mode if there is more than 1 port
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "A set of fixes for this series, most notably the fix for the blk-mq software queue regression in from this merge window. Apart from that, a fix for an unlikely hang if a queue is flooded with FUA requests from Ming, and a few small fixes for nbd and badblocks. Lastly, a rename update for the proc softirq output, since the block polling code was made generic" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: update hardware and software queues for sleeping alloc block: flush: fix IO hang in case of flood fua req nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown badblocks: badblocks_set/clear update unacked_exist softirq: Display IRQ_POLL for irq-poll statistics
-
Linus Torvalds authored
The per-zone waitqueues exist because of a scalability issue with the page waitqueues on some NUMA machines, but it turns out that they hurt normal loads, and now with the vmalloced stacks they also end up breaking gfs2 that uses a bit_wait on a stack object: wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE) where 'gh' can be a reference to the local variable 'mount_gh' on the stack of fill_super(). The reason the per-zone hash table breaks for this case is that there is no "zone" for virtual allocations, and trying to look up the physical page to get at it will fail (with a BUG_ON()). It turns out that I actually complained to the mm people about the per-zone hash table for another reason just a month ago: the zone lookup also hurts the regular use of "unlock_page()" a lot, because the zone lookup ends up forcing several unnecessary cache misses and generates horrible code. As part of that earlier discussion, we had a much better solution for the NUMA scalability issue - by just making the page lock have a separate contention bit, the waitqueue doesn't even have to be looked at for the normal case. Peter Zijlstra already has a patch for that, but let's see if anybody even notices. In the meantime, let's fix the actual gfs2 breakage by simplifying the bitlock waitqueues and removing the per-zone issue. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jens Axboe authored
If we end up sleeping due to running out of requests, we should update the hardware and software queues in the map ctx structure. Otherwise we could end up having rq->mq_ctx point to the pre-sleep context, and risk corrupting ctx->rq_list since we'll be grabbing the wrong lock when inserting the request. Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Chris Mason <clm@fb.com> Tested-by: Chris Mason <clm@fb.com> Fixes: 63581af3 ("blk-mq: remove non-blocking pass in blk_mq_map_request") Signed-off-by: Jens Axboe <axboe@fb.com>
-