- 24 Feb, 2017 1 commit
-
-
Martin Schwidefsky authored
This patch introduces a new in-kernel-crypto blockcipher called 'paes' which implements AES with protected keys. The paes blockcipher can be used similar to the aes blockcipher but uses secure key material to derive the working protected key and so offers an encryption implementation where never a clear key value is exposed in memory. The paes module is only available for the s390 platform providing a minimal hardware support of CPACF enabled with at least MSA level 3. Upon module initialization these requirements are checked. Includes additional contribution from Harald Freudenberger. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 23 Feb, 2017 39 commits
-
-
Colin Ian King authored
trivial fix to spelling mistake in literal string Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
This patch introcudes a new kernel module pkey which is providing protected key handling and management functions. The pkey API is available within the kernel for other s390 specific code to create and manage protected keys. Additionally the functions are exported to user space via IOCTL calls. The implementation makes extensive use of functions provided by the zcrypt device driver. For generating protected keys from secure keys there is also a CEX coprocessor card needed. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
Export the two zcrypt device driver functions zcrypt_send_cprb and zcrypt_device_status_mask to be useable for other kernel code. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
The CONFIG_ZCRYPT Kconfig entry in drivers/crypto showed outdated hardware whereas the latest cards where missing. Reworked the text to reflect the current abilities of the zcrypt device driver. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
The AP bus code is not buildable as kernel module any more. Commit 5fe38260d083 ("s390/zcrypt: make ap_bus explicitly non-modular") leaves one now unused function which gets removed with this patch. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
Play safe and purge all tlbs after the control registers that contain the primary, secondary and home space asces have been validated. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
When validating register contents first validate control registers since these control the availability of features later being validated. For example the control register 0 should be validated first, before the additional floating point (AFP) registers are validated, since control register 0 contains the AFP-register control bit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
Adding the PCKMO inline function and the function code definitions for using the pckmo function to the cpacf header file. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Harald Freudenberger authored
This patch introduces the possibility to reset the request_count attribute for cards and queues to zero. This can be used to set a clear state on the counters before running an application and try out if and which hardware is actually used. If the request_count counter of a card is reset, for all associated queues the request_count is also zeroed. If just a queue request_count is reset the card counter is not updated however. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Dominik Dingel authored
_SEGMENT_ENTRY_INVALID denotes the invalid bit in a segment table entry whereas _SEGMENT_ENTRY_EMPTY means that the value of the whole entry is only the invalid bit, as the entry is completely empty. Therefore we use _SEGMENT_ENTRY_INVALID only to check and set the invalid bit with bitwise operations. _SEGMENT_ENTRY_EMPTY is only used to check for (un)equality. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Peter Oberparleiter authored
Prevent kernel crashes due to unhandled exceptions raised by the CHSC instruction which may for example be triggered by invalid ioctl data. Fixes: 64150adf ("s390/cio: Introduce generic synchronous CHSC IOCTL") Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
This the s390 version of commit c1bd55f9 ("x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"). Simply use the tls system call argument instead of extracting the tls argument by magic from the pt_regs structure. See commit 3033f14a ("clone: support passing tls argument via C rather than pt_regs magic") for more background. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
Unbalanced set_fs usages (e.g. early exit from a function and a forgotten set_fs(USER_DS) call) may lead to a situation where the secondary asce is the kernel space asce when returning to user space. This would allow user space to modify kernel space at will. This would only be possible with the above mentioned kernel bug, however we can detect this and fix the secondary asce before returning to user space. Therefore a new TIF_ASCE_SECONDARY which is used within set_fs. When returning to user space check if TIF_ASCE_SECONDARY is set, which would indicate a bug. If it is set print a message to the console, fixup the secondary asce, and then return to user space. This is similar to what is being discussed for x86 and arm: "[RFC] syscalls: Restore address limit after a syscall". Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
This is just a preparation patch in order to keep the "restore address space after syscall" patch small. Rename CIF_ASCE to CIF_ASCE_PRIMARY to be unique and specific when introducing a second CIF_ASCE_SECONDARY CIF flag. Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Linus Torvalds authored
Merge updates from Andrew Morton: "142 patches: - DAX updates - various misc bits - OCFS2 updates - most of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits) mm/z3fold.c: limit first_num to the actual range of possible buddy indexes mm: fix <linux/pagemap.h> stray kernel-doc notation zram: remove obsolete sysfs attrs mm/memblock.c: remove unnecessary log and clean up oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA mm: drop unused argument of zap_page_range() mm: drop zap_details::check_swap_entries mm: drop zap_details::ignore_dirty mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled mm: help __GFP_NOFAIL allocations which do not trigger OOM killer mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically mm: consolidate GFP_NOFAIL checks in the allocator slowpath lib/show_mem.c: teach show_mem to work with the given nodemask arch, mm: remove arch specific show_mem mm, page_alloc: warn_alloc print nodemask mm, page_alloc: do not report all nodes in show_mem Revert "mm: bail out in shrink_inactive_list()" mm, vmscan: consider eligible zones in get_scan_count mm, vmscan: cleanup lru size claculations mm, vmscan: do not count freed pages as PGDEACTIVATE ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds authored
Pull DeviceTree updates from Rob Herring: "Pretty standard stuff with dtc upstream sync being the biggest piece. - Sync dtc to upstream commit 0931cea3ba20. This picks up overlay support in dtc. - Set dma_ops for reserved memory users. - Make references to IOMMU consistent in DT bindings. - Cleanup references to pm_power_off in bindings. - Move some display bindings that snuck into the old bindings/video/ path. - Fix some wrong documentation paths caused from binding restructuring. - Vendor prefixes for Faraday and Fujitsu. - Fix an of_node ref counting leak in of_find_node_opts_by_path - Introduce new graph helper of_graph_get_remote_node() which will be used by DRM drivers in 4.12" * tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits) DT: add Faraday Tec. as vendor of: introduce of_graph_get_remote_node of: Add missing space at end of pr_fmt(). of: make of_device_make_bus_id() static of: fix of_node leak caused in of_find_node_opts_by_path dt-bindings: net: remove reference to fixed link support dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off dt-bindings: mfd: as3722: Drop reference to pm_power_off dt-bindings: display: move ANX7814 and SiI8620 bridge bindings of/unittest: Swap arguments of of_unittest_apply_overlay() Documentation: usb: fix wrong documentation paths serial: fsl-imx-uart.txt: Remove generic property devicetree: Add Fujitsu Ltd. vendor prefix Documentation: display: fix wrong documentation paths of: remove redundant memset in overlay bus:qcom : Fix typo in qcom,ebi2.txt dt-bindings: qman: Remove pool channel node Documentation: panel-dpi: fix path to display-timing.txt devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP ...
-
git://git.lwn.net/linuxLinus Torvalds authored
Pull documentation updates from Jonathan Corbet: "A slightly quieter cycle for documentation this time around. Three more DocBook template files have been converted to RST; only 21 to go. There are various build improvements and the usual array of documentation improvements and fixes" * tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits) docs / driver-api: Fix structure references in device_link.rst PM / docs: Fix structure references in device.rst Add a target to check broken external links in the Documentation Documentation: Fix linux-api list typo Documentation: DocBook/Makefile comment typo Improve sparse documentation Documentation: make Makefile.sphinx no-ops quieter Documentation: DMA-ISA-LPC.txt Documentation: input: fix path to input code definitions docs: Remove the copyright year from conf.py docs: Fix a warning in the Korean HOWTO.rst translation PM / sleep / docs: Convert PM notifiers document to reST PM / core / docs: Convert sleep states API document to reST PM / core: Update kerneldoc comments in pm.h doc-rst: Fix recursive make invocation from macros doc-rst: Delete output of failed dot-SVG conversion doc-rst: Break shell command sequences on failure Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0 doc-rst: fixed cleandoc target when used with O=dir Documentation/sphinx: prevent generation of .pyc files in the source tree ...
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull IOMMU fix from Joerg Roedel: "Fix a boot crash caused by the VT-d driver when booted with IOMMU disabled. This was introduced with the recent IOMMU changes" * tag 'iommu-fix-v4.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix crash on boot when DMAR is disabled
-
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds authored
Pull seccomp fix from James Morris: "A fix for a regression in the seccomp code (it was supposed to be in the first pull req but I had it queued in the wrong branch)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: seccomp: Only dump core when single-threaded
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.11. We aren't introducing any major features in this release cycle except for this being the first merge window I've managed on my own. :) Changes since last update: - Various cleanups - Livelock fixes for eofblocks scanning - Improved input verification for on-disk metadata - Fix races in the copy on write remap mechanism - Fix buffer io error timeout controls - Streamlining of directio copy on write - Asynchronous discard support - Fix asserts when splitting delalloc reservations - Don't bloat bmbt when right shifting extents - Inode alignment fixes for 32k block sizes" * tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits) xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG xfs: simplify xfs_rtallocate_extent xfs: tune down agno asserts in the bmap code xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment xfs: don't reserve blocks for right shift transactions xfs: fix len comparison in xfs_extent_busy_trim xfs: fix uninitialized variable in _reflink_convert_cow xfs: split indlen reservations fairly when under reserved xfs: handle indlen shortage on delalloc extent merge xfs: resurrect debug mode drop buffered writes mechanism xfs: clear delalloc and cache on buffered write failure xfs: don't block the log commit handler for discards xfs: improve busy extent sorting xfs: improve handling of busy extents in the low-level allocator xfs: don't fail xfs_extent_busy allocation xfs: correct null checks and error processing in xfs_initialize_perag xfs: update ctime and mtime on clone destinatation inodes xfs: allocate direct I/O COW blocks in iomap_begin xfs: go straight to real allocations for direct I/O COW writes xfs: return the converted extent in __xfs_reflink_convert_cow ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printkLinus Torvalds authored
Pull printk updates from Petr Mladek: - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1]. - Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4]. - Remove unused parameter by Sergey Senozhatsky. Clean up after a past change. [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop call_console_drivers() unused param printk: convert the rest to printk-safe printk: remove zap_locks() function printk: use printk_safe buffers in printk printk: report lost messages in printk safe/nmi contexts printk: always use deferred printk when flush printk_safe lines printk: introduce per-cpu safe_print seq buffer printk: rename nmi.c and exported api printk: use vprintk_func in vprintk() MAINTAINERS: Add printk maintainers
-
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linuxLinus Torvalds authored
Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.11 merge window: - A few small code cleanups - Add modules git tree url to MAINTAINERS" * tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: MAINTAINERS: add tree for modules module: fix memory leak on early load_module() failures module: Optimize search_module_extables() modules: mark __inittest/__exittest as __maybe_unused livepatch/module: print notice of TAINT_LIVEPATCH module: Drop redundant declaration of struct module
-
zhong jiang authored
At present, Tying the first_num size to NCHUNKS_ORDER is confusing. the number of chunks is completely unrelated to the number of buddies. The patch limits the first_num to actual range of possible buddy indexes. and that is more reasonable and obvious without functional change. Link: http://lkml.kernel.org/r/1476776569-29504-1-git-send-email-zhongjiang@huawei.comSigned-off-by: zhong jiang <zhongjiang@huawei.com> Suggested-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Vitaly Wool <vitalywool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
Delete stray (second) function description in find_lock_page() kernel-doc notation. Note: scripts/kernel-doc just ignores the second function description. Fixes: 2457aec6 ("mm: non-atomically mark page accessed during page cache allocation where possible") Link: http://lkml.kernel.org/r/b037e9a3-516c-ec02-6c8e-fa5479747ba6@infradead.orgSigned-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sergey Senozhatsky authored
We had a deprecated_attr_warn() warning for 2 years and now the time has come and we finally can do the cleanup. The plan was as follows: : per-stat sysfs attributes are considered to be deprecated. : The basic strategy is: : -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11) : -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11) : : The list of deprecated attributes can be found here: : Documentation/ABI/obsolete/sysfs-block-zram : : Basically, every attribute that has its own read accessible sysfs : node (e.g. num_reads) *AND* is accessible via one of the stat files : (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered : to be deprecated. The patch also removes `obsolete/sysfs-block-zram', clean ups `testing/sysfs-block-zram' and tweaks zram.txt files. Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.comSigned-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Miles Chen authored
There is no variable named flags in memblock_add() and memblock_reserve() so remove it from the log messages. This patch also cleans up the type casting for phys_addr_t by using %pa to print them. Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.comSigned-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Logic on whether we can reap pages from the VMA should match what we have in madvise_dontneed(). In particular, we should skip, VM_PFNMAP VMAs, but we don't now. Let's just extract condition on which we can shoot down pagesi from a VMA with MADV_DONTNEED into separate function and use it in both places. Link: http://lkml.kernel.org/r/20170118122429.43661-4-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
There's no users of zap_page_range() who wants non-NULL 'details'. Let's drop it. Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
detail == NULL would give the same functionality as .check_swap_entries==true. Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
The only user of ignore_dirty is oom-reaper. But it doesn't really use it. ignore_dirty only has effect on file pages mapped with dirty pte. But oom-repear skips shared VMAs, so there's no way we can dirty file pte in them. Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
The patch "mm, page_alloc: warn_alloc print nodemask" implicitly sets the allocation nodemask to cpuset_current_mems_allowed when there is no effective mempolicy. cpuset_current_mems_allowed is only effective when cpusets are enabled, which is also printed by warn_alloc(), so setting the nodemask to cpuset_current_mems_allowed is redundant and prevents debugging issues where ac->nodemask is not set properly in the page allocator. This provides better debugging output since cpuset_print_current_mems_allowed() is already provided. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701181347320.142399@chino.kir.corp.google.comSigned-off-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Now that __GFP_NOFAIL doesn't override decisions to skip the oom killer we are left with requests which require to loop inside the allocator without invoking the oom killer (e.g. GFP_NOFS|__GFP_NOFAIL used by fs code) and so they might, in very unlikely situations, loop for ever - e.g. other parallel request could starve them. This patch tries to limit the likelihood of such a lockup by giving these __GFP_NOFAIL requests a chance to move on by consuming a small part of memory reserves. We are using ALLOC_HARDER which should be enough to prevent from the starvation by regular allocation requests, yet it shouldn't consume enough from the reserves to disrupt high priority requests (ALLOC_HIGH). While we are at it, let's introduce a helper __alloc_pages_cpuset_fallback which enforces the cpusets but allows to fallback to ignore them if the first attempt fails. __GFP_NOFAIL requests can be considered important enough to allow cpuset runaway in order for the system to move on. It is highly unlikely that any of these will be GFP_USER anyway. Link: http://lkml.kernel.org/r/20161220134904.21023-4-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
__alloc_pages_may_oom makes sure to skip the OOM killer depending on the allocation request. This includes lowmem requests, costly high order requests and others. For a long time __GFP_NOFAIL acted as an override for all those rules. This is not documented and it can be quite surprising as well. E.g. GFP_NOFS requests are not invoking the OOM killer but GFP_NOFS|__GFP_NOFAIL does so if we try to convert some of the existing open coded loops around allocator to nofail request (and we have done that in the past) then such a change would have a non trivial side effect which is far from obvious. Note that the primary motivation for skipping the OOM killer is to prevent from pre-mature invocation. The exception has been added by commit 82553a93 ("oom: invoke oom killer for __GFP_NOFAIL"). The changelog points out that the oom killer has to be invoked otherwise the request would be looping for ever. But this argument is rather weak because the OOM killer doesn't really guarantee a forward progress for those exceptional cases: - it will hardly help to form costly order which in turn can result in the system panic because of no oom killable task in the end - I believe we certainly do not want to put the system down just because there is a nasty driver asking for order-9 page with GFP_NOFAIL not realizing all the consequences. It is much better this request would loop for ever than the massive system disruption - lowmem is also highly unlikely to be freed during OOM killer - GFP_NOFS request could trigger while there is still a lot of memory pinned by filesystems. This patch simply removes the __GFP_NOFAIL special case in order to have a more clear semantic without surprising side effects. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Nils Holland <nholland@tisys.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Tetsuo Handa has pointed out that commit 0a0337e0 ("mm, oom: rework oom detection") has subtly changed semantic for costly high order requests with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now. My code inspection didn't reveal any such users in the tree but it is true that this might lead to unexpected allocation failures and subsequent OOPs. __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently. There are few special cases but we are lacking a catch all place to be sure we will not miss any case where the non failing allocation might fail. This patch reorganizes the code a bit and puts all those special cases under nopage label which is the generic go-to-fail path. Non failing allocations are retried or those that cannot retry like non-sleeping allocation go to the failure point directly. This should make the code flow much easier to follow and make it less error prone for future changes. While we are there we have to move the stall check up to catch potentially looping non-failing allocations. [akpm@linux-foundation.org: fix alloc_flags may-be-used-uninitalized] Link: http://lkml.kernel.org/r/20161220134904.21023-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
show_mem() allows to filter out node specific data which is irrelevant to the allocation request via SHOW_MEM_FILTER_NODES. The filtering is done in skip_free_areas_node which skips all nodes which are not in the mems_allowed of the current process. This works most of the time as expected because the nodemask shouldn't be outside of the allocating task but there are some exceptions. E.g. memory hotplug might want to request allocations from outside of the allowed nodes (see new_node_page). Get rid of this hardcoded behavior and push the allocation mask down the show_mem path and use it instead of cpuset_current_mems_allowed. NULL nodemask is interpreted as cpuset_current_mems_allowed. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
We have a generic implementation for quite some time already. If there is any arch specific information to be printed then we should add a callback called from the generic code rather than duplicate the whole show_mem. The current code has resulted in the code duplication and the output divergence which is both confusing and adds maintainance costs. Let's just get rid of this mess. Link: http://lkml.kernel.org/r/20170117091543.25850-4-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [UniCore32] Acked-by: Helge Deller <deller@gmx.de> [for parisc] Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
warn_alloc is currently used for to report an allocation failure or an allocation stall. We print some details of the allocation request like the gfp mask and the request order. We do not print the allocation nodemask which is important when debugging the reason for the allocation failure as well. We alreaddy print the nodemask in the OOM report. Add nodemask to warn_alloc and print it in warn_alloc as well. Link: http://lkml.kernel.org/r/20170117091543.25850-3-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Patch series "show_mem updates", v2. This is a mixture of one bug fix (patch 1), an enhancement (patch 2) and cleanups (the rest of the series). First two patches should be really straightforward. Patch 3 removes some arch specific show_mem implementations because I think they are quite outdated and do not really serve any useful purpose anymore. I think we should really strive to have a consistent show_mem output regardless of the architecture. If some architecture is really special and wants to dump something additional we should do that via an arch specific hook. The last patch adds nodemask parameter so that we do not rely on the hardcoded mems_allowed of the current task when doing the node filtering. I consider this more a cleanup than a fix because basically all users use a nodemask which is a subset of mems_allowed. There is only one call path in the memory hotplug which doesn't comply with this but that is hardly something to worry about. This patch (of 4): Commit 599d0c95 ("mm, vmscan: move LRU lists to node") has added per numa node statistics to show_mem but it forgot to add skip_free_areas_node to filter out nodes which are outside of the allocating task numa policy. Add this check to not pollute the output with the pointless information. Link: http://lkml.kernel.org/r/20170117091543.25850-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-