- 28 May, 2016 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull more i2c updates from Wolfram Sang: "Here is the second pull request from I2C for this merge window: - one new feature (which nearly fell through the cracks): i2c-dev does now use the cdev API so it can handle >256 minors. Seems people do need that. - two fixes for the just added DMA feature for i2c-rcar - some typo fixes" * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: dev: don't start function name with 'return' i2c: dev: switch from register_chrdev to cdev API i2c: xlr: rename ARCH_TANGOX to ARCH_TANGO i2c: at91: change log when dma configuration fails misc: at24: Fix typo in at24 header file i2c: rcar: should depend on HAS_DMA i2c: rcar: use dma_request_chan()
-
git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds authored
Pull UML updates from Richard Weinberger: "This contains a nice FPU fixup from Eli Cooper for UML" * 'for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: add extended processor state save/restore support um: extend fpstate to _xstate to support YMM registers um: fix FPU state preservation around signal handlers
-
git://git.infradead.org/linux-ubifsLinus Torvalds authored
Pull UBI/UBIFS updates from Richard Weinberger: "This contains mostly cleanups and minor improvements of UBI and UBIFS" * tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs: ubifs: ubifs_dump_inode: Fix dumping field bulk_read UBI: Fix static volume checks when Fastmap is used UBI: Set free_count to zero before walking through erase list UBI: Silence an unintialized variable warning UBI: Clean up return in ubi_remove_volume() UBI: Modify wrong comment in ubi_leb_map function. UBI: Don't read back all data in ubi_eba_copy_leb() UBI: Add ro-mode sysfs attribute
-
Linus Torvalds authored
Older versions of gcc don't understand named initializers inside a anonymous structure or union member. It can be worked around by adding the bracin gin the initializer for the anonymous member. Without this, gcc 4.4.4 will fail the build with CC fs/nfs/nfs4state.o fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer fs/nfs/nfs4state.c:69: warning: missing braces around initializer fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’) make[2]: *** [fs/nfs/nfs4state.o] Error 1 introduced in commit 93b717fd ("NFSv4: Label stateids with the type") Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Anna Schumaker <Anna.Schumaker@netapp.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking
-
Al Viro authored
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before we'd hashed the new dentry and attached it to inode, we need ->setxattr() instances getting the inode as an explicit argument rather than obtaining it from dentry. Similar change for ->getxattr() had been done in commit ce23e640. Unlike ->getxattr() (which is used by both selinux and smack instances of ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately it got missed back then. Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 27 May, 2016 34 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfsLinus Torvalds authored
Pull overlayfs update from Miklos Szeredi: "The meat of this is a change to use the mounter's credentials for operations that require elevated privileges (such as whiteout creation). This fixes behavior under user namespaces as well as being a nice cleanup" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: Do d_type check only if work dir creation was successful ovl: update documentation ovl: override creds with the ones from the superblock mounter
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds authored
Pull btrfs cleanups and fixes from Chris Mason: "We have another round of fixes and a few cleanups. I have a fix for short returns from btrfs_copy_from_user, which finally nails down a very hard to find regression we added in v4.6. Dave is pushing around gfp parameters, mostly to cleanup internal apis and make it a little more consistent. The rest are smaller fixes, and one speelling fixup patch" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits) Btrfs: fix handling of faults from btrfs_copy_from_user btrfs: fix string and comment grammatical issues and typos btrfs: scrub: Set bbio to NULL before calling btrfs_map_block Btrfs: fix unexpected return value of fiemap Btrfs: free sys_array eb as soon as possible btrfs: sink gfp parameter to convert_extent_bit btrfs: make state preallocation more speculative in __set_extent_bit btrfs: untangle gotos a bit in convert_extent_bit btrfs: untangle gotos a bit in __clear_extent_bit btrfs: untangle gotos a bit in __set_extent_bit btrfs: sink gfp parameter to set_record_extent_bits btrfs: sink gfp parameter to set_extent_new btrfs: sink gfp parameter to set_extent_defrag btrfs: sink gfp parameter to set_extent_delalloc btrfs: sink gfp parameter to clear_extent_dirty btrfs: sink gfp parameter to clear_record_extent_bits btrfs: sink gfp parameter to clear_extent_bits btrfs: sink gfp parameter to set_extent_bits btrfs: make find_workspace warn if there are no workspaces btrfs: make find_workspace always succeed ...
-
Linus Torvalds authored
Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses on integers, add a cast to a pointer and back to the argument, so that any new mis-uses of IS_ERR_VALUE() will cause warnings like warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] so that we don't re-introduce any bogus uses. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
The do_brk() and vm_brk() return value was "unsigned long" and returned the starting address on success, and an error value on failure. The reasons are entirely historical, and go back to it basically behaving like the mmap() interface does. However, nobody actually wanted that interface, and it causes totally pointless IS_ERR_VALUE() confusion. What every single caller actually wants is just the simpler integer return of zero for success and negative error number on failure. So just convert to that much clearer and more common calling convention, and get rid of all the IS_ERR_VALUE() uses wrt vm_brk(). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
Most users of IS_ERR_VALUE() in the kernel are wrong, as they pass an 'int' into a function that takes an 'unsigned long' argument. This happens to work because the type is sign-extended on 64-bit architectures before it gets converted into an unsigned type. However, anything that passes an 'unsigned short' or 'unsigned int' argument into IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers and types that are wider than 'unsigned long'. Andrzej Hajda has already fixed a lot of the worst abusers that were causing actual bugs, but it would be nice to prevent any users that are not passing 'unsigned long' arguments. This patch changes all users of IS_ERR_VALUE() that I could find on 32-bit ARM randconfig builds and x86 allmodconfig. For the moment, this doesn't change the definition of IS_ERR_VALUE() because there are probably still architecture specific users elsewhere. Almost all the warnings I got are for files that are better off using 'if (err)' or 'if (err < 0)'. The only legitimate user I could find that we get a warning for is the (32-bit only) freescale fman driver, so I did not remove the IS_ERR_VALUE() there but changed the type to 'unsigned long'. For 9pfs, I just worked around one user whose calling conventions are so obscure that I did not dare change the behavior. I was using this definition for testing: #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \ unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO)) which ends up making all 16-bit or wider types work correctly with the most plausible interpretation of what IS_ERR_VALUE() was supposed to return according to its users, but also causes a compile-time warning for any users that do not pass an 'unsigned long' argument. I suggested this approach earlier this year, but back then we ended up deciding to just fix the users that are obviously broken. After the initial warning that caused me to get involved in the discussion (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus asked me to send the whole thing again. [ Updated the 9p parts as per Al Viro - Linus ] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.org/lkml/2016/1/7/363 Link: https://lkml.org/lkml/2016/5/27/486 Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
The register_page_bootmem_info_node() function needs to be marked __init in order to avoid a new warning introduced by commit f65e91df ("mm: use early_pfn_to_nid in register_page_bootmem_info_node"). Otherwise you'll get a warning about how a non-init function calls early_pfn_to_nid (which is __meminit) Cc: Yang Shi <yang.shi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Merge misc updates and fixes from Andrew Morton: - late-breaking ocfs2 updates - random bunch of fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM mm/memcontrol.c: move comments for get_mctgt_type() to proper position mm/memcontrol.c: fix the margin computation in mem_cgroup_margin() mm/cma: silence warnings due to max() usage mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap() oom_reaper: close race with exiting task mm: use early_pfn_to_nid in register_page_bootmem_info_node mm: use early_pfn_to_nid in page_ext_init MAINTAINERS: Kdump maintainers update MAINTAINERS: add kexec_core.c and kexec_file.c mm: oom: do not reap task if there are live threads in threadgroup direct-io: fix direct write stale data exposure from concurrent buffered read ocfs2: bump up o2cb network protocol version ocfs2: o2hb: fix hb hung time ocfs2: o2hb: don't negotiate if last hb fail ocfs2: o2hb: add some user/debug log ocfs2: o2hb: add NEGOTIATE_APPROVE message ocfs2: o2hb: add NEGO_TIMEOUT message ocfs2: o2hb: add negotiate timer
-
Gavin Shan authored
When we have !NO_BOOTMEM, the deferred page struct initialization doesn't work well because the pages reserved in bootmem are released to the page allocator uncoditionally. It causes memory corruption and system crash eventually. As Mel suggested, the bootmem is retiring slowly. We fix the issue by simply hiding DEFERRED_STRUCT_PAGE_INIT when bootmem is enabled. Link: http://lkml.kernel.org/r/1460602170-5821-1-git-send-email-gwshan@linux.vnet.ibm.comSigned-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Li RongQing authored
Move the comments for get_mctgt_type() to be before get_mctgt_type() implementation. Link: http://lkml.kernel.org/r/1463644638-7446-1-git-send-email-roy.qing.li@gmail.comSigned-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Li RongQing authored
mem_cgroup_margin() might return (memory.limit - memory_count) when the memsw.limit is in excess. This doesn't happen usually because we do not allow excess on hard limits and (memory.limit <= memsw.limit), but __GFP_NOFAIL charges can force the charge and cause the excess when no memory is really swappable (swap is full or no anonymous memory is left). [mhocko@suse.com: rewrote changelog] Link: http://lkml.kernel.org/r/20160525155122.GK20132@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1464068266-27736-1-git-send-email-roy.qing.li@gmail.comSigned-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Stephen Rothwell authored
pageblock_order can be (at least) an unsigned int or an unsigned long depending on the kernel config and architecture, so use max_t(unsigned long, ...) when comparing it. fixes these warnings: In file included from include/asm-generic/bug.h:13:0, from arch/powerpc/include/asm/bug.h:127, from include/linux/bug.h:4, from include/linux/mmdebug.h:4, from include/linux/mm.h:8, from include/linux/memblock.h:18, from mm/cma.c:28: mm/cma.c: In function 'cma_init_reserved_mem': include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast (void) (&_max1 == &_max2); ^ mm/cma.c:186:27: note: in expansion of macro 'max' alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); ^ mm/cma.c: In function 'cma_declare_contiguous': include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast (void) (&_max1 == &_max2); ^ include/linux/kernel.h:747:9: note: in definition of macro 'max' typeof(y) _max2 = (y); ^ mm/cma.c:270:29: note: in expansion of macro 'max' (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); ^ include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast (void) (&_max1 == &_max2); ^ include/linux/kernel.h:747:21: note: in definition of macro 'max' typeof(y) _max2 = (y); ^ mm/cma.c:270:29: note: in expansion of macro 'max' (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); ^ [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20160526150748.5be38a4f@canb.auug.org.auSigned-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail page from a pte, the "address" must be THP aligned in order for the page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds. Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com Fixes: 6d0a07ed ("mm: thp: calculate the mapcount correctly for THP pages during WP faults") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Tetsuo has reported: Out of memory: Kill process 443 (oleg's-test) score 855 or sacrifice child Killed process 443 (oleg's-test) total-vm:493248kB, anon-rss:423880kB, file-rss:4kB, shmem-rss:0kB sh invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0 sh cpuset=/ mems_allowed=0 CPU: 2 PID: 1 Comm: sh Not tainted 4.6.0-rc7+ #51 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Call Trace: dump_stack+0x85/0xc8 dump_header+0x5b/0x394 oom_reaper: reaped process 443 (oleg's-test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB In other words: __oom_reap_task exit_mm atomic_inc_not_zero tsk->mm = NULL mmput atomic_dec_and_test # > 0 exit_oom_victim # New victim will be # selected <OOM killer invoked> # no TIF_MEMDIE task so we can select a new one unmap_page_range # to release the memory The race exists even without the oom_reaper because anybody who pins the address space and gets preempted might race with exit_mm but oom_reaper made this race more probable. We can address the oom_reaper part by using oom_lock for __oom_reap_task because this would guarantee that a new oom victim will not be selected if the oom reaper might race with the exit path. This doesn't solve the original issue, though, because somebody else still might be pinning mm_users and so __mmput won't be called to release the memory but that is not really realiably solvable because the task will get away from the oom sight as soon as it is unhashed from the task_list and so we cannot guarantee a new victim won't be selected. [akpm@linux-foundation.org: fix use of unused `mm', Per Stephen] [akpm@linux-foundation.org: coding-style fixes] Fixes: aac45363 ("mm, oom: introduce oom reaper") Link: http://lkml.kernel.org/r/1464271493-20008-1-git-send-email-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
register_page_bootmem_info_node() is invoked in mem_init(), so it will be called before page_alloc_init_late() if DEFERRED_STRUCT_PAGE_INIT is enabled. But, pfn_to_nid() depends on memmap which won't be fully setup until page_alloc_init_late() is done, so replace pfn_to_nid() by early_pfn_to_nid(). Link: http://lkml.kernel.org/r/1464210007-30930-1-git-send-email-yang.shi@linaro.orgSigned-off-by: Yang Shi <yang.shi@linaro.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
page_ext_init() checks suitable pages with pfn_to_nid(), but pfn_to_nid() depends on memmap which will not be setup fully until page_alloc_init_late() is done. Use early_pfn_to_nid() instead of pfn_to_nid() so that page extension could be still used early even though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page allocation call sites. Suggested by Joonsoo Kim [1], this fix basically undoes the change introduced by commit b8f1a75d ("mm: call page_ext_init() after all struct pages are initialized") and fixes the same problem with a better approach. [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.orgSigned-off-by: Yang Shi <yang.shi@linaro.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vivek Goyal authored
I am proposing following updates to kdump maintainership. I have got busy in other things and not getting time to spend on kdump. Remove Haren Myneni as he has not participated in kdump development for a long time now. Add the names of Dave and Baoquan as kdump maintainers as they have been contributing to kdump for a long time now and they are in a much better position to spend time on this than me. Mark myself as a reviewer. Link: http://lkml.kernel.org/r/20160525131616.GB27291@redhat.comSigned-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Simon Horman <horms@verge.net.au> Cc: Haren Myneni <hbabu@us.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minfei Huang authored
In the below commits kexec.c was split to kexec.c, kexec_file.c and kexec_core.c. commit a43cac0d ("kexec: split kexec_file syscall code to kexec_file.c") commit 2965faa5 ("kexec: split kexec_load syscall from kexec core code") Both kexec_file.c and kexec_core.c still belong to the kexec component. In order to get correct mail lists by using the script get_maintainer.pl, add these files to MAINTAINERS. Link: http://lkml.kernel.org/r/1464189735-59113-1-git-send-email-mnghuan@gmail.comSigned-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Davydov authored
If the current process is exiting, we don't invoke oom killer, instead we give it access to memory reserves and try to reap its mm in case nobody is going to use it. There's a mistake in the code performing this check - we just ignore any process of the same thread group no matter if it is exiting or not - see try_oom_reaper. Fix it. Link: http://lkml.kernel.org/r/1464087628-7318-1-git-send-email-vdavydov@virtuozzo.com Fixes: 3ef22dff ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eryu Guan authored
Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are not allowed to allocate blocks(get_more_blocks() sets 'create' to 0 before calling get_block() callback), if it's a sparse file, direct writes fall back to buffered writes to avoid stale data exposure from concurrent buffered read. But there're two cases that can result in stale data exposure are not correctly detected. 1. The detection for "writing inside i_size" is not sufficient, writes can be treated as "extending writes" wrongly. For example, direct write 1FSB (file system block) to a 1FSB sparse file on ext2/3/4, starting from offset 0, in this case it's writing inside i_size, but 'create' is non-zero, because 'block_in_file' and '(i_size_read(inode) >> blkbits' are both zero. 2. Direct writes starting from or beyong i_size (not inside i_size) also could trigger block allocation and expose stale data. For example, consider a sparse file with i_size of 2k, and a write to offset 2k or 3k into the file, with a filesystem block size of 4k. (Thanks to Jeff Moyer for pointing this case out in his review.) The first problem can be demostrated by running ltp-aiodio test ADSP045 many times. When testing on extN filesystems, I see test failures occasionally, buffered read could read non-zero (stale) data. ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1 dio_sparse 0 TINFO : Dirtying free blocks dio_sparse 0 TINFO : Starting I/O tests non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa non-zero read at offset 0 dio_sparse 0 TINFO : Killing childrens(s) dio_sparse 1 TFAIL : dio_sparse.c:191: 1 children(s) exited abnormally The second problem can also be reproduced easily by a hacked dio_sparse program, which accepts an option to specify the write offset. What we should really do is to disable block allocation for writes that could result in filling holes inside i_size. Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.comReviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
Two new messages are added to support negotiating hb timeout. Stop nodes frmo talking an old version to mount as they will cause the negotiation to fail. Link: http://lkml.kernel.org/r/1464231615-27939-1-git-send-email-junxiao.bi@oracle.comSigned-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
hr_last_timeout_start should be set as the last time where hb is still OK. When hb write timeout, hung time will be (jiffies - hr_last_timeout_start). Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
Sometimes io error is returned when storage is down for a while. Like for iscsi device, stroage is made offline when session timeout, and this will make all io return -EIO. For this case, nodes shouldn't do negotiate timeout but should fence self. So let nodes fence self when o2hb_do_disk_heartbeat return an error, this is the same behavior with o2hb without negotiate timer. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
This message is used to re-queue write timeout timer and negotiate timer when all nodes suffer a write hung to storage, this makes node not fence self if storage down. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
This message is sent to master node when non-master nodes's negotiate timer expired. Master node records these nodes in a bitmap which is used to do write timeout timer re-queue decision. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Junxiao Bi authored
This series of patches is to fix the issue that when storage down, all nodes will fence self due to write timeout. With this patch set, all nodes will keep going until storage back online, except if the following issue happens, then all nodes will do as before to fence self. 1. io error got 2. network between nodes down 3. nodes panic This patch (of 6): When storage down, all nodes will fence self due to write timeout. The negotiate timer is designed to avoid this, with it node will wait until storage up again. Negotiate timer working in the following way: 1. The timer expires before write timeout timer, its timeout is half of write timeout now. It is re-queued along with write timeout timer. If expires, it will send NEGO_TIMEOUT message to master node(node with lowest node number). This message does nothing but marks a bit in a bitmap recording which nodes are negotiating timeout on master node. 2. If storage down, nodes will send this message to master node, then when master node finds its bitmap including all online nodes, it sends NEGO_APPROVL message to all nodes one by one, this message will re-queue write timeout timer and negotiate timer. For any node doesn't receive this message or meets some issue when handling this message, it will be fenced. If storage up at any time, o2hb_thread will run and re-queue all the timer, nothing will be affected by these two steps. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Gang He <ghe@suse.com> Cc: rwxybh <rwxybh@126.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "A set of fixes that wasn't included in the first merge window pull request. This pull request contains: - A set of NVMe fixes from Keith, and one from Nic for the integrity side of it. - Fix from Ming, clearing ->mq_ops if we don't successfully setup a queue for multiqueue. - A set of stability fixes for bcache from Jiri, and also marking bcache as orphaned as it's no longer actively maintained (in mainline, at least)" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: clear q->mq_ops if init fail MAINTAINERS: mark bcache as orphan bcache: bch_gc_thread() is not freezable bcache: bch_allocator_thread() is not freezable bcache: bch_writeback_thread() is not freezable nvme/host: Add missing blk_integrity tag_size + flags assignments NVMe: Add device ID's with stripe quirk NVMe: Short-cut removal on surprise hot-unplug NVMe: Allow user initiated rescan NVMe: Reduce driver log spamming NVMe: Unbind driver on failure NVMe: Delete only created queues NVMe: Allocate queues only for online cpus
-
git://git.infradead.org/linux-mtdLinus Torvalds authored
Pull MTD fixes from Brian Norris: "We've already noticed a few flaws in the MTD work for v4.7-rc1: - The Atmel folks got ahead of themselves on trying to support their latest hardware and were working off incorrect documentation. Fix up the NAND driver to get this correct. - Fix up device tree example documentation to use the latest recommendations for describing NAND ECC algorithms" * tag 'for-linus-20160527' of git://git.infradead.org/linux-mtd: Documentation: dt: mtd: drop "soft_bch" from example Revert "mtd: atmel_nand: Support variable RB_EDGE interrupts"
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: - one IMX built-in regression fix - a set of amdgpu fixes, mostly powerplay and polaris GPU stuff - a set of i915 fixes all over, many cc'ed to stable. The i915 batch contain support for DP++ dongle detection, which is used to fix some regressions in the HDMI color depth area * tag 'drm-fixes-v4.7-rc1' of git://people.freedesktop.org/~airlied/linux: (44 commits) drm/amd: add Kconfig dependency for ACP on DRM_AMDGPU drm/amdgpu: Fix hdmi deep color support. drm/amdgpu: fix bug in fence driver fini drm/i915: Stop automatically retiring requests after a GPU hang drm/i915: Unify intel_ring_begin() drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2) drm/i915/psr: Try to program link training times correctly drm/imx: Match imx-ipuv3-crtc components using device node in platform data drm/i915/bxt: Adjusting the error in horizontal timings retrieval drm/i915: Don't leave old junk in ilk active watermarks on readout drm/i915: s/DPPL/DPLL/ for SKL DPLLs drm/i915: Fix gen8 semaphores id for legacy mode drm/i915: Set crtc_state->lane_count for HDMI drm/i915/BXT: Retrieving the horizontal timing for DSI drm/i915: Protect gen7 irq_seqno_barrier with uncore lock drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed drm/i915: Respect DP++ adaptor TMDS clock limit drm: Add helper for DP++ adaptors ...
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v4.7-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver updates from Darren Hart: "Mostly minor updates and cleanups. One new power management controller driver for Intel Core SoCs. platform/x86: - Add PMC Driver for Intel Core SoC dell-rbtn: - Ignore ACPI notifications if device is suspended thinkpad_acpi: - save kbdlight state on suspend and restore it on resume intel_menlow: - reduce code duplication asus-wmi: - provide access to ALS control ideapad-laptop: - add a new WMI string for ESC key surfacepro3_button: - Add a warning when switching to tablet mode sony-laptop: - Avoid oops on module unload for older laptops intel_telemetry: - Constify telemetry_core_ops structures fujitsu-laptop: - Use IS_ENABLED() instead of checking for built-in or module asus-laptop: - correct error handling in sysfs_acpi_set - remove redundant initializers - correct error handling in asus_read_brightness() fujitsu-laptop: - Support radio LED" * tag 'platform-drivers-x86-v4.7-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: platform/x86: Add PMC Driver for Intel Core SoC dell-rbtn: Ignore ACPI notifications if device is suspended thinkpad_acpi: save kbdlight state on suspend and restore it on resume intel_menlow: reduce code duplication asus-wmi: provide access to ALS control ideapad-laptop: add a new WMI string for ESC key surfacepro3_button: Add a warning when switching to tablet mode sony-laptop: Avoid oops on module unload for older laptops intel_telemetry: Constify telemetry_core_ops structures fujitsu-laptop: Use IS_ENABLED() instead of checking for built-in or module asus-laptop: correct error handling in sysfs_acpi_set asus-laptop: remove redundant initializers asus-laptop: correct error handling in asus_read_brightness() fujitsu-laptop: Support radio LED
-
git://git.infradead.org/intel-iommuLinus Torvalds authored
Pull intel IOMMU updates from David Woodhouse: "This patchset improves the scalability of the Intel IOMMU code by resolving two spinlock bottlenecks and eliminating the linearity of the IOVA allocator, yielding up to ~5x performance improvement and approaching 'iommu=off' performance" * git://git.infradead.org/intel-iommu: iommu/vt-d: Use per-cpu IOVA caching iommu/iova: introduce per-cpu caching to iova allocation iommu/vt-d: change intel-iommu to use IOVA frame numbers iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs iommu/vt-d: only unmap mapped entries iommu/vt-d: correct flush_unmaps pfn usage iommu/vt-d: per-cpu deferred invalidation queues iommu/vt-d: refactoring of deferred flush entries
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull second batch of KVM updates from Radim Krčmář: "General: - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat had nothing to do with QEMU in the first place -- the tool only interprets debugfs) - expose per-vm statistics in debugfs and support them in kvm_stat (KVM always collected per-vm statistics, but they were summarised into global statistics) x86: - fix dynamic APICv (VMX was improperly configured and a guest could access host's APIC MSRs, CVE-2016-4440) - minor fixes ARM changes from Christoffer Dall: - new vgic reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove the legacy one. - fix for a non-critical issue with virtual abort injection to guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits) tools: kvm_stat: Add comments tools: kvm_stat: Introduce pid monitoring KVM: Create debugfs dir and stat files for each VM MAINTAINERS: Add kvm tools tools: kvm_stat: Powerpc related fixes tools: Add kvm_stat man page tools: Add kvm_stat vm monitor script kvm:vmx: more complete state update on APICv on/off KVM: SVM: Add more SVM_EXIT_REASONS KVM: Unify traced vector format svm: bitwise vs logical op typo KVM: arm/arm64: vgic-new: Synchronize changes to active state KVM: arm/arm64: vgic-new: enable build KVM: arm/arm64: vgic-new: implement mapped IRQ handling KVM: arm/arm64: vgic-new: Wire up irqfd injection KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable KVM: arm/arm64: vgic-new: vgic_init: implement map_resources KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init ...
-
Al Viro authored
preparation for similar switch in ->setxattr() (see the next commit for rationale). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Rajneesh Bhardwaj authored
This patch adds the Power Management Controller driver as a PCI driver for Intel Core SoC architecture. This driver can utilize debugging capabilities and supported features as exposed by the Power Management Controller. Please refer to the below specification for more details on PMC features. http://www.intel.in/content/www/in/en/chipsets/100-series-chipset-datasheet-vol-2.html The current version of this driver exposes SLP_S0_RESIDENCY counter. This counter can be used for detecting fragile SLP_S0 signal related failures and take corrective actions when PCH SLP_S0 signal is not asserted after kernel freeze as part of suspend to idle flow (echo freeze > /sys/power/state). Intel Platform Controller Hub (PCH) asserts SLP_S0 signal when it detects favorable conditions to enter its low power mode. As a pre-requisite the SoC should be in deepest possible Package C-State and devices should be in low power mode. For example, on Skylake SoC the deepest Package C-State is Package C10 or PC10. Suspend to idle flow generally leads to PC10 state but PC10 state may not be sufficient for realizing the platform wide power potential which SLP_S0 signal assertion can provide. SLP_S0 signal is often connected to the Embedded Controller (EC) and the Power Management IC (PMIC) for other platform power management related optimizations. In general, SLP_S0 assertion == PC10 + PCH low power mode + ModPhy Lanes power gated + PLL Idle. As part of this driver, a mechanism to read the SLP_S0_RESIDENCY is exposed as an API and also debugfs features are added to indicate SLP_S0 signal assertion residency in microseconds. echo freeze > /sys/power/state wake the system cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Signed-off-by: Vishwanath Somayaji <vishwanath.somayaji@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
-