- 12 Apr, 2016 1 commit
-
-
Nicolai Stange authored
Nothing prevents a dentry found by path lookup before a return of __debugfs_remove() to actually get opened after that return. Now, after the return of __debugfs_remove(), there are no guarantees whatsoever regarding the memory the corresponding inode's file_operations object had been kept in. Since __debugfs_remove() is seldomly invoked, usually from module exit handlers only, the race is hard to trigger and the impact is very low. A discussion of the problem outlined above as well as a suggested solution can be found in the (sub-)thread rooted at http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk ("Yet another pipe related oops.") Basically, Greg KH suggests to introduce an intermediate fops and Al Viro points out that a pointer to the original ones may be stored in ->d_fsdata. Follow this line of reasoning: - Add SRCU as a reverse dependency of DEBUG_FS. - Introduce a srcu_struct object for the debugfs subsystem. - In debugfs_create_file(), store a pointer to the original file_operations object in ->d_fsdata. - Make debugfs_remove() and debugfs_remove_recursive() wait for a SRCU grace period after the dentry has been delete()'d and before they return to their callers. - Introduce an intermediate file_operations object named "debugfs_open_proxy_file_operations". It's ->open() functions checks, under the protection of a SRCU read lock, whether the dentry is still alive, i.e. has not been d_delete()'d and if so, tries to acquire a reference on the owning module. On success, it sets the file object's ->f_op to the original file_operations and forwards the ongoing open() call to the original ->open(). - For clarity, rename the former debugfs_file_operations to debugfs_noop_file_operations -- they are in no way canonical. The choice of SRCU over "normal" RCU is justified by the fact, that the former may also be used to protect ->i_private data from going away during the execution of a file's readers and writers which may (and do) sleep. Finally, introduce the fs/debugfs/internal.h header containing some declarations internal to the debugfs implementation. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 29 Mar, 2016 7 commits
-
-
Deepa Dinamani authored
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe. CURRENT_TIME macro will be deleted before merging the aforementioned series. Use current_fs_time() instead of CURRENT_TIME for inode timestamps. struct kernfs_node is associated with a sysfs file/ directory. Truncate the values to appropriate time granularity when writing to inode timestamps of the files. ktime_get_real_ts() is used to obtain times for struct kernfs_iattrs. Since these times are later assigned to inode times using timespec_truncate() for all filesystem based operations, we can save the supers list traversal time here by using ktime_get_real_ts() directly. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Deepa Dinamani authored
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roman Pen authored
Directory inodes should start off with i_nlink == 2 (one extra ref for "." entry). debugfs_create_automount() increases neither the i_nlink reference for current inode nor for parent inode. On attempt to remove the automount dentry, kernel complains: [ 86.288070] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50() [ 86.288461] Modules linked in: debugfs_example2(O-) [ 86.288745] CPU: 1 PID: 3616 Comm: rmmod Tainted: G O 4.4.0-rc3-next-20151207+ #135 [ 86.289197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014 [ 86.289696] ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000 [ 86.290110] ffff8800b9e6fdd8 ffffffff81065142 ffff8801399175e8 ffff8800bb78b240 [ 86.290507] ffff8801399175e8 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8 [ 86.290933] Call Trace: [ 86.291080] [<ffffffff81352e2c>] dump_stack+0x4e/0x82 [ 86.291340] [<ffffffff81065142>] warn_slowpath_common+0x82/0xc0 [ 86.291640] [<ffffffff8106523a>] warn_slowpath_null+0x1a/0x20 [ 86.291932] [<ffffffff811ae62e>] drop_nlink+0x3e/0x50 [ 86.292208] [<ffffffff811ba35b>] simple_unlink+0x4b/0x60 [ 86.292481] [<ffffffff811ba3a7>] simple_rmdir+0x37/0x50 [ 86.292748] [<ffffffff812d9808>] __debugfs_remove.part.16+0xa8/0xd0 [ 86.293082] [<ffffffff812d9a0b>] debugfs_remove_recursive+0xdb/0x1c0 [ 86.293406] [<ffffffffa00004dd>] cleanup_module+0x2d/0x3b [debugfs_example2] [ 86.293762] [<ffffffff810d959b>] SyS_delete_module+0x16b/0x220 [ 86.294077] [<ffffffff818ef857>] entry_SYSCALL_64_fastpath+0x12/0x6a [ 86.294405] ---[ end trace c9fc53353fe14a36 ]--- [ 86.294639] ------------[ cut here ]------------ To reproduce the issue it is enough to invoke these lines: autom = debugfs_create_automount("automount", NULL, vfsmount_cb, data); BUG_ON(IS_ERR_OR_NULL(autom)); debugfs_remove(autom); The issue is fixed by increasing inode i_nlink references for current and parent inodes. Signed-off-by: Roman Pen <r.peniaev@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
William Breathitt Gray authored
Many motherboards utilize a LPC to ISA bridge in order to decode ISA-style port-mapped I/O addresses. This is particularly true for embedded motherboards supporting the PC/104 bus (a bus specification derived from ISA). These motherboards are now commonly running 64-bit x86 processors. The X86_32 dependency should be removed from the ISA bus configuration option in order to support these newer motherboards. A new config option, CONFIG_ISA_BUS, is introduced to allow for the compilation of the ISA bus driver independent of the CONFIG_ISA option. Devices which communicate via ISA-compatible buses can now be supported independent of the dependencies of the CONFIG_ISA option. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Walleij authored
Alan is no longer maintaining this list through the Linux assigned numbers authority. Make it a collective document by referring to "the maintainers" in plural throughout, and naming the chardev and block layer maintainers in particular as parties of involvement. Cut down and remove some sections that pertained to the process of maintaining the list at lanana.org and contacting Alan directly. Make it clear that this document, in the kernel, is the master document. Also move paragraphs around so as to emphasize dynamic major number allocation. Remove paragraph on 2.6 deprecation, that tag no longer appears anywhere in the file. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Walleij authored
Currently a dynamically allocated character device major is taken from 254 and downward. This mechanism is used for RTC, IIO and a few other subsystems. The kernel currently has no check prevening these dynamic allocations from eating into the assigned numbers at 233 and downward. In a recent test it was reported that so many dynamic device majors were used on a test server, that the major number for infiniband (231) was stolen. This occurred when allocating a new major number for GPIO chips. The error messages from the kernel were not helpful. (See: https://lkml.org/lkml/2016/2/14/124) This patch adds a defined lower limit of the dynamic major allocation region will henceforth emit a warning if we start to eat into the assigned numbers. It does not do any semantic changes and will not change the kernels behaviour: numbers will still continue to be stolen, but we will know from dmesg what is going on. This also updates the Documentation/devices.txt to clearly reflect that we are using this range of major numbers for dynamic allocation. Reported-by: Ying Huang <ying.huang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gabriel Somlo authored
Refrain from defining default fw_cfg register offsets on unsupported architectures -- throw an error instead. If QEMU were to add fw_cfg support on additional architectures, we should add them to the FW_CFG_SYSFS depends statement in drivers/firmware/Kconfig, and provide default values for register offsets in drivers/firmware/qemu_fw_cfg.c at that time. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gabriel Somlo <somlo@cmu.edu> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Mar, 2016 15 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds authored
Pull Ceph updates from Sage Weil: "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others" [ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits) libceph: use KMEM_CACHE macro ceph: use kmem_cache_zalloc rbd: use KMEM_CACHE macro ceph: use lookup request to revalidate dentry ceph: kill ceph_get_dentry_parent_inode() ceph: fix security xattr deadlock ceph: don't request vxattrs from MDS ceph: fix mounting same fs multiple times ceph: remove unnecessary NULL check ceph: avoid updating directory inode's i_size accidentally ceph: fix race during filling readdir cache libceph: use sizeof_footer() more ceph: kill ceph_empty_snapc ceph: fix a wrong comparison ceph: replace CURRENT_TIME by current_fs_time() ceph: scattered page writeback libceph: add helper that duplicates last extent operation libceph: enable large, variable-sized OSD requests libceph: osdc->req_mempool should be backed by a slab pool libceph: make r_request msg_size calculation clearer ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linuxLinus Torvalds authored
Pull orangefs filesystem from Mike Marshall. This finally merges the long-pending orangefs filesystem, which has been much cleaned up with input from Al Viro over the last six months. From the documentation file: "OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal for large storage problems faced by HPC, BigData, Streaming Video, Genomics, Bioinformatics. Orangefs, originally called PVFS, was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for Parallel Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns of parallel programs. Orangefs features include: - Distributes file data among multiple file servers - Supports simultaneous access by multiple clients - Stores file data and metadata on servers using local file system and access methods - Userspace implementation is easy to install and maintain - Direct MPI support - Stateless" see Documentation/filesystems/orangefs.txt for more in-depth details. * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits) orangefs: fix orangefs_superblock locking orangefs: fix do_readv_writev() handling of error halfway through orangefs: have ->kill_sb() evict the VFS side of things first orangefs: sanitize ->llseek() orangefs-bufmap.h: trim unused junk orangefs: saner calling conventions for getting a slot orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer orangefs: get rid of readdir_handle_s ornagefs: ensure that truncate has an up to date inode size orangefs: move code which sets i_link to orangefs_inode_getattr orangefs: remove needless wrapper around GFP_KERNEL orangefs: remove wrapper around mutex_lock(&inode->i_mutex) orangefs: refactor inode type or link_target change detection orangefs: use new getattr for revalidate and remove old getattr orangefs: use new getattr in inode getattr and permission orangefs: use new orangefs_inode_getattr to get size in write and llseek orangefs: use new orangefs_inode_getattr to create new inodes orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr orangefs: remove inode->i_lock wrapper orangefs: put register_chrdev immediately before register_filesystem ...
-
git://github.com/jonmason/ntbLinus Torvalds authored
Pull NTB bug fixes from Jon Mason: "NTB bug fixes for tasklet from spinning forever, link errors, translation window setup, NULL ptr dereference, and ntb-perf errors. Also, a modification to the driver API that makes _addr functions optional" * tag 'ntb-4.6' of git://github.com/jonmason/ntb: NTB: Remove _addr functions from ntb_hw_amd NTB: Make _addr functions optional in the API NTB: Fix incorrect clean up routine in ntb_perf NTB: Fix incorrect return check in ntb_perf ntb: fix possible NULL dereference ntb: add missing setup of translation window ntb: stop link work when we do not have memory ntb: stop tasklet from spinning forever during shutdown. ntb: perf test: fix address space confusion
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull more SCSI updates from James Bottomley: "The only new stuff which missed the first pull request is an update to the UFS driver. The rest is an assortment of bug fixes and minor tweaks which appeared recently (some are fixes for recent code and some are stuff spotted recently by the checkers or the new gcc-6 compiler [most of Arnd's stuff])" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi_common: do not clobber fixed sense information scsi: ufs: select CONFIG_NLS scsi: fc: use get/put_unaligned64 for wwn access fnic: move printk()s outside of the critical code section. qla2xxx: avoid maybe_uninitialized warning megaraid_sas: add missing curly braces in ioctl handler lpfc: fix misleading indentation scsi_transport_sas: add 'scsi_target_id' sysfs attribute scsi_dh_alua: uninitialized variable in alua_check_vpd() scsi: ufs-qcom: add printouts of testbus debug registers scsi: ufs-qcom: enable/disable the device ref clock scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup scsi: ufs: add device quirk delay before putting UFS rails in LPM scsi: ufs: fix leakage during link off state scsi: ufs: tune UniPro parameters to optimize hibern8 exit time scsi: ufs: handle non spec compliant bkops behaviour by device scsi: ufs: add retry for query descriptors scsi: ufs: add error recovery after DL NAC error scsi: ufs: make error handling bit faster scsi: ufs: disable vccq if it's not needed by UFS device ...
-
Linus Torvalds authored
Commit 0b81d077 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_" to just "FS" for preprocessor symbols). Because of the symbol renaming, it's a bit hard to see it as a file move: use git show -M30 0b81d077 to lower the rename detection to just 30% similarity and make git show the files as renamed (the header file won't be shown as a rename even then - since all it contains is symbol definitions, it looks almost completely different). Even with the renames showing as renames, the diffs are not all that easy to read, since so much is just the renames. But Eric Biggers noticed that it's not just all renames: the initialization of the xts_tweak had been broken too, using the inode number rather than the page offset. That's not right - it makes the xfs_tweak the same for all pages of each inode. It _might_ make sense to make the xfs_tweak contain both the offset _and_ the inode number, but not just the inode number. Reported-by: Eric Biggers <ebiggers3@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Allen Hubbe authored
Kernel zero day testing warned about address space confusion. A virtual iomem address was used where a physical address is expected. The offending functions implement an optional part of the api, so they are removed. They can be added later, after testing. Fixes: a1b36958Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Acked-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
-
Al Viro authored
* switch orangefs_remount() to taking ORANGEFS_SB(sb) instead of sb * remove from the list _before_ orangefs_unmount() - request_mutex in the latter will make sure that nothing observed in the loop in ORANGEFS_DEV_REMOUNT_ALL handling will get freed until the end of loop * on removal, keep the forward pointer and zero the back one. That way we can drop and regain the spinlock in the loop body (again, ORANGEFS_DEV_REMOUNT_ALL one) and still be able to get to the rest of the list. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
Error should only be returned if nothing had been read/written. Otherwise we need to report a short read/write instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
a) open files can't have NULL inodes b) it's SEEK_END, not ORANGEFS_SEEK_END; no need to get cute. c) make_bad_inode() on lseek()? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
just have it return the slot number or -E... - the caller checks the sign anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
it's always __orangefs_bufmap Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
Al Viro authored
no point, really - we couldn't keep those across the calls of getdents(); it would be too easy to DoS, having all slots exhausted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
- 25 Mar, 2016 17 commits
-
-
Linus Torvalds authored
Merge fourth patch-bomb from Andrew Morton: "A lot more stuff than expected, sorry. A bunch of ocfs2 reviewing was finished off. - mhocko's oom-reaper out-of-memory-handler changes - ocfs2 fixes and features - KASAN feature work - various fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (42 commits) thp: fix typo in khugepaged_scan_pmd() MAINTAINERS: fill entries for KASAN mm/filemap: generic_file_read_iter(): check for zero reads unconditionally kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2 mm, kasan: stackdepot implementation. Enable stackdepot for SLAB arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections mm, kasan: add GFP flags to KASAN API mm, kasan: SLAB support kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right() include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill() mm/page_alloc: prevent merging between isolated and other pageblocks drivers/memstick/host/r592.c: avoid gcc-6 warning ocfs2: extend enough credits for freeing one truncate record while replaying truncate records ocfs2: extend transaction for ocfs2_remove_rightmost_path() and ocfs2_update_edge_lengths() before to avoid inconsistency between inode and et ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert ocfs2: solve a problem of crossing the boundary in updating backups ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list ocfs2/dlm: fix race between convert and recovery ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fixlet from Rafael Wysocki: "One of commits in my previous pull request changed the permissions of drivers/power/avs/rockchip-io-domain.c to executable by mistake" * tag 'pm+acpi-4.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Fix permissions of drivers/power/avs/rockchip-io-domain.c
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linuxLinus Torvalds authored
Pull ia64 update from Tony Luck: "Wire up new system calls p{read,write}v2 for ia64" * tag 'please-pull-preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] Enable preadv2 and pwritev2 syscalls for ia64
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull more input updates from Dmitry Torokhov: "Second round of updates for the input subsystem. The BYD PS/2 protocol driver now uses absolute reporting mode and should behave more like other touchpads; Synaptics driver needed to extend one of its quirks to a newer firmware version, and a few USB drivers got tightened up checks for the contents of their descriptors" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: sur40 - fix DMA on stack Input: ati_remote2 - fix crashes on detecting device with invalid descriptor Input: synaptics - handle spurious release of trackstick buttons, again Input: synaptics-rmi4 - remove check of Non-NULL array Input: byd - enable absolute mode Input: ims-pcu - sanity check against missing interfaces Input: melfas_mip4 - add hw_version sysfs attribute
-
Kirill A. Shutemov authored
!PageLRU should lead to SCAN_PAGE_LRU, not SCAN_SCAN_ABORT result. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrey Ryabinin authored
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nicolai Stange authored
If - generic_file_read_iter() gets called with a zero read length, - the read offset is at a page boundary, - IOCB_DIRECT is not set - and the page in question hasn't made it into the page cache yet, then do_generic_file_read() will trigger a readahead with a req_size hint of zero. Since roundup_pow_of_two(0) is undefined, UBSAN reports UBSAN: Undefined behaviour in include/linux/log2.h:63:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 3 PID: 1017 Comm: sa1 Tainted: G L 4.5.0-next-20160318+ #14 [...] Call Trace: [...] [<ffffffff813ef61a>] ondemand_readahead+0x3aa/0x3d0 [<ffffffff813ef61a>] ? ondemand_readahead+0x3aa/0x3d0 [<ffffffff813c73bd>] ? find_get_entry+0x2d/0x210 [<ffffffff813ef9c3>] page_cache_sync_readahead+0x63/0xa0 [<ffffffff813cc04d>] do_generic_file_read+0x80d/0xf90 [<ffffffff813cc955>] generic_file_read_iter+0x185/0x420 [...] [<ffffffff81510b06>] __vfs_read+0x256/0x3d0 [...] when get_init_ra_size() gets called from ondemand_readahead(). The net effect is that the initial readahead size is arch dependent for requested read lengths of zero: for example, since 1UL << (sizeof(unsigned long) * 8) evaluates to 1 on x86 while its result is 0 on ARMv7, the initial readahead size becomes 4 on the former and 0 on the latter. What's more, whether or not the file access timestamp is updated for zero length reads is decided differently for the two cases of IOCB_DIRECT being set or cleared: in the first case, generic_file_read_iter() explicitly skips updating that timestamp while in the latter case, it is always updated through the call to do_generic_file_read(). According to POSIX, zero length reads "do not modify the last data access timestamp" and thus, the IOCB_DIRECT behaviour is POSIXly correct. Let generic_file_read_iter() unconditionally check the requested read length at its entry and return immediately with success if it is zero. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot will allow KASAN store allocation/deallocation stack traces for memory chunks. The stack traces are stored in a hash table and referenced by handles which reside in the kasan_alloc_meta and kasan_free_meta structures in the allocated memory chunks. IRQ stack traces are cut below the IRQ entry point to avoid unnecessary duplication. Right now stackdepot support is only enabled in SLAB allocator. Once KASAN features in SLAB are on par with those in SLUB we can switch SLUB to stackdepot as well, thus removing the dependency on SLUB stack bookkeeping, which wastes a lot of memory. This patch is based on the "mm: kasan: stack depots" patch originally prepared by Dmitry Chernenkov. Joonsoo has said that he plans to reuse the stackdepot code for the mm/page_owner.c debugging facility. [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t] [aryabinin@virtuozzo.com: comment style fixes] Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
Add GFP flags to KASAN hooks for future patches to use. This patch is based on the "mm: kasan: unified support for SLUB and SLAB allocators" patch originally prepared by Dmitry Chernenkov. Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
Add KASAN hooks to SLAB allocator. This patch is based on the "mm: kasan: unified support for SLUB and SLAB allocators" patch originally prepared by Dmitry Chernenkov. Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
This patchset implements SLAB support for KASAN Unlike SLUB, SLAB doesn't store allocation/deallocation stacks for heap objects, therefore we reimplement this feature in mm/kasan/stackdepot.c. The intention is to ultimately switch SLUB to use this implementation as well, which will save a lot of memory (right now SLUB bloats each object by 256 bytes to store the allocation/deallocation stacks). Also neither SLUB nor SLAB delay the reuse of freed memory chunks, which is necessary for better detection of use-after-free errors. We introduce memory quarantine (mm/kasan/quarantine.c), which allows delayed reuse of deallocated memory. This patch (of 7): Rename kmalloc_large_oob_right() to kmalloc_pagealloc_oob_right(), as the test only checks the page allocator functionality. Also reimplement kmalloc_large_oob_right() so that the test allocates a large enough chunk of memory that still does not trigger the page allocator fallback. Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tetsuo Handa authored
A leftover from commit c32b3cbe ("oom, PM: make OOM detection in the freezer path raceless"). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Hanjun Guo has reported that a CMA stress test causes broken accounting of CMA and free pages: > Before the test, I got: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 195044 kB > > > After running the test: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 6602584 kB > > So the freed CMA memory is more than total.. > > Also the the MemFree is more than mem total: > > -bash-4.3# cat /proc/meminfo > MemTotal: 16342016 kB > MemFree: 22367268 kB > MemAvailable: 22370528 kB Laura Abbott has confirmed the issue and suspected the freepage accounting rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA pageblocks: > CMA isolates MAX_ORDER aligned blocks, but, during the process, > partialy isolated block exists. If MAX_ORDER is 11 and > pageblock_order is 9, two pageblocks make up MAX_ORDER > aligned block and I can think following scenario because pageblock > (un)isolation would be done one by one. > > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA, > MIGRATE_ISOLATE, respectively. > > CC -> IC -> II (Isolation) > II -> CI -> CC (Un-isolation) > > If some pages are freed at this intermediate state such as IC or CI, > that page could be merged to the other page that is resident on > different type of pageblock and it will cause wrong freepage count. This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but since it doesn't hold the zone->lock between pageblocks, a race window does exist. It's also likely that unexpected merging can occur between MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in __free_one_page() since commit 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock"). However, we only check the migratetype of the pageblock where buddy merging has been initiated, not the migratetype of the buddy pageblock (or group of pageblocks) which can be MIGRATE_ISOLATE. Joonsoo has suggested checking for buddy migratetype as part of page_is_buddy(), but that would add extra checks in allocator hotpath and bloat-o-meter has shown significant code bloat (the function is inline). This patch reduces the bloat at some expense of more complicated code. The buddy-merging while-loop in __free_one_page() is initially bounded to pageblock_border and without any migratetype checks. The checks are placed outside, bumping the max_order if merging is allowed, and returning to the while-loop with a statement which can't be possibly considered harmful. This fixes the accounting bug and also removes the arguably weird state in the original commit 3c605096 where buddies could be left unmerged. Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Link: https://lkml.org/lkml/2016/3/2/280Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Debugged-by: Laura Abbott <labbott@redhat.com> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
The r592 driver relies on behavior of the DMA mapping API that is normally observed but not guaranteed by the API. Instead it uses a runtime check to fail transfers if the API ever behaves When CONFIG_NEED_SG_DMA_LENGTH is not set, one of the checks turns into a comparison of a variable with itself, which gcc-6.0 now warns about: drivers/memstick/host/r592.c: In function 'r592_transfer_fifo_dma': drivers/memstick/host/r592.c:302:31: error: self-comparison always evaluates to false [-Werror=tautological-compare] (sg_dma_len(&dev->req->sg) < dev->req->sg.length)) { ^ The check itself is not a problem, so this patch just rephrases the condition in a way that gcc does not consider an indication of a mistake. We already know that dev->req->sg.length was initially R592_LFIFO_SIZE, so we can compare it to that constant again. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Quentin Lambert <lambert.quentin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xue jiufei authored
Now function ocfs2_replay_truncate_records() first modifies tl_used, then calls ocfs2_extend_trans() to extend transactions for gd and alloc inode used for freeing clusters. jbd2_journal_restart() may be called and it may happen that tl_used in truncate log is decreased but the clusters are not freed, which means these clusters are lost. So we should avoid extending transactions in these two operations. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-