- 25 Jan, 2024 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfsLinus Torvalds authored
Pull overlayfs fix from Amir Goldstein: "Change the on-disk format for the new "xwhiteouts" feature introduced in v6.7 The change reduces unneeded overhead of an extra getxattr per readdir. The only user of the "xwhiteout" feature is the external composefs tool, which has been updated to support the new on-disk format. This change is also designated for 6.7.y" * tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: mark xwhiteouts directory with overlay.opaque='x'
-
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds authored
Pull netfs fixes from Christian Brauner: "This contains various fixes for the netfs work merged earlier this cycle: afs: - Fix locking imbalance in afs_proc_addr_prefs_show() - Remove afs_dynroot_d_revalidate() which is redundant - Fix error handling during lookup - Hide sillyrenames from userspace. This fixes a race between silly-rename files being created/removed and userspace iterating over directory entries - Don't use unnecessary folio_*() functions cifs: - Don't use unnecessary folio_*() functions cachefiles: - erofs: Fix Null dereference when cachefiles are not doing ondemand-mode - Update mailing list netfs library: - Add Jeff Layton as reviewer - Update mailing list - Fix a error checking in netfs_perform_write() - fscache: Check error before dereferencing - Don't use unnecessary folio_*() functions" * tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_*() functions afs: Don't use certain unnecessary folio_*() functions netfs: Don't use certain unnecessary folio_*() functions netfs: Add Jeff Layton as reviewer netfs, cachefiles: Change mailing list
-
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds authored
Pull nfsd fixes from Chuck Lever: - Fix in-kernel RPC UDP transport - Fix NFSv4.0 RELEASE_LOCKOWNER * tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix RELEASE_LOCKOWNER SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
-
https://github.com/neeraju/linuxLinus Torvalds authored
Pull RCU fix from Neeraj Upadhyay: "This fixes RCU grace period stalls, which are observed when an outgoing CPU's quiescent state reporting results in wakeup of one of the grace period kthreads, to complete the grace period. If those kthreads have SCHED_FIFO policy, the wake up can indirectly arm the RT bandwith timer to the local offline CPU. Earlier migration of the hrtimers from the CPU introduced in commit 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") results in this timer getting ignored. If the RCU grace period kthreads are waiting for RT bandwidth to be available, they may never be actually scheduled, resulting in RCU stall warnings" * tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux: rcu: Defer RCU kthreads wakeup when CPU is dying
-
https://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph fixes from Ilya Dryomov: "A fix to avoid triggering an assert in some cases where RBD exclusive mappings are involved and a deprecated API cleanup" * tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client: rbd: don't move requests to the running list on errors rbd: remove usage of the deprecated ida_simple_*() API
-
Linus Torvalds authored
Merge tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fix from Mimi Zohar: "Revert patch that required user-provided key data, since keys can be created from kernel-generated random numbers" * tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: Revert "KEYS: encrypted: Add check for strsep"
-
- 24 Jan, 2024 10 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull execve fixes from Kees Cook: - Fix error handling in begin_new_exec() (Bernd Edlinger) - MAINTAINERS: specifically mention ELF (Alexey Dobriyan) - Various cleanups related to earlier open() (Askar Safin, Kees Cook) * tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Distinguish in_execve from in_exec exec: Fix error handling in begin_new_exec() exec: Add do_close_execat() helper exec: remove useless comment ELF, MAINTAINERS: specifically mention ELF
-
Linus Torvalds authored
Jann Horn points out that uselib() really shouldn't trigger the new FMODE_EXEC logic introduced by commit 4759ff71 ("exec: __FMODE_EXEC instead of in_execve for LSMs"). In fact, it shouldn't even have ever triggered the old pre-existing logic for __FMODE_EXEC (like the NFS code that makes executables not need read permissions). Unlike a real execve(), that can work even with files that are purely executable by the user (not readable), uselib() has that MAY_READ requirement becasue it's really just a convenience wrapper around mmap() for legacy shared libraries. The whole FMODE_EXEC bit was originally introduced by commit b500531e ("[PATCH] Introduce FMODE_EXEC file flag"), primarily to give ETXTBUSY error returns for distributed filesystems. It has since grown a few other warts (like that NFS thing), but there really isn't any reason to use it for uselib(), and now that we are trying to use it to replace the horrid 'tsk->in_execve' flag, it's actively wrong. Of course, as Jann Horn also points out, nobody should be enabling CONFIG_USELIB in the first place in this day and age, but that's a different discussion entirely. Reported-by: Jann Horn <jannh@google.com> Fixes: 4759ff71 ("exec: __FMODE_EXEC instead of in_execve for LSMs") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mimi Zohar authored
This reverts commit b4af096b. New encrypted keys are created either from kernel-generated random numbers or user-provided decrypted data. Revert the change requiring user-provided decrypted data. Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
-
Linus Torvalds authored
Make 'git status' quietly happy again after a full allmodconfig build. Fixes: 60433a9d ("samples: introduce new samples subdir for cgroup") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Just to help distinguish the fs->in_exec flag from the current->in_execve flag, add comments in check_unsafe_exec() and copy_fs() for more context. Also note that in_execve is only used by TOMOYO now. Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>
-
Kees Cook authored
After commit 978ffcbf ("execve: open the executable file before doing anything else"), current->in_execve was no longer in sync with the open(). This broke AppArmor and TOMOYO which depend on this flag to distinguish "open" operations from being "exec" operations. Instead of moving around in_execve, switch to using __FMODE_EXEC, which is where the "is this an exec?" intent is stored. Note that TOMOYO still uses in_execve around cred handling. Reported-by: Kevin Locke <kevin@kevinlocke.name> Closes: https://lore.kernel.org/all/ZbE4qn9_h14OqADK@kevinlocke.nameSuggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 978ffcbf ("execve: open the executable file before doing anything else") Cc: Josh Triplett <josh@joshtriplett.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jmorris@namei.org> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: <linux-mm@kvack.org> Cc: <apparmor@lists.ubuntu.com> Cc: <linux-security-module@vger.kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Frederic Weisbecker authored
When the CPU goes idle for the last time during the CPU down hotplug process, RCU reports a final quiescent state for the current CPU. If this quiescent state propagates up to the top, some tasks may then be woken up to complete the grace period: the main grace period kthread and/or the expedited main workqueue (or kworker). If those kthreads have a SCHED_FIFO policy, the wake up can indirectly arm the RT bandwith timer to the local offline CPU. Since this happens after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the timer gets ignored. Therefore if the RCU kthreads are waiting for RT bandwidth to be available, they may never be actually scheduled. This triggers TREE03 rcutorture hangs: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved) rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6) rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000 Call Trace: <TASK> __schedule+0x2eb/0xa80 schedule+0x1f/0x90 schedule_timeout+0x163/0x270 ? __pfx_process_timeout+0x10/0x10 rcu_gp_fqs_loop+0x37c/0x5b0 ? __pfx_rcu_gp_kthread+0x10/0x10 rcu_gp_kthread+0x17c/0x200 kthread+0xde/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> The situation can't be solved with just unpinning the timer. The hrtimer infrastructure and the nohz heuristics involved in finding the best remote target for an unpinned timer would then also need to handle enqueues from an offline CPU in the most horrendous way. So fix this on the RCU side instead and defer the wake up to an online CPU if it's too late for the local one. Reported-by: Paul E. McKenney <paulmck@kernel.org> Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdevLinus Torvalds authored
Pull fbdev fixes and cleanups from Helge Deller: "A crash fix in stifb which was missed to be included in the drm-misc tree, two checks to prevent wrong userspace input in sisfb and savagefb and two trivial printk cleanups: - stifb: Fix crash in stifb_blank() - savage/sis: Error out if pixclock equals zero - minor trivial cleanups" * tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: stifb: Fix crash in stifb_blank() fbcon: Fix incorrect printed function name in fbcon_prepare_logo() fbdev: sis: Error out if pixclock equals zero fbdev: savage: Error out if pixclock equals zero fbdev: vt8500lcdfb: Remove unnecessary print function dev_err()
-
NeilBrown authored
The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep. First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour. If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error. The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request. When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID. So clearly it is harmful to allow a false positive, which testing so_count allows. The test is nonsense because ... well... it doesn't mean anything. so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks. When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be. In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail. In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success. The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe. So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'. Fixes: ce3c4ad7 ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds authored
Pull tracing and eventfs fixes from Steven Rostedt: - Fix histogram tracing_map insertion. The tracing_map_insert copies the value into the elt variable and then assigns the elt to the entry value. But it is possible that the entry value becomes visible on other CPUs before the elt is fully initialized. This is fixed by adding a wmb() between the initialization of the elt variable and assigning it. - Have eventfs directory have unique inode numbers. Having them be all the same proved to be a failure as the 'find' application will think that the directories are causing loops, as it checks for directory loops via their inodes. Have the evenfs dir entries get their inodes assigned when they are referenced and then save them in the eventfs_inode structure. * tag 'trace-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Save directory inodes in the eventfs_inode structure tracing: Ensure visibility when inserting an element into tracing_map
-
- 23 Jan, 2024 5 commits
-
-
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsChristian Brauner authored
Pull netfs fixes from David Howells: * 'netfs-fixes' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_*() functions afs: Don't use certain unnecessary folio_*() functions netfs: Don't use certain unnecessary folio_*() functions Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Steven Rostedt (Google) authored
The eventfs inodes and directories are allocated when referenced. But this leaves the issue of keeping consistent inode numbers and the number is only saved in the inode structure itself. When the inode is no longer referenced, it can be freed. When the file that the inode was representing is referenced again, the inode is once again created, but the inode number needs to be the same as it was before. Just making the inode numbers the same for all files is fine, but that does not work with directories. The find command will check for loops via the inode number and having the same inode number for directories triggers: # find /sys/kernel/tracing find: File system loop detected; '/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as '/sys/kernel/debug/tracing/events/initcall'. [..] Linus pointed out that the eventfs_inode structure ends with a single 32bit int, and on 64 bit machines, there's likely a 4 byte hole due to alignment. We can use this hole to store the inode number for the eventfs_inode. All directories in eventfs are represented by an eventfs_inode and that data structure can hold its inode number. That last int was also purposely placed at the end of the structure to prevent holes from within. Now that there's a 4 byte number to hold the inode, both the inode number and the last integer can be moved up in the structure for better cache locality, where the llist and rcu fields can be moved to the end as they are only used when the eventfs_inode is being deleted. Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YRdCA@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: 53c41052 ("eventfs: Have the inodes all for files and directories all be the same") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Kees Cook <keescook@chromium.org>
-
Amir Goldstein authored
An opaque directory cannot have xwhiteouts, so instead of marking an xwhiteouts directory with a new xattr, overload overlay.opaque xattr for marking both opaque dir ('y') and xwhiteouts dir ('x'). This is more efficient as the overlay.opaque xattr is checked during lookup of directory anyway. This also prevents unnecessary checking the xattr when reading a directory without xwhiteouts, i.e. most of the time. Note that the xwhiteouts marker is not checked on the upper layer and on the last layer in lowerstack, where xwhiteouts are not expected. Fixes: bc8df7a3 ("ovl: Add an alternative type of whiteout") Cc: <stable@vger.kernel.org> # v6.7 Reviewed-by: Alexander Larsson <alexl@redhat.com> Tested-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
-
Helge Deller authored
Avoid a kernel crash in stifb by providing the correct pointer to the fb_info struct. Prior to commit e2e0b838 ("video/sticore: Remove info field from STI struct") the fb_info struct was at the beginning of the fb struct. Fixes: e2e0b838 ("video/sticore: Remove info field from STI struct") Signed-off-by: Helge Deller <deller@gmx.de> Cc: Thomas Zimmermann <tzimmermann@suse.de>
-
Fedor Pchelkin authored
The QXL driver doesn't use any device for DMA mappings or allocations so dev_to_node() will panic inside ttm_device_init() on NUMA systems: general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:ttm_device_init+0x10e/0x340 Call Trace: qxl_ttm_init+0xaa/0x310 qxl_device_init+0x1071/0x2000 qxl_pci_probe+0x167/0x3f0 local_pci_probe+0xe1/0x1b0 pci_device_probe+0x29d/0x790 really_probe+0x251/0x910 __driver_probe_device+0x1ea/0x390 driver_probe_device+0x4e/0x2e0 __driver_attach+0x1e3/0x600 bus_for_each_dev+0x12d/0x1c0 bus_add_driver+0x25a/0x590 driver_register+0x15c/0x4b0 qxl_pci_driver_init+0x67/0x80 do_one_initcall+0xf5/0x5d0 kernel_init_freeable+0x637/0xb10 kernel_init+0x1c/0x2e0 ret_from_fork+0x48/0x80 ret_from_fork_asm+0x1b/0x30 RIP: 0010:ttm_device_init+0x10e/0x340 Fall back to NUMA_NO_NODE if there is no device for DMA. Found by Linux Verification Center (linuxtesting.org). Fixes: b0a7ce53 ("drm/ttm: Schedule delayed_delete worker closer") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Christian König <christian.koenig@amd.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Jan, 2024 19 commits
-
-
Linus Torvalds authored
This reverts commit 1e7f6def. It causes my machine to not even boot, and Klara Modin reports that the cause is that small zstd-compressed files return garbage when read. Reported-by: Klara Modin <klarasmodin@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CABq1_vj4GpUeZpVG49OHCo-3sdbe2-2ROcu_xDvUG-6-5zPRXg@mail.gmail.com/Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Sterba <dsterba@suse.com> Cc: Qu Wenruo <wqu@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Howells authored
In afs_proc_addr_prefs_show(), we need to unlock the RCU read lock in both places before returning (and not lock it again). Fixes: f94f70d3 ("afs: Provide a way to configure address priorities") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401172243.cd53d5f6-oliver.sang@intel.comSigned-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
-
David Howells authored
Remove afs_dynroot_d_revalidate() as it is redundant as all it does is return 1 and the caller assumes that if the op is not given. Suggested-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
-
David Howells authored
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively look up a bunch of files in the parent directory and cache this locally, on the basis that we might want to look at them too (for example if someone does an ls on a directory, they may want want to then stat every file listed). FS.InlineBulkStatus can be considered a compound op with the normal abort code applying to the compound as a whole. Each status fetch within the compound is then given its own individual abort code - but assuming no error that prevents the bulk fetch from returning the compound result will be 0, even if all the constituent status fetches failed. At the conclusion of afs_do_lookup(), we should use the abort code from the appropriate status to determine the error to return, if any - but instead it is assumed that we were successful if the op as a whole succeeded and we return an incompletely initialised inode, resulting in ENOENT, no matter the actual reason. In the particular instance reported, a vnode with no permission granted to be accessed is being given a UAEACCES abort code which should be reported as EACCES, but is instead being reported as ENOENT. Fix this by abandoning the inode (which will be cleaned up with the op) if file[1] has an abort code indicated and turn that abort code into an error instead. Whilst we're at it, add a tracepoint so that the abort codes of the individual subrequests of FS.InlineBulkStatus can be logged. At the moment only the container abort code can be 0. Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
There appears to be a race between silly-rename files being created/removed and various userspace tools iterating over the contents of a directory, leading to such errors as: find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory tar: ./include/linux/greybus/.__afs3C95: File removed before we read it when building a kernel. Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files to userspace. This doesn't stop them being looked up directly by name as we need to be able to look them up from within the kernel as part of the silly-rename algorithm. Fixes: 79ddbfa5 ("afs: Implement sillyrename for unlink and rename") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
cachefiles_ondemand_init_object() as called from cachefiles_open_file() and cachefiles_create_tmpfile() does not check if object->ondemand is set before dereferencing it, leading to an oops something like: RIP: 0010:cachefiles_ondemand_init_object+0x9/0x41 ... Call Trace: <TASK> cachefiles_open_file+0xc9/0x187 cachefiles_lookup_cookie+0x122/0x2be fscache_cookie_state_machine+0xbe/0x32b fscache_cookie_worker+0x1f/0x2d process_one_work+0x136/0x208 process_scheduled_works+0x3a/0x41 worker_thread+0x1a2/0x1f6 kthread+0xca/0xd2 ret_from_fork+0x21/0x33 Fix this by making cachefiles_ondemand_init_object() return immediately if cachefiles->ondemand is NULL. Fixes: 3c5ecfe1 ("cachefiles: extract ondemand info field from cachefiles_object") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Gao Xiang <xiang@kernel.org> cc: Chao Yu <chao@kernel.org> cc: Yue Hu <huyue2@coolpad.com> cc: Jeffle Xu <jefflexu@linux.alibaba.com> cc: linux-erofs@lists.ozlabs.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
-
Petr Pavlu authored
Running the following two commands in parallel on a multi-processor AArch64 machine can sporadically produce an unexpected warning about duplicate histogram entries: $ while true; do echo hist:key=id.syscall:val=hitcount > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist sleep 0.001 done $ stress-ng --sysbadaddr $(nproc) The warning looks as follows: [ 2911.172474] ------------[ cut here ]------------ [ 2911.173111] Duplicates detected: 1 [ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408 [ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E) [ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1 [ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G E 6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01 [ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018 [ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408 [ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408 [ 2911.185310] sp : ffff8000a1513900 [ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001 [ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008 [ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180 [ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff [ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8 [ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731 [ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c [ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8 [ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000 [ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480 [ 2911.194259] Call trace: [ 2911.194626] tracing_map_sort_entries+0x3e0/0x408 [ 2911.195220] hist_show+0x124/0x800 [ 2911.195692] seq_read_iter+0x1d4/0x4e8 [ 2911.196193] seq_read+0xe8/0x138 [ 2911.196638] vfs_read+0xc8/0x300 [ 2911.197078] ksys_read+0x70/0x108 [ 2911.197534] __arm64_sys_read+0x24/0x38 [ 2911.198046] invoke_syscall+0x78/0x108 [ 2911.198553] el0_svc_common.constprop.0+0xd0/0xf8 [ 2911.199157] do_el0_svc+0x28/0x40 [ 2911.199613] el0_svc+0x40/0x178 [ 2911.200048] el0t_64_sync_handler+0x13c/0x158 [ 2911.200621] el0t_64_sync+0x1a8/0x1b0 [ 2911.201115] ---[ end trace 0000000000000000 ]--- The problem appears to be caused by CPU reordering of writes issued from __tracing_map_insert(). The check for the presence of an element with a given key in this function is: val = READ_ONCE(entry->val); if (val && keys_match(key, val->key, map->key_size)) ... The write of a new entry is: elt = get_free_elt(map); memcpy(elt->key, key, map->key_size); entry->val = elt; The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;" stores may become visible in the reversed order on another CPU. This second CPU might then incorrectly determine that a new key doesn't match an already present val->key and subsequently insert a new element, resulting in a duplicate. Fix the problem by adding a write barrier between "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for good measure, also use WRITE_ONCE(entry->val, elt) for publishing the element. The sequence pairs with the mentioned "READ_ONCE(entry->val);" and the "val->key" check which has an address dependency. The barrier is placed on a path executed when adding an element for a new key. Subsequent updates targeting the same key remain unaffected. From the user's perspective, the issue was introduced by commit c193707d ("tracing: Remove code which merges duplicates"), which followed commit cbf4100e ("tracing: Add support to detect and avoid duplicates"). The previous code operated differently; it inherently expected potential races which result in duplicates but merged them later when they occurred. Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu@suse.com Fixes: c193707d ("tracing: Remove code which merges duplicates") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Dan Carpenter authored
The netfs_grab_folio_for_write() function doesn't return NULL, it returns error pointers. Update the check accordingly. Fixes: c38f4e96 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/29fb1310-8e2d-47ba-b68d-40354eb7b896@moroto.mountain/
-
Dan Carpenter authored
This function dereferences "cache" and then checks if it's IS_ERR_OR_NULL(). Check first, then dereference. Fixes: 9549332d ("fscache: Implement cache registration") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/e84bc740-3502-4f16-982a-a40d5676615c@moroto.mountain/ # v2
-
David Howells authored
Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/smb/client/*.c perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/smb/client/*.c perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/smb/client/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Ronnie Sahlberg <lsahlber@redhat.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
-
David Howells authored
Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/afs/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
-
David Howells authored
Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/netfs/*.c perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/netfs/*.c perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/netfs/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org
-
Geert Uytterhoeven authored
If the boot logo does not fit, a message is printed, including a wrong function name prefix. Instead of correcting the function name (or using __func__), just use "fbcon", like is done in several other messages. While at it, modernize the call by switching to pr_info(). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Helge Deller <deller@gmx.de>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: - zoned mode fixes: - fix slowdown when writing large file sequentially by looking up block groups with enough space faster - locking fixes when activating a zone - new mount API fixes: - preserve mount options for a ro/rw mount of the same subvolume - scrub fixes: - fix use-after-free in case the chunk length is not aligned to 64K, this does not happen normally but has been reported on images converted from ext4 - similar alignment check was missing with raid-stripe-tree - subvolume deletion fixes: - prevent calling ioctl on already deleted subvolume - properly track flag tracking a deleted subvolume - in subpage mode, fix decompression of an inline extent (zlib, lzo, zstd) - fix crash when starting writeback on a folio, after integration with recent MM changes this needs to be started conditionally - reject unknown flags in defrag ioctl - error handling, API fixes, minor warning fixes * tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: limit RST scrub to chunk boundary btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned btrfs: don't unconditionally call folio_start_writeback in subpage btrfs: use the original mount's mount options for the legacy reconfigure btrfs: don't warn if discard range is not aligned to sector btrfs: tree-checker: fix inline ref size in error messages btrfs: zstd: fix and simplify the inline extent decompression btrfs: lzo: fix and simplify the inline extent decompression btrfs: zlib: fix and simplify the inline extent decompression btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted btrfs: don't abort filesystem when attempting to snapshot deleted subvolume btrfs: zoned: fix lock ordering in btrfs_zone_activate() btrfs: fix unbalanced unlock of mapping_tree_lock btrfs: ref-verify: free ref cache before clearing mount opt btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send() btrfs: zoned: optimize hint byte for zoned allocator btrfs: zoned: factor out prepare_allocation_zoned()
-
Bernd Edlinger authored
If get_unused_fd_flags() fails, the error handling is incomplete because bprm->cred is already set to NULL, and therefore free_bprm will not unlock the cred_guard_mutex. Note there are two error conditions which end up here, one before and one after bprm->cred is cleared. Fixes: b8a61c9e ("exec: Generic execfd support") Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Link: https://lore.kernel.org/r/AS8P193MB128517ADB5EFF29E04389EDAE4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
-
Kees Cook authored
Consolidate the calls to allow_write_access()/fput() into a single place, since we repeat this code pattern. Add comments around the callers for the details on it. Link: https://lore.kernel.org/r/202209161637.9EDAF6B18@keescookSigned-off-by: Kees Cook <keescook@chromium.org>
-
Askar Safin authored
Function name is wrong and the comment tells us nothing Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/r/20240109030801.31827-1-safinaskar@zohomail.comSigned-off-by: Kees Cook <keescook@chromium.org>
-
Alexey Dobriyan authored
People complain when I miss people in Cc. [ kees: Also add the ELF uapi doc link ] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://lore.kernel.org/r/2cb0891e-d7c0-4939-bb5f-282812de6078@p183Signed-off-by: Kees Cook <keescook@chromium.org>
-
Linus Torvalds authored
Merge tag 'Wstringop-overflow-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull stringop-overflow warning update from Gustavo A. R. Silva: "Enable -Wstringop-overflow globally. I waited for the release of -rc1 to run a final build-test on top of it before sending this pull request. Fortunatelly, after building 358 kernels overnight (basically all supported archs with a wide variety of configs), no more warnings have surfaced! :) Thus, we are in a good position to enable this compiler option for all versions of GCC that support it, with the exception of GCC-11, which appears to have some issues with this option [1]" Link: https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/ [1] * tag 'Wstringop-overflow-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: init: Kconfig: Disable -Wstringop-overflow for GCC-11 Makefile: Enable -Wstringop-overflow globally
-