- 31 Jul, 2023 1 commit
-
-
Ajay Kaher authored
Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory was created when the tracefs was mounted, and also created for the instances in /sys/kernel/tracing/instances/<instance>/events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories they spend their time wasting precious memory resources. Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes. Running the following: ~# cat /proc/meminfo /proc/slabinfo > before.out ~# mkdir /sys/kernel/tracing/instances/foo ~# cat /proc/meminfo /proc/slabinfo > after.out to test the changes produces the following deltas: Before this change: Before after deltas for meminfo: MemFree: -32260 MemAvailable: -21496 KReclaimable: 21528 Slab: 22440 SReclaimable: 21528 SUnreclaim: 912 VmallocUsed: 16 Before after deltas for slabinfo: <slab>: <objects> [ * <size> = <total>] tracefs_inode_cache: 14472 [* 1184 = 17134848] buffer_head: 24 [* 168 = 4032] hmem_inode_cache: 28 [* 1480 = 41440] dentry: 14450 [* 312 = 4508400] lsm_inode_cache: 14453 [* 32 = 462496] vma_lock: 11 [* 152 = 1672] vm_area_struct: 2 [* 184 = 368] trace_event_file: 1748 [* 88 = 153824] kmalloc-256: 1072 [* 256 = 274432] kmalloc-64: 2842 [* 64 = 181888] Total slab additions in size: 22,763,400 bytes With this change: Before after deltas for meminfo: MemFree: -12600 MemAvailable: -12580 Cached: 24 Active: 12 Inactive: 68 Inactive(anon): 48 Active(file): 12 Inactive(file): 20 Dirty: -4 AnonPages: 68 KReclaimable: 12 Slab: 1856 SReclaimable: 12 SUnreclaim: 1844 KernelStack: 16 PageTables: 36 VmallocUsed: 16 Before after deltas for slabinfo: <slab>: <objects> [ * <size> = <total>] tracefs_inode_cache: 108 [* 1184 = 127872] buffer_head: 24 [* 168 = 4032] hmem_inode_cache: 18 [* 1480 = 26640] dentry: 127 [* 312 = 39624] lsm_inode_cache: 152 [* 32 = 4864] vma_lock: 67 [* 152 = 10184] vm_area_struct: -12 [* 184 = -2208] trace_event_file: 1764 [* 96 = 169344] kmalloc-96: 14322 [* 96 = 1374912] kmalloc-64: 2814 [* 64 = 180096] kmalloc-32: 1103 [* 32 = 35296] kmalloc-16: 2308 [* 16 = 36928] kmalloc-8: 12800 [* 8 = 102400] Total slab additions in size: 2,109,984 bytes Which is a savings of 20,653,416 bytes (20 MB) per tracing instance. Link: https://lkml.kernel.org/r/1690568452-46553-10-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 30 Jul, 2023 27 commits
-
-
Ajay Kaher authored
When events are removed from tracefs, the eventfs must be aware of this. The eventfs_remove() removes the meta data from eventfs so that it will no longer create the files associated with that event. When an instance is removed from tracefs, eventfs_remove_events_dir() will remove and clean up the entire "events" directory. The helper function eventfs_remove_rec() is used to clean up and free the associated data from eventfs for both of the added functions. SRCU is used to protect the lists of meta data stored in the eventfs. The eventfs_mutex is used to protect the content of the items in the list. As lookups may be happening as deletions of events are made, the freeing of dentry/inodes and relative information is done after the SRCU grace period has passed. Link: https://lkml.kernel.org/r/1690568452-46553-9-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Add create_file() and create_dir() functions to create the files and directories respectively when they are accessed. The functions will be called from the lookup operation of the inode_operations or from the open function of file_operations. Link: https://lkml.kernel.org/r/1690568452-46553-8-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Add the inode_operations, file_operations, and helper functions to eventfs: dcache_dir_open_wrapper() eventfs_root_lookup() eventfs_release() eventfs_set_ef_status_free() eventfs_post_create_dir() The inode_operations and file_operations functions will be called from the VFS layer. create_file() and create_dir() are added as stub functions and will be filled in later. Link: https://lkml.kernel.org/r/1690568452-46553-7-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Add the following functions to add files to evenfs: eventfs_add_events_file() to add the data needed to create a specific file located at the top level events directory. The dentry/inode will be created when the events directory is scanned. eventfs_add_file() to add the data needed for files within the directories below the top level events directory. The dentry/inode of the file will be created when the directory that the file is in is scanned. Link: https://lkml.kernel.org/r/1690568452-46553-6-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.comSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Add eventfs_file structure which will hold the properties of the eventfs files and directories. Add following functions to create the directories in eventfs: eventfs_create_events_dir() will create the top level "events" directory within the tracefs file system. eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the given name of the subsystem. eventfs_add_dir() creates an eventfs_file descriptor with the given name of the directory and attached to a eventfs_file of a subsystem. Add tracefs_inode structure to hold the inodes, flags and pointers to private data used by eventfs. Link: https://lkml.kernel.org/r/1690568452-46553-5-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.comSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Export a few tracefs functions that will be needed by the eventfs dynamic file system. Rename them to start with "tracefs_" to keep with the name space. start_creating -> tracefs_start_creating failed_creating -> tracefs_failed_creating end_creating -> tracefs_end_creating Link: https://lkml.kernel.org/r/1690568452-46553-4-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ajay Kaher authored
Create a kmem cache of tracefs_inodes. To be more efficient, as there are lots of tracefs inodes, create its own cache. This also allows to see how many tracefs inodes have been created. Add helper functions: tracefs_alloc_inode() tracefs_free_inode() get_tracefs() Link: https://lkml.kernel.org/r/1690568452-46553-3-git-send-email-akaher@vmware.comSigned-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
The creation of the trace event directory requires that a TRACE_SYSTEM is defined that the trace event directory is added within the system it was defined in. The code handled the case where a TRACE_SYSTEM was not added, and would then add the event at the events directory. But nothing should be doing this. This code also prevents the implementation of creating dynamic dentrys for the eventfs system. As this path has never been hit on correct code, remove it. If it does get hit, issues a WARN_ON_ONCE() and return ENODEV. Link: https://lkml.kernel.org/r/1690568452-46553-2-git-send-email-akaher@vmware.comSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Ajay Kaher <akaher@vmware.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Zheng Yejian authored
Currently we can resize trace ringbuffer by writing a value into file 'buffer_size_kb', then by reading the file, we get the value that is usually what we wrote. However, this value may be not actual size of trace ring buffer because of the round up when doing resize in kernel, and the actual size would be more useful. Link: https://lore.kernel.org/linux-trace-kernel/20230705002705.576633-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
As the trace iterator is created and used by various interfaces, the clean up of it needs to be consistent. Create a free_trace_iter_content() helper function that frees the content of the iterator and use that to clean it up in all places that it is used. Link: https://lkml.kernel.org/r/20230715141348.341887497@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
The iterator allocated a descriptor to copy the current_trace. This was done with the assumption that the function pointers might change. But this was a false assuption, as it does not change. There's no reason to make a copy of the current_trace and just use the pointer it points to. This removes needing to manage freeing the descriptor. Worse yet, there's locations that the iterator is used but does make a copy and just uses the pointer. This could cause the actual pointer to the trace descriptor to be freed and not the allocated copy. This is more of a clean up than a fix. Link: https://lkml.kernel.org/r/20230715141348.135792275@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: d7350c3f ("tracing/core: make the read callbacks reentrants") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Uros Bizjak authored
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in ring_buffer.c. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). No functional change intended. Link: https://lore.kernel.org/linux-trace-kernel/20230714154418.8884-1-ubizjak@gmail.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
For backward compatibility, older tooling expects to see the kernel_stack event with a "caller" field that is a fixed size array of 8 addresses. The code now supports more than 8 with an added "size" field that states the real number of entries. But the "caller" field still just looks like a fixed size to user space. Since the tracing macros that create the user space format files also creates the structures that those files represent, the kernel_stack event structure had its "caller" field a fixed size of 8, but in reality, when it is allocated on the ring buffer, it can hold more if the stack trace is bigger that 8 functions. The copying of these entries was simply done with a memcpy(): size = nr_entries * sizeof(unsigned long); memcpy(entry->caller, fstack->calls, size); The FORTIFY_SOURCE logic noticed at runtime that when the nr_entries was larger than 8, that the memcpy() was writing more than what the structure stated it can hold and it complained about it. This is because the FORTIFY_SOURCE code is unaware that the amount allocated is actually enough to hold the size. It does not expect that a fixed size field will hold more than the fixed size. This was originally solved by hiding the caller assignment with some pointer arithmetic. ptr = ring_buffer_data(); entry = ptr; ptr += offsetof(typeof(*entry), caller); memcpy(ptr, fstack->calls, size); But it is considered bad form to hide from kernel hardening. Instead, make it work nicely with FORTIFY_SOURCE by adding a new __stack_array() macro that is specific for this one special use case. The macro will take 4 arguments: type, item, len, field (whereas the __array() macro takes just the first three). This macro will act just like the __array() macro when creating the code to deal with the format file that is exposed to user space. But for the kernel, it will turn the caller field into: type item[] __counted_by(field); or for this instance: unsigned long caller[] __counted_by(size); Now the kernel code can expose the assignment of the caller to the FORTIFY_SOURCE and everyone is happy! Link: https://lore.kernel.org/linux-trace-kernel/20230712105235.5fc441aa@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230713092605.2ddb9788@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Kees Cook <keescook@chromium.org>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds authored
Pull spi fixes from Mark Brown: "A bunch of fixes for the Qualcomm QSPI driver, fixing multiple issues with the newly added DMA mode - it had a number of issues exposed when tested in a wider range of use cases, both race condition style issues and issues with different inputs to those that had been used in test" * tag 'spi-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-qcom-qspi: Add mem_ops to avoid PIO for badly sized reads spi: spi-qcom-qspi: Fallback to PIO for xfers that aren't multiples of 4 bytes spi: spi-qcom-qspi: Add DMA_CHAIN_DONE to ALL_IRQS spi: spi-qcom-qspi: Call dma_wmb() after setting up descriptors spi: spi-qcom-qspi: Use GFP_ATOMIC flag while allocating for descriptor spi: spi-qcom-qspi: Ignore disabled interrupts' status in isr
-
Linus Torvalds authored
Merge tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A couple of small fixes for the the mt6358 driver, fixing error reporting and a bootstrapping issue" * tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: mt6358: Fix incorrect VCN33 sync error message regulator: mt6358: Sync VCN33_* enable status after checking ID
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are a set of USB driver fixes for 6.5-rc4. Include in here are: - new USB serial device ids - dwc3 driver fixes for reported issues - typec driver fixes for reported problems - gadget driver fixes - reverts of some problematic USB changes that went into -rc1 All of these have been in linux-next with no reported problems" * tag 'usb-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits) usb: misc: ehset: fix wrong if condition usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config usb: gadget: call usb_gadget_check_config() to verify UDC capability usb: typec: Use sysfs_emit_at when concatenating the string usb: typec: Iterate pds array when showing the pd list usb: typec: Set port->pd before adding device for typec_port usb: typec: qcom: fix return value check in qcom_pmic_typec_probe() Revert "usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()" Revert "usb: xhci: tegra: Fix error check" USB: gadget: Fix the memory leak in raw_gadget driver usb: gadget: core: remove unbalanced mutex_unlock in usb_gadget_activate Revert "usb: dwc3: core: Enable AutoRetry feature in the controller" Revert "xhci: add quirk for host controllers that don't update endpoint DCS" USB: quirks: add quirk for Focusrite Scarlett usb: xhci-mtk: set the dma max_seg_size MAINTAINERS: drop invalid usb/cdns3 Reviewer e-mail usb: dwc3: don't reset device side if dwc3 was configured as host-only usb: typec: ucsi: move typec_set_mode(TYPEC_STATE_SAFE) to ucsi_unregister_partner() usb: ohci-at91: Fix the unhandle interrupt when resume ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty/serial fixes from Greg KH: "Here are some small TTY and serial driver fixes for 6.5-rc4 for some reported problems. Included in here is: - TIOCSTI fix for braille readers - documentation fix for minor numbers - MAINTAINERS update for new serial files in -rc1 - minor serial driver fixes for reported problems All of these have been in linux-next with no reported problems" * tag 'tty-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_dw: Preserve original value of DLF register tty: serial: sh-sci: Fix sleeping in atomic context serial: sifive: Fix sifive_serial_console_setup() section Documentation: devices.txt: reconcile serial/ucc_uart minor numers MAINTAINERS: Update TTY layer for lists and recently added files tty: n_gsm: fix UAF in gsm_cleanup_mux TIOCSTI: always enable for CAP_SYS_ADMIN
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging driver fixes from Greg KH: "Here are three small staging driver fixes for 6.5-rc4 that resolve some reported problems. These fixes are: - fix for an old bug in the r8712 driver - fbtft driver fix for a spi device - potential overflow fix in the ks7010 driver All of these have been in linux-next with no reported problems" * tag 'staging-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext() staging: fbtft: ili9341: use macro FBTFT_REGISTER_SPI_DRIVER staging: r8712: Fix memory leak in _r8712_init_xmit_priv()
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char driver and Documentation fixes from Greg KH: "Here is a char driver fix and some documentation updates for 6.5-rc4 that contain the following changes: - sram/genalloc bugfix for reported problem - security-bugs.rst update based on recent discussions - embargoed-hardware-issues minor cleanups and then partial revert for the project/company lists All of these have been in linux-next for a while with no reported problems, and the documentation updates have all been reviewed by the relevant developers" * tag 'char-misc-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc/genalloc: Name subpools by of_node_full_name() Documentation: embargoed-hardware-issues.rst: add AMD to the list Documentation: embargoed-hardware-issues.rst: clean out empty and unused entries Documentation: security-bugs.rst: clarify CVE handling Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
-
Linus Torvalds authored
Merge tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fixes from Masami Hiramatsu: - probe-events: add NULL check for some BTF API calls which can return error code and NULL. - ftrace selftests: check fprobe and kprobe event correctly. This fixes a miss condition of the test command. - kprobes: do not allow probing functions that start with "__cfi_" or "__pfx_" since those are auto generated for kernel CFI and not executed. * tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Prohibit probing on CFI preamble symbol selftests/ftrace: Fix to check fprobe event eneblement tracing/probes: Fix to add NULL check for BTF APIs
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "x86: - Do not register IRQ bypass consumer if posted interrupts not supported - Fix missed device interrupt due to non-atomic update of IRR - Use GFP_KERNEL_ACCOUNT for pid_table in ipiv - Make VMREAD error path play nice with noinstr - x86: Acquire SRCU read lock when handling fastpath MSR writes - Support linking rseq tests statically against glibc 2.35+ - Fix reference count for stats file descriptors - Detect userspace setting invalid CR0 Non-KVM: - Remove coccinelle script that has caused multiple confusion ("debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage", acked by Greg)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: selftests: Expand x86's sregs test to cover illegal CR0 values KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage" KVM: selftests: Verify stats fd is usable after VM fd has been closed KVM: selftests: Verify stats fd can be dup()'d and read KVM: selftests: Verify userspace can create "redundant" binary stats files KVM: selftests: Explicitly free vcpus array in binary stats test KVM: selftests: Clean up stats fd in common stats_test() helper KVM: selftests: Use pread() to read binary stats header KVM: Grab a reference to KVM for VM and vCPU stats file descriptors selftests/rseq: Play nice with binaries statically linked against glibc 2.35+ Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid" KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path KVM: VMX: Make VMREAD error path play nice with noinstr KVM: x86/irq: Conditionally register IRQ bypass consumer again KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv KVM: x86: check the kvm_cpu_get_interrupt result before using it KVM: x86: VMX: set irr_pending in kvm_apic_update_irr ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fix from Borislav Petkov: - Fix a rtmutex race condition resulting from sharing of the sort key between the lock waiters and the PI chain tree (->pi_waiters) of a task by giving each tree their own sort key * tag 'locking_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix task->pi_waiters integrity
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - AMD's automatic IBRS doesn't enable cross-thread branch target injection protection (STIBP) for user processes. Enable STIBP on such systems. - Do not delete (but put the ref instead) of AMD MCE error thresholding sysfs kobjects when destroying them in order not to delete the kernfs pointer prematurely - Restore annotation in ret_from_fork_asm() in order to fix kthread stack unwinding from being marked as unreliable and thus breaking livepatching * tag 'x86_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks x86: Fix kthread unwind
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Work around an erratum on GIC700, where a race between a CPU handling a wake-up interrupt, a change of affinity, and another CPU going to sleep can result in a lack of wake-up event on the next interrupt - Fix the locking required on a VPE for GICv4 - Enable Rockchip 3588001 erratum workaround for RK3588S - Fix the irq-bcm6345-l1 assumtions of the boot CPU always be the first CPU in the system * tag 'irq_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627 irqchip/gic-v3: Enable Rockchip 3588001 erratum workaround for RK3588S irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fixes from Steve French: "Four small SMB3 client fixes: - two reconnect fixes (to address the case where non-default iocharset gets incorrectly overridden at reconnect with the default charset) - fix for NTLMSSP_AUTH request setting a flag incorrectly) - Add missing check for invalid tlink (tree connection) in ioctl" * tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add missing return value check for cifs_sb_tlink smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request cifs: fix charset issue in reconnection fs/nls: make load_nls() take a const parameter
-
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: - Fix to /sys/kernel/tracing/per_cpu/cpu*/stats read and entries. If a resize shrinks the buffer it clears the read count to notify readers that they need to reset. But the read count is also used for accounting and this causes the numbers to be off. Instead, create a separate variable to use to notify readers to reset. - Fix the ref counts of the "soft disable" mode. The wrong value was used for testing if soft disable mode should be enabled or disable, but instead, just change the logic to do the enable and disable in place when the SOFT_MODE is set or cleared. - Several kernel-doc fixes - Removal of unused external declarations * tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix warning in trace_buffered_event_disable() ftrace: Remove unused extern declarations tracing: Fix kernel-doc warnings in trace_seq.c tracing: Fix kernel-doc warnings in trace_events_trigger.c tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c ring-buffer: Fix kernel-doc warnings in ring_buffer.c ring-buffer: Fix wrong stat of cpu_buffer->read
-
- 29 Jul, 2023 12 commits
-
-
Sven Joachim authored
Commit a2225d93 ("autofs: remove left-over autofs4 stubs") promised the removal of the fs/autofs/Kconfig fragment for AUTOFS4_FS within a couple of releases, but five years later this still has not happened yet, and AUTOFS4_FS is still enabled in 63 defconfigs. Get rid of it mechanically: git grep -l CONFIG_AUTOFS4_FS -- '*defconfig' | xargs sed -i 's/AUTOFS4_FS/AUTOFS_FS/' Also just remove the AUTOFS4_FS config option stub. Anybody who hasn't regenerated their config file in the last five years will need to just get the new name right when they do. Signed-off-by: Sven Joachim <svenjoac@gmx.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Merge tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Some bug fixes for build system, builtin cmdline handling, bpf and {copy, clear}_user, together with a trivial cleanup" * tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_* LoongArch: BPF: Fix check condition to call lu32id in move_imm() LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArch LoongArch: Fix return value underflow in exception path LoongArch: Fix CMDLINE_EXTEND and CMDLINE_BOOTLOADER handling LoongArch: Fix module relocation error with binutils 2.41 LoongArch: Only fiddle with CHECKFLAGS if `need-compiler'
-
Sean Christopherson authored
Add coverage to x86's set_sregs_test to verify KVM rejects vendor-agnostic illegal CR0 values, i.e. CR0 values whose legality doesn't depend on the current VMX mode. KVM historically has neglected to reject bad CR0s from userspace, i.e. would happily accept a completely bogus CR0 via KVM_SET_SREGS{2}. Punt VMX specific subtests to future work, as they would require quite a bit more effort, and KVM gets coverage for CR0 checks in general through other means, e.g. KVM-Unit-Tests. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only if KVM itself is not configured to utilize unrestricted guests, i.e. don't stuff CR0/CR4 for a restricted L2 that is running as the guest of an unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should never observe a restricted L2 with incompatible CR0/CR4, since nested VM-Enter from L1 should have failed. And if KVM does observe an active, restricted L2 with incompatible state, e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail does more harm than good, as KVM will often neglect to undo the side effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus the damage can easily spill over to L1. On the other hand, letting VM-Enter fail due to bad guest state is more likely to contain the damage to L2 as KVM relies on hardware to perform most guest state consistency checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into L1 irrespective of (un)restricted guest behavior. Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Fixes: bddd82d1 ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid, e.g. due to setting bits 63:32, illegal combinations, or to a value that isn't allowed in VMX (non-)root mode. The VMX checks in particular are "fun" as failure to disallow Real Mode for an L2 that is configured with unrestricted guest disabled, when KVM itself has unrestricted guest enabled, will result in KVM forcing VM86 mode to virtual Real Mode for L2, but then fail to unwind the related metadata when synthesizing a nested VM-Exit back to L1 (which has unrestricted guest enabled). Opportunistically fix a benign typo in the prototype for is_valid_cr4(). Cc: stable@vger.kernel.org Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.comSigned-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove coccinelle's recommendation to use DEFINE_DEBUGFS_ATTRIBUTE() instead of DEFINE_SIMPLE_ATTRIBUTE(). Regardless of whether or not the "significant overhead" incurred by debugfs_create_file() is actually meaningful, warnings from the script have led to a rash of low-quality patches that have sowed confusion and consumed maintainer time for little to no benefit. There have been no less than four attempts to "fix" KVM, and a quick search on lore shows that KVM is not alone. This reverts commit 5103068e. Link: https://lore.kernel.org/all/87tu2nbnz3.fsf@mpe.ellerman.id.au Link: https://lore.kernel.org/all/c0b98151-16b6-6d8f-1765-0f7d46682d60@redhat.com Link: https://lkml.kernel.org/r/20230706072954.4881-1-duminjie%40vivo.com Link: https://lore.kernel.org/all/Y2FsbufV00jbyF0B@google.com Link: https://lore.kernel.org/all/Y2ENJJ1YiSg5oHiy@orome Link: https://lore.kernel.org/all/7560b350e7b23786ce712118a9a504356ff1cca4.camel@kernel.orgSuggested-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230726202920.507756-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Verify that VM and vCPU binary stats files are usable even after userspace has put its last direct reference to the VM. This is a regression test for a UAF bug where KVM didn't gift the stats files a reference to the VM. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Expand the binary stats test to verify that a stats fd can be dup()'d and read, to (very) roughly simulate userspace passing around the file. Adding the dup() test is primarily an intermediate step towards verifying that userspace can read VM/vCPU stats before _and_ after userspace closes its copy of the VM fd; the dup() test itself is only mildly interesting. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Verify that KVM doesn't artificially limit KVM_GET_STATS_FD to a single file per VM/vCPU. There's no known use case for getting multiple stats fds, but it should work, and more importantly creating multiple files will make it easier to test that KVM correct manages VM refcounts for stats files. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Explicitly free the all-encompassing vcpus array in the binary stats test so that the test is consistent with respect to freeing all dynamically allocated resources (versus letting them be freed on exit). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the stats fd cleanup code into stats_test() and drop the superfluous vm_stats_test() and vcpu_stats_test() helpers in order to decouple creation of the stats file from consuming/testing the file (deduping code is a bonus). This will make it easier to test various edge cases related to stats, e.g. that userspace can dup() a stats fd, that userspace can have multiple stats files for a singleVM/vCPU, etc. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use pread() with an explicit offset when reading the header and the header name for a binary stats fd so that the common helper and the binary stats test don't subtly rely on the file effectively being untouched, e.g. to allow multiple reads of the header, name, etc. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230711230131.648752-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-