Commit 1f440397 authored by Linus Torvalds

Merge tag 'docs-6.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A moderately busy cycle for development this time around.

   - Some cleanup of the main index page for easier navigation

   - Rework some of the other top-level pages for better readability
     and, with luck, fewer merge conflicts in the future.

   - Submit-checklist improvements, hopefully the first of many.

   - New Italian translations

   - A fair number of kernel-doc fixes and improvements. We have also
     dropped the recommendation to use an old version of Sphinx.

   - A new document from Thorsten on bisection

  ... and lots of fixes and updates"

* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
  docs: verify/bisect: fixes, finetuning, and support for Arch
  docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
  docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
  docs: submit-checklist: use subheadings
  docs: submit-checklist: structure by category
  docs: new text on bisecting which also covers bug validation
  docs: drop the version constraints for sphinx and dependencies
  docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
  docs: Restore "smart quotes" for quotes
  docs/zh_CN: accurate translation of "function"
  docs: Include simplified link titles in main index
  docs: Correct formatting of title in admin-guide/index.rst
  docs: kernel_feat.py: fix build error for missing files
  MAINTAINERS: Set the field name for subsystem profile section
  kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
  Fixed case issue with 'fault-injection' in documentation
  kernel-doc: handle #if in enums as well
  Documentation: update mailing list addresses
  doc: kerneldoc.py: fix indentation
  scripts/kernel-doc: simplify signature printing
  ...
parents 3749bda2 0c8e9b53
What: /sys/bus/vdpa/drivers_autoprobe What: /sys/bus/vdpa/drivers_autoprobe
Date: March 2020 Date: March 2020
Contact: virtualization@lists.linux-foundation.org Contact: virtualization@lists.linux.dev
Description: Description:
This file determines whether new devices are immediately bound This file determines whether new devices are immediately bound
to a driver after the creation. It initially contains 1, which to a driver after the creation. It initially contains 1, which
...@@ -12,7 +12,7 @@ Description: ...@@ -12,7 +12,7 @@ Description:
What: /sys/bus/vdpa/driver_probe What: /sys/bus/vdpa/driver_probe
Date: March 2020 Date: March 2020
Contact: virtualization@lists.linux-foundation.org Contact: virtualization@lists.linux.dev
Description: Description:
Writing a device name to this file will cause the kernel binds Writing a device name to this file will cause the kernel binds
devices to a compatible driver. devices to a compatible driver.
...@@ -22,7 +22,7 @@ Description: ...@@ -22,7 +22,7 @@ Description:
What: /sys/bus/vdpa/drivers/.../bind What: /sys/bus/vdpa/drivers/.../bind
Date: March 2020 Date: March 2020
Contact: virtualization@lists.linux-foundation.org Contact: virtualization@lists.linux.dev
Description: Description:
Writing a device name to this file will cause the driver to Writing a device name to this file will cause the driver to
attempt to bind to the device. This is useful for overriding attempt to bind to the device. This is useful for overriding
...@@ -30,7 +30,7 @@ Description: ...@@ -30,7 +30,7 @@ Description:
What: /sys/bus/vdpa/drivers/.../unbind What: /sys/bus/vdpa/drivers/.../unbind
Date: March 2020 Date: March 2020
Contact: virtualization@lists.linux-foundation.org Contact: virtualization@lists.linux.dev
Description: Description:
Writing a device name to this file will cause the driver to Writing a device name to this file will cause the driver to
attempt to unbind from the device. This may be useful when attempt to unbind from the device. This may be useful when
...@@ -38,7 +38,7 @@ Description: ...@@ -38,7 +38,7 @@ Description:
What: /sys/bus/vdpa/devices/.../driver_override What: /sys/bus/vdpa/devices/.../driver_override
Date: November 2021 Date: November 2021
Contact: virtualization@lists.linux-foundation.org Contact: virtualization@lists.linux.dev
Description: Description:
This file allows the driver for a device to be specified. This file allows the driver for a device to be specified.
When specified, only a driver with a name matching the value When specified, only a driver with a name matching the value
......
...@@ -111,7 +111,9 @@ $(YNL_INDEX): $(YNL_RST_FILES) ...@@ -111,7 +111,9 @@ $(YNL_INDEX): $(YNL_RST_FILES)
$(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL) $(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL)
$(Q)$(YNL_TOOL) -i $< -o $@ $(Q)$(YNL_TOOL) -i $< -o $@
htmldocs: $(YNL_INDEX) htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX)
htmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check @$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
...@@ -176,6 +178,7 @@ refcheckdocs: ...@@ -176,6 +178,7 @@ refcheckdocs:
$(Q)cd $(srctree);scripts/documentation-file-ref-check $(Q)cd $(srctree);scripts/documentation-file-ref-check
cleandocs: cleandocs:
$(Q)rm -f $(YNL_INDEX) $(YNL_RST_FILES)
$(Q)rm -rf $(BUILDDIR) $(Q)rm -rf $(BUILDDIR)
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean
......
...@@ -318,7 +318,7 @@ Suppose that a previous kvm.sh run left its output in this directory:: ...@@ -318,7 +318,7 @@ Suppose that a previous kvm.sh run left its output in this directory::
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
Then this run can be re-run without rebuilding as follow: Then this run can be re-run without rebuilding as follow::
kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
......
...@@ -262,9 +262,11 @@ Compiling the kernel ...@@ -262,9 +262,11 @@ Compiling the kernel
- Make sure you have at least gcc 5.1 available. - Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
- Do a ``make`` to create a compressed kernel image. It is also - Do a ``make`` to create a compressed kernel image. It is also possible to do
possible to do ``make install`` if you have lilo installed to suit the ``make install`` if you have lilo installed or if your distribution has an
kernel makefiles, but you may want to check your particular lilo setup first. install script recognised by the kernel's installer. Most popular
distributions will have a recognized install script. You may want to
check your distribution's setup first.
To do the actual install, you have to be root, but none of the normal To do the actual install, you have to be root, but none of the normal
build should require that. Don't take the name of root in vain. build should require that. Don't take the name of root in vain.
...@@ -301,32 +303,51 @@ Compiling the kernel ...@@ -301,32 +303,51 @@ Compiling the kernel
image (e.g. .../linux/arch/x86/boot/bzImage after compilation) image (e.g. .../linux/arch/x86/boot/bzImage after compilation)
to the place where your regular bootable kernel is found. to the place where your regular bootable kernel is found.
- Booting a kernel directly from a floppy without the assistance of a - Booting a kernel directly from a storage device without the assistance
bootloader such as LILO, is no longer supported. of a bootloader such as LILO or GRUB, is no longer supported in BIOS
(non-EFI systems). On UEFI/EFI systems, however, you can use EFISTUB
If you boot Linux from the hard drive, chances are you use LILO, which which allows the motherboard to boot directly to the kernel.
uses the kernel image as specified in the file /etc/lilo.conf. The On modern workstations and desktops, it's generally recommended to use a
kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or bootloader as difficulties can arise with multiple kernels and secure boot.
/boot/bzImage. To use the new kernel, save a copy of the old image For more details on EFISTUB,
and copy the new image over the old one. Then, you MUST RERUN LILO see "Documentation/admin-guide/efi-stub.rst".
to update the loading map! If you don't, you won't be able to boot
the new kernel image. - It's important to note that as of 2016 LILO (LInux LOader) is no longer in
active development, though as it was extremely popular, it often comes up
Reinstalling LILO is usually a matter of running /sbin/lilo. in documentation. Popular alternatives include GRUB2, rEFInd, Syslinux,
You may wish to edit /etc/lilo.conf to specify an entry for your systemd-boot, or EFISTUB. For various reasons, it's not recommended to use
old kernel image (say, /vmlinux.old) in case the new one does not software that's no longer in active development.
work. See the LILO docs for more information.
- Chances are your distribution includes an install script and running
After reinstalling LILO, you should be all set. Shutdown the system, ``make install`` will be all that's needed. Should that not be the case
you'll have to identify your bootloader and reference its documentation or
configure your EFI.
Legacy LILO Instructions
------------------------
- If you use LILO the kernel images are specified in the file /etc/lilo.conf.
The kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or
/boot/bzImage. To use the new kernel, save a copy of the old image and copy
the new image over the old one. Then, you MUST RERUN LILO to update the
loading map! If you don't, you won't be able to boot the new kernel image.
- Reinstalling LILO is usually a matter of running /sbin/lilo. You may wish
to edit /etc/lilo.conf to specify an entry for your old kernel image
(say, /vmlinux.old) in case the new one does not work. See the LILO docs
for more information.
- After reinstalling LILO, you should be all set. Shutdown the system,
reboot, and enjoy! reboot, and enjoy!
If you ever need to change the default root device, video mode, - If you ever need to change the default root device, video mode, etc. in the
etc. in the kernel image, use your bootloader's boot options kernel image, use your bootloader's boot options where appropriate. No need
where appropriate. No need to recompile the kernel to change to recompile the kernel to change these parameters.
these parameters.
- Reboot with the new kernel and enjoy. - Reboot with the new kernel and enjoy.
If something goes wrong If something goes wrong
----------------------- -----------------------
......
=================================================
The Linux kernel user's and administrator's guide The Linux kernel user's and administrator's guide
================================================= =================================================
...@@ -37,6 +38,7 @@ problems and bugs in particular. ...@@ -37,6 +38,7 @@ problems and bugs in particular.
reporting-issues reporting-issues
reporting-regressions reporting-regressions
quickly-build-trimmed-linux quickly-build-trimmed-linux
verify-bugs-and-bisect-regressions
bug-hunting bug-hunting
bug-bisect bug-bisect
tainted-kernels tainted-kernels
......
...@@ -4668,6 +4668,11 @@ ...@@ -4668,6 +4668,11 @@
may be specified. may be specified.
Format: <port>,<port>.... Format: <port>,<port>....
possible_cpus= [SMP,S390,X86]
Format: <unsigned int>
Set the number of possible CPUs, overriding the
regular discovery mechanisms (such as ACPI/FW, etc).
powersave=off [PPC] This option disables power saving features. powersave=off [PPC] This option disables power saving features.
It specifically disables cpuidle and sets the It specifically disables cpuidle and sets the
platform machine description specific power_save platform machine description specific power_save
......
...@@ -34,7 +34,7 @@ name of the command ('Comm:') that triggered the event:: ...@@ -34,7 +34,7 @@ name of the command ('Comm:') that triggered the event::
You'll find a 'Not tainted: ' there if the kernel was not tainted at the You'll find a 'Not tainted: ' there if the kernel was not tainted at the
time of the event; if it was, then it will print 'Tainted: ' and characters time of the event; if it was, then it will print 'Tainted: ' and characters
either letters or blanks. In above example it looks like this:: either letters or blanks. In the example above it looks like this::
Tainted: P W O Tainted: P W O
...@@ -52,7 +52,7 @@ At runtime, you can query the tainted state by reading ...@@ -52,7 +52,7 @@ At runtime, you can query the tainted state by reading
tainted; any other number indicates the reasons why it is. The easiest way to tainted; any other number indicates the reasons why it is. The easiest way to
decode that number is the script ``tools/debugging/kernel-chktaint``, which your decode that number is the script ``tools/debugging/kernel-chktaint``, which your
distribution might ship as part of a package called ``linux-tools`` or distribution might ship as part of a package called ``linux-tools`` or
``kernel-tools``; if it doesn't you can download the script from ``kernel-tools``; if it doesn't, you can download the script from
`git.kernel.org <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/debugging/kernel-chktaint>`_ `git.kernel.org <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/debugging/kernel-chktaint>`_
and execute it with ``sh kernel-chktaint``, which would print something like and execute it with ``sh kernel-chktaint``, which would print something like
this on the machine that had the statements in the logs that were quoted earlier:: this on the machine that had the statements in the logs that were quoted earlier::
......
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. [see the bottom of this file for redistribution information]
=========================================
How to verify bugs and bisect regressions
=========================================
This document describes how to check if some Linux kernel problem occurs in code
currently supported by developers -- to then explain how to locate the change
causing the issue, if it is a regression (e.g. did not happen with earlier
versions).
The text aims at people running kernels from mainstream Linux distributions on
commodity hardware who want to report a kernel bug to the upstream Linux
developers. Despite this intent, the instructions work just as well for users
who are already familiar with building their own kernels: they help avoid
mistakes occasionally made even by experienced developers.
..
Note: if you see this note, you are reading the text's source file. You
might want to switch to a rendered version: it makes it a lot easier to
read and navigate this document -- especially when you want to look something
up in the reference section, then jump back to where you left off.
..
Find the latest rendered version of this text here:
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.rst.html
The essence of the process (aka 'TL;DR')
========================================
*[If you are new to building or bisecting Linux, ignore this section and head
over to the* ":ref:`step-by-step guide<introguide_bissbs>`" *below. It utilizes
the same commands as this section while describing them in brief fashion. The
steps are nevertheless easy to follow and together with accompanying entries
in a reference section mention many alternatives, pitfalls, and additional
aspects, all of which might be essential in your present case.]*
**In case you want to check if a bug is present in code currently supported by
developers**, execute just the *preparations* and *segment 1*; while doing so,
consider the newest Linux kernel you regularly use to be the 'working' kernel.
In the following example that's assumed to be 6.0.13, which is why the sources
of v6.0 will be used to prepare the .config file.
**In case you face a regression**, follow the steps at least till the end of
*segment 2*. Then you can submit a preliminary report -- or continue with
*segment 3*, which describes how to perform a bisection needed for a
full-fledged regression report. In the following example 6.0.13 is assumed to be
the 'working' kernel and 6.1.5 to be the first 'broken', which is why v6.0
will be considered the 'good' release and used to prepare the .config file.
* **Preparations**: set up everything to build your own kernels::
# * Remove any software that depends on externally maintained kernel modules
# or builds any automatically during bootup.
# * Ensure Secure Boot permits booting self-compiled Linux kernels.
# * If you are not already running the 'working' kernel, reboot into it.
# * Install compilers and everything else needed for building Linux.
# * Ensure to have 15 Gigabyte free space in your home directory.
git clone -o mainline --no-checkout \
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/
cd ~/linux/
git remote add -t master stable \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git checkout --detach v6.0
# * Hint: if you used an existing clone, ensure no stale .config is around.
make olddefconfig
# * Ensure the former command picked the .config of the 'working' kernel.
# * Connect external hardware (USB keys, tokens, ...), start a VM, bring up
# VPNs, mount network shares, and briefly try the feature that is broken.
yes '' | make localmodconfig
./scripts/config --set-str CONFIG_LOCALVERSION '-local'
./scripts/config -e CONFIG_LOCALVERSION_AUTO
# * Note, when short on storage space, check the guide for an alternative:
./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \
-e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS
# * Hint: at this point you might want to adjust the build configuration;
# you'll have to, if you are running Debian.
make olddefconfig
cp .config ~/kernel-config-working
* **Segment 1**: build a kernel from the latest mainline codebase.
This among others checks if the problem was fixed already and which developers
later need to be told about the problem; in case of a regression, this rules
out a .config change as root of the problem.
a) Checking out latest mainline code::
cd ~/linux/
git checkout --force --detach mainline/master
b) Build, install, and boot a kernel::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
# * Make sure there is enough disk space to hold another kernel:
df -h /boot/ /lib/modules/
# * Note: on Arch Linux, its derivatives and a few other distributions
# the following commands will do nothing at all or only part of the
# job. See the step-by-step guide for further details.
command -v installkernel && sudo make modules_install install
# * Check how much space your self-built kernel actually needs, which
# enables you to make better estimates later:
du -ch /boot/*$(make -s kernelrelease)* | tail -n 1
du -sh /lib/modules/$(make -s kernelrelease)/
# * Hint: the output of the following command will help you pick the
# right kernel from the boot menu:
make -s kernelrelease | tee -a ~/kernels-built
reboot
# * Once booted, ensure you are running the kernel you just built by
# checking if the output of the next two commands matches:
tail -n 1 ~/kernels-built
uname -r
c) Check if the problem occurs with this kernel as well.
* **Segment 2**: ensure the 'good' kernel is also a 'working' kernel.
This among others verifies the trimmed .config file actually works well, as
bisecting with it otherwise would be a waste of time:
a) Start by checking out the sources of the 'good' version::
cd ~/linux/
git checkout --force --detach v6.0
b) Build, install, and boot a kernel as described earlier in *segment 1,
section b* -- just feel free to skip the 'du' commands, as you have a rough
estimate already.
c) Ensure the feature that regressed with the 'broken' kernel actually works
with this one.
* **Segment 3**: perform and validate the bisection.
a) In case your 'broken' version is a stable/longterm release, add the Git
branch holding it::
git remote set-branches --add stable linux-6.1.y
git fetch stable
b) Initialize the bisection::
cd ~/linux/
git bisect start
git bisect good v6.0
git bisect bad v6.1.5
c) Build, install, and boot a kernel as described earlier in *segment 1,
section b*.
In case building or booting the kernel fails for unrelated reasons, run
``git bisect skip``. In all other outcomes, check if the regressed feature
works with the newly built kernel. If it does, tell Git by executing
``git bisect good``; if it does not, run ``git bisect bad`` instead.
All three commands will make Git check out another commit; then re-execute
this step (e.g. build, install, boot, and test a kernel to then tell Git
the outcome). Do so again and again until Git shows which commit broke
things. If you run short of disk space during this process, check the
"Supplementary tasks" section below.
d) Once you have finished the bisection, put a few things away::
cd ~/linux/
git bisect log > ~/bisect-log
cp .config ~/bisection-config-culprit
git bisect reset
e) Try to verify the bisection result::
git checkout --force --detach mainline/master
git revert --no-edit cafec0cacaca0
This is optional, as some commits are impossible to revert. But if the
second command worked flawlessly, build, install, and boot one more kernel,
which should not show the regression.
* **Supplementary tasks**: cleanup during and after the process.
a) To avoid running out of disk space during a bisection, you might need to
remove some kernels you built earlier. You most likely want to keep those
you built during segment 1 and 2 around for a while, but you will most
likely no longer need kernels tested during the actual bisection
(Segment 3 c). You can list them in build order using::
ls -ltr /lib/modules/*-local*
To then for example erase a kernel that identifies itself as
'6.0-rc1-local-gcafec0cacaca0', use this::
sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0
sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0
# * Note, on some distributions kernel-install is missing
# or does only part of the job.
b) If you performed a bisection and successfully validated the result, feel
free to remove all kernels built during the actual bisection (Segment 3 c);
the kernels you built earlier and later you might want to keep around for
a week or two.
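The decision made at the end of each bisection round (segment 3, step c) can be sketched as a small shell helper. This is only an illustration: ``bisect_verdict`` is a hypothetical name, and the 'yes'/'no' arguments stand for observations you make yourself. It assumes that a failed build or boot is unrelated to the regression you are hunting; if the failure *is* the regression, ``git bisect bad`` is the right answer instead.

```shell
# Hypothetical helper: map the outcome of one bisection round to the Git
# command to run next. Each argument is "yes" or "no":
#   $1 - did the kernel build?
#   $2 - did the kernel boot?
#   $3 - does the regressed feature work with this kernel?
bisect_verdict() {
    built="$1"
    booted="$2"
    feature_works="$3"

    if [ "$built" != "yes" ] || [ "$booted" != "yes" ]; then
        # Assumption: the build or boot failure is unrelated to the regression.
        echo "git bisect skip"
    elif [ "$feature_works" = "yes" ]; then
        echo "git bisect good"
    else
        echo "git bisect bad"
    fi
}
```

For example, ``bisect_verdict yes yes no`` prints the command to run when the kernel built and booted, but the feature is still broken.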
.. _introguide_bissbs:
Step-by-step guide on how to verify bugs and bisect regressions
===============================================================
This guide describes how to set up your own Linux kernels for investigating bugs
or regressions you intend to report. How far you want to follow the instructions
depends on your issue:
Execute all steps till the end of *segment 1* to **verify if your kernel problem
is present in code supported by Linux kernel developers**. If it is, you are all
set to report the bug -- unless it did not happen with earlier kernel versions,
as then you want to at least continue with *segment 2* to **check if the issue
qualifies as a regression**, which receives priority treatment. Depending on the
outcome you then are ready to report a bug or submit a preliminary regression
report; instead of the latter you could also head straight on and follow
*segment 3* to **perform a bisection** for a full-fledged regression report
developers are obliged to act upon.
:ref:`Preparations: set up everything to build your own kernels.<introprep_bissbs>`
:ref:`Segment 1: try to reproduce the problem with the latest codebase.<introlatestcheck_bissbs>`
:ref:`Segment 2: check if the kernels you build work fine.<introworkingcheck_bissbs>`
:ref:`Segment 3: perform a bisection and validate the result.<introbisect_bissbs>`
:ref:`Supplementary tasks: cleanup during and after following this guide.<introclosure_bissbs>`
The steps in each segment illustrate the important aspects of the process, while
a comprehensive reference section holds additional details. The latter sometimes
also outlines alternative approaches, pitfalls, as well as problems that might
occur at the particular step -- and how to get things rolling again.
For further details on how to report Linux kernel issues or regressions check
out Documentation/admin-guide/reporting-issues.rst, which works in conjunction
with this document. It among others explains why you need to verify bugs with
the latest 'mainline' kernel, even if you face a problem with a kernel from a
'stable/longterm' series; for users facing a regression it also explains that
sending a preliminary report after finishing segment 2 might be wise, as the
regression and its culprit might be known already. For further details on
what actually qualifies as a regression check out
Documentation/admin-guide/reporting-regressions.rst.
.. _introprep_bissbs:
Preparations: set up everything to build your own kernels
---------------------------------------------------------
.. _backup_bissbs:
* Create a fresh backup and put system repair and restore tools at hand, just
to be prepared for the unlikely case of something going sideways.
[:ref:`details<backup_bisref>`]
.. _vanilla_bissbs:
* Remove all software that depends on externally developed kernel drivers or
builds them automatically. That includes but is not limited to DKMS, openZFS,
VirtualBox, and Nvidia's graphics drivers (including the GPLed kernel module).
[:ref:`details<vanilla_bisref>`]
.. _secureboot_bissbs:
* On platforms with 'Secure Boot' or similar solutions, prepare everything to
ensure the system will permit your self-compiled kernel to boot. The
quickest and easiest way to achieve this on commodity x86 systems is to
disable such techniques in the BIOS setup utility; alternatively, remove
their restrictions through a process initiated by
``mokutil --disable-validation``.
[:ref:`details<secureboot_bisref>`]
.. _rangecheck_bissbs:
* Determine the kernel versions considered 'good' and 'bad' throughout this
guide.
Do you follow this guide to verify if a bug is present in the code developers
care for? Then consider the mainline release your 'working' kernel (the newest
one you regularly use) is based on to be the 'good' version; if your 'working'
kernel for example is '6.0.11', then your 'good' kernel is 'v6.0'.
In case you face a regression, it depends on the version range where the
regression was introduced:
* Something which used to work in Linux 6.0 broke when switching to Linux
6.1-rc1? Then henceforth regard 'v6.0' as the last known 'good' version
and 'v6.1-rc1' as the first 'bad' one.
* Some function stopped working when updating from 6.0.11 to 6.1.4? Then for
the time being consider 'v6.0' as the last 'good' version and 'v6.1.4' as
the 'bad' one. Note, at this point it is merely assumed that 6.0 is fine;
this assumption will be checked in segment 2.
* A feature you used in 6.0.11 does not work at all or works worse in 6.0.13? In
that case you want to bisect within a stable/longterm series: consider
'v6.0.11' as the last known 'good' version and 'v6.0.13' as the first 'bad'
one. Note, in this case you still want to compile and test a mainline kernel
as explained in segment 1: the outcome will determine if you need to report
your issue to the regular developers or the stable team.
*Note, do not confuse 'good' version with 'working' kernel; the latter term
throughout this guide will refer to the last kernel that has been working
fine.*
[:ref:`details<rangecheck_bisref>`]
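Mapping a stable release such as '6.0.11' to the mainline release it is based on ('v6.0') is simple string handling. The following is a minimal sketch; ``mainline_tag_for`` is a hypothetical helper, not something shipped with the kernel sources. It only covers the cases above where the 'good' version is a mainline release; when bisecting within a stable/longterm series, use the full stable tags instead.

```shell
# Hypothetical helper: derive the mainline tag from a kernel release
# identifier by keeping only the first two version components.
# Also accepts distribution-style identifiers with a trailing suffix.
mainline_tag_for() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+).*$/v\1/'
}
```

Calling ``mainline_tag_for "$(uname -r)"`` while running the 'working' kernel yields a candidate for the 'good' version.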
.. _bootworking_bissbs:
* Boot into the 'working' kernel and briefly use the apparently broken feature.
[:ref:`details<bootworking_bisref>`]
.. _diskspace_bissbs:
* Ensure to have enough free space for building Linux. 15 Gigabyte in your home
directory should typically suffice. If you have less available, be sure to pay
attention to later steps about retrieving the Linux sources and handling of
debug symbols: both explain approaches reducing the amount of space, which
should allow you to master these tasks with about 4 Gigabytes free space.
[:ref:`details<diskspace_bisref>`]
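A quick way to check the free space is to ask ``df``. The sketch below wraps that in a hypothetical helper, ``has_free_gib``; it counts in GiB, which is close enough to the rough 'Gigabyte' figures given above.

```shell
# Hypothetical helper: succeed if the filesystem holding path $1 has at
# least $2 GiB available. 'df -P' guarantees one output line per
# filesystem; '-k' reports the sizes in KiB.
has_free_gib() {
    avail_kib="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}
```

Usage could look like ``has_free_gib "$HOME" 15 || echo "less than 15 GiB free"``.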
.. _buildrequires_bissbs:
* Install all software required to build a Linux kernel. Often you will need:
'bc', 'binutils' ('ld' et al.), 'bison', 'flex', 'gcc', 'git', 'openssl',
'pahole', 'perl', and the development headers for 'libelf' and 'openssl'. The
reference section shows how to quickly install those on various popular Linux
distributions.
[:ref:`details<buildrequires_bisref>`]
.. _sources_bissbs:
* Retrieve the mainline Linux sources; then change into the directory holding
them, as all further commands in this guide are meant to be executed from
there.
*Note, the following describes how to retrieve the sources using a full
mainline clone, which downloads about 2.75 GByte as of early 2024. The*
:ref:`reference section describes two alternatives <sources_bisref>` *:
one downloads less than 500 MByte, the other works better with unreliable
internet connections.*
Execute the following command to retrieve a fresh mainline codebase while
preparing things to add stable/longterm branches later::
git clone -o mainline --no-checkout \
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/
cd ~/linux/
git remote add -t master stable \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[:ref:`details<sources_bisref>`]
.. _oldconfig_bissbs:
* Start preparing a kernel build configuration (the '.config' file).
Before doing so, ensure you are still running the 'working' kernel an earlier
step told you to boot; if you are unsure, check the current kernel release
identifier using ``uname -r``.
Afterwards check out the source code for the version earlier established as
'good' (in this example this is assumed to be 6.0) and create a .config file::
git checkout --detach v6.0
make olddefconfig
The second command will try to locate the build configuration file for the
running kernel and then adjust it for the needs of the kernel sources you
checked out. While doing so, it will print a few lines you need to check.
Look out for a line starting with '# using defaults found in'. It should be
followed by a path to a file in '/boot/' that contains the release identifier
of your currently working kernel. If the line instead continues with something
like 'arch/x86/configs/x86_64_defconfig', then the build infra failed to find
the .config file for your running kernel -- in which case you have to put one
there manually, as explained in the reference section.
In case you can not find such a line, look for one containing '# configuration
written to .config'. If that's the case you have a stale build configuration
lying around. Unless you intend to use it, delete it; afterwards run
'make olddefconfig' again and check if it now picked up the right config file
as base.
[:ref:`details<oldconfig_bisref>`]
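The checks on the output of ``make olddefconfig`` described in this step can be automated roughly as follows. This is a sketch only: ``classify_olddefconfig`` is a hypothetical helper, and it relies solely on the two marker lines quoted above.

```shell
# Hypothetical helper: read the output of 'make olddefconfig' on stdin and
# classify it. $1 is the release identifier of the 'working' kernel.
# Prints one of:
#   ok         - base config came from /boot/ and matches the release
#   wrong-base - some other base was used (e.g. an arch defconfig)
#   stale      - no base was picked, so a .config was already present
#   unknown    - neither marker line was found
classify_olddefconfig() {
    release="$1"
    output="$(cat)"
    base="$(printf '%s\n' "$output" \
        | sed -n 's/^.*# using defaults found in //p' | head -n 1)"

    if [ -n "$base" ]; then
        case "$base" in
            /boot/*"$release"*) echo "ok" ;;
            *) echo "wrong-base" ;;
        esac
    elif printf '%s\n' "$output" \
        | grep -qF '# configuration written to .config'; then
        echo "stale"
    else
        echo "unknown"
    fi
}
```

It could be used as ``make olddefconfig | classify_olddefconfig "$(uname -r)"``.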
.. _localmodconfig_bissbs:
* Disable any kernel modules apparently superfluous for your setup. This is
optional, but especially wise for bisections, as it speeds up the build
process enormously -- at least unless the .config file picked up in the
previous step was already tailored to your and your hardware needs, in which
case you should skip this step.
To prepare the trimming, connect external hardware you occasionally use (USB
keys, tokens, ...), quickly start a VM, and bring up VPNs. And if you rebooted
since you started this guide, ensure that you tried using the feature causing
trouble since you started the system. Only then trim your .config::
yes '' | make localmodconfig
There is a catch to this, as the 'apparently' in the initial sentence of this step
and the preparation instructions already hinted at:
The 'localmodconfig' target easily disables kernel modules for features only
used occasionally -- like modules for external peripherals not yet connected
since booting, virtualization software not yet utilized, VPN tunnels, and a
few other things. That's because some tasks rely on kernel modules Linux only
loads when you execute tasks like the aforementioned ones for the first time.
This drawback of localmodconfig is nothing you should lose sleep over, but
something to keep in mind: if something is misbehaving with the kernels built
during this guide, this is most likely the reason. You can reduce or nearly
eliminate the risk with tricks outlined in the reference section; but when
building a kernel just for quick testing purposes this is usually not worth
spending much effort on, as long as it boots and allows you to properly test the
feature that causes trouble.
[:ref:`details<localmodconfig_bisref>`]
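To get a feeling for which modules localmodconfig would miss, you can compare the list of loaded modules before and after performing an occasional task. The following is only a sketch: ``new_modules`` is a hypothetical helper, and it assumes both files hold unmodified ``lsmod`` output (one header line, module name in the first column).

```shell
# Hypothetical helper: print modules present in the second 'lsmod' snapshot
# but missing from the first one. $1 and $2 are files holding 'lsmod' output.
new_modules() (
    before=$(mktemp) || exit 1
    after=$(mktemp) || exit 1
    # Drop the header line, keep the module names, and sort them for 'comm'.
    awk 'NR > 1 { print $1 }' "$1" | sort > "$before"
    awk 'NR > 1 { print $1 }' "$2" | sort > "$after"
    # Column 3 suppressed (common lines) and column 1 suppressed (only in
    # the first file): what remains was loaded between the two snapshots.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
)
```

A possible workflow: run ``lsmod > ~/lsmod-before``, connect the device or start the VM, run ``lsmod > ~/lsmod-after``, and finally ``new_modules ~/lsmod-before ~/lsmod-after`` (the file names are examples). Every module printed would have been disabled if you had trimmed the configuration before performing that task.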
.. _tagging_bissbs:
* Ensure all the kernels you will build are clearly identifiable using a special
tag and a unique version number::
./scripts/config --set-str CONFIG_LOCALVERSION '-local'
./scripts/config -e CONFIG_LOCALVERSION_AUTO
[:ref:`details<tagging_bisref>`]
.. _debugsymbols_bissbs:
* Decide how to handle debug symbols.
In the context of this document it is often wise to enable them, as there is a
decent chance you will need to decode a stack trace from a 'panic', 'Oops',
'warning', or 'BUG'::
./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \
-e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS
But if you are extremely short on storage space, you might want to disable
debug symbols instead::
./scripts/config -d DEBUG_INFO -d DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT \
-d DEBUG_INFO_DWARF4 -d DEBUG_INFO_DWARF5 -e CONFIG_DEBUG_INFO_NONE
[:ref:`details<debugsymbols_bisref>`]
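If you want to double-check which of the two variants ended up in your build configuration, something like the following might help. It is a sketch built around a hypothetical helper, ``debug_info_state``; it assumes that a configuration with debug symbols contains the line 'CONFIG_DEBUG_INFO=y' once ``make olddefconfig`` has reprocessed it.

```shell
# Hypothetical helper: report whether the build configuration in file $1
# enables debug symbols. Prints 'enabled' or 'disabled'.
debug_info_state() {
    if grep -q '^CONFIG_DEBUG_INFO=y' "$1"; then
        echo enabled
    else
        echo disabled
    fi
}
```

Run it as ``debug_info_state .config`` from within the source tree after the configuration has been reprocessed.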
.. _configmods_bissbs:
* Check if you may want or need to adjust some other kernel configuration
options:
* Are you running Debian? Then you want to avoid known problems by performing
additional adjustments explained in the reference section.
[:ref:`details<configmods_distros_bisref>`].
* If you want to influence other aspects of the configuration, do so now using
your preferred tool. Note, to use make targets like 'menuconfig' or
'nconfig', you will need to install the development files of ncurses; for
'xconfig' you likewise need the Qt5 or Qt6 headers.
[:ref:`details<configmods_individual_bisref>`].
.. _saveconfig_bissbs:
* Reprocess the .config after the latest adjustments and store it in a safe
place::
make olddefconfig
cp .config ~/kernel-config-working
[:ref:`details<saveconfig_bisref>`]
.. _introlatestcheck_bissbs:
Segment 1: try to reproduce the problem with the latest codebase
----------------------------------------------------------------
The following steps verify if the problem occurs with the code currently
supported by developers. In case you face a regression, it also checks that the
problem is not caused by some .config change, as reporting the issue then would
be a waste of time. [:ref:`details<introlatestcheck_bisref>`]
.. _checkoutmaster_bissbs:
* Check out the latest Linux codebase::
cd ~/linux/
git checkout --force --detach mainline/master
[:ref:`details<checkoutmaster_bisref>`]
.. _build_bissbs:
* Build the image and the modules of your first kernel using the config file you
prepared::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
If you want your kernel packaged up as deb, rpm, or tar file, see the
reference section for alternatives, which obviously will require other
steps to install as well.
[:ref:`details<build_bisref>`]
.. _install_bissbs:
* Install your newly built kernel.
Before doing so, consider checking if there is still enough room for it::
df -h /boot/ /lib/modules/
150 MByte in /boot/ and 200 in /lib/modules/ usually suffice. Those are rough
estimates assuming the worst case. How much your kernels actually require will
be determined later.
Now install the kernel, which will be saved in parallel to the kernels from
your Linux distribution::
command -v installkernel && sudo make modules_install install
On many commodity Linux distributions this will take care of everything
required to boot your kernel. You might want to ensure that's the case by
checking if your boot loader's configuration was updated; furthermore ensure
an initramfs (also known as initrd) exists, which on many distributions can be
achieved by running ``ls -l /boot/init*$(make -s kernelrelease)*``. Those
steps are recommended, as there are quite a few Linux distributions where the
above command is insufficient:
* On Arch Linux, its derivatives, many immutable Linux distributions, and a
few others the above command does nothing at all, as they lack the
'installkernel' executable.
* Some distributions install the kernel, but don't add an entry for your
kernel in your boot loader's configuration -- the kernel thus won't show up
in the boot menu.
* Some distributions add a boot loader menu entry, but don't create an
initramfs on installation -- in that case your kernel most likely will be
unable to mount the root partition during bootup.
If any of that applies to you, see the reference section for further guidance.
Once you figured out what to do, consider writing down the necessary
installation steps: if you will build more kernels as described in
segments 2 and 3, you will have to execute these commands every time that
``command -v installkernel [...]`` comes up again.
[:ref:`details<install_bisref>`]
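The free-space check from this step can be turned into a test that fails loudly. This is a minimal sketch using a hypothetical helper, ``has_free_mbyte``; the thresholds in the usage note are the rough estimates given above, not hard requirements.

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1 has at
# least $2 MByte available. Relies on the POSIX output format of 'df -Pk'.
has_free_mbyte() (
    # Line 2, column 4 of 'df -Pk' is the available space in 1024-byte units.
    avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
)
```

For example, ``has_free_mbyte /boot/ 150 && has_free_mbyte /lib/modules/ 200 || echo "free up some space first"`` prints a warning if either location looks too full.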
.. _storagespace_bissbs:
* In case you plan to follow this guide further, check how much storage space
the kernel, its modules, and other related files like the initramfs consume::
du -ch /boot/*$(make -s kernelrelease)* | tail -n 1
du -sh /lib/modules/$(make -s kernelrelease)/
Write down or remember those two values for later: they enable you to prevent
running out of disk space accidentally during a bisection.
[:ref:`details<storagespace_bisref>`]
.. _kernelrelease_bissbs:
* Show and store the kernelrelease identifier of the kernel you just built::
make -s kernelrelease | tee -a ~/kernels-built
Remember the identifier momentarily, as it will help you pick the right kernel
from the boot menu upon restarting.
.. _recheckbroken_bissbs:
* Reboot into the kernel you just built and check if the feature that is
expected to be broken really is.
Start by making sure the kernel you booted is the one you just built. When
unsure, check if the output of these commands shows the exact same release
identifier::
tail -n 1 ~/kernels-built
uname -r
Now verify if the feature that causes trouble works with your newly built
kernel. If things work while investigating a regression, check the reference
section for further details.
[:ref:`details<recheckbroken_bisref>`]
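Comparing the two identifiers by eye is error-prone once they contain commit hashes. A small sketch of how the comparison could be scripted follows; ``is_last_built_kernel`` is a hypothetical helper, and it assumes the file lists one release identifier per line, as ``tee -a ~/kernels-built`` produces.

```shell
# Hypothetical helper: succeed if the last identifier recorded in file $1
# equals the release string $2 (normally the output of 'uname -r').
is_last_built_kernel() {
    [ "$(tail -n 1 "$1")" = "$2" ]
}
```

After rebooting you could run ``is_last_built_kernel ~/kernels-built "$(uname -r)" || echo "this is not the kernel you just built"``.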
.. _recheckstablebroken_bissbs:
* Are you facing a problem within a stable/longterm release, but failed to
reproduce it with the mainline kernel you just built? Then check if the latest
codebase for the particular series might already fix the problem. To do so,
add the stable series Git branch for your 'good' kernel (again, this here is
assumed to be 6.0) and check out the latest version::
cd ~/linux/
git remote set-branches --add stable linux-6.0.y
git fetch stable
git checkout --force --detach linux-6.0.y
Now use the checked out code to build and install another kernel using the
commands the earlier steps already described in more detail::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
# * Check if the free space suffices holding another kernel:
df -h /boot/ /lib/modules/
command -v installkernel && sudo make modules_install install
make -s kernelrelease | tee -a ~/kernels-built
reboot
Now verify if you booted the kernel you intended to start, to then check if
everything works fine with this kernel::
tail -n 1 ~/kernels-built
uname -r
[:ref:`details<recheckstablebroken_bisref>`]
Do you follow this guide to verify if a problem is present in the code
currently supported by Linux kernel developers? Then you are done at this
point. If you later want to remove the kernel you just built, check out
:ref:`Supplementary tasks: cleanup during and after following this guide.<introclosure_bissbs>`.
In case you face a regression, move on and execute at least the next segment
as well.
.. _introworkingcheck_bissbs:
Segment 2: check if the kernels you build work fine
---------------------------------------------------
In case of a regression, you now want to ensure the trimmed configuration file
you created earlier works as expected; a bisection with the .config file
otherwise would be a waste of time. [:ref:`details<introworkingcheck_bisref>`]
.. _recheckworking_bissbs:
* Build your own variant of the 'working' kernel and check if the feature that
regressed works as expected with it.
Start by checking out the sources for the version earlier established as
'good' (once again assumed to be 6.0 here)::
cd ~/linux/
git checkout --detach v6.0
Now use the checked out code to configure, build, and install another kernel
using the commands the previous subsection explained in more detail::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
# * Check if the free space suffices holding another kernel:
df -h /boot/ /lib/modules/
command -v installkernel && sudo make modules_install install
make -s kernelrelease | tee -a ~/kernels-built
reboot
When the system booted, you may want to verify once again that the
kernel you started is the one you just built::
tail -n 1 ~/kernels-built
uname -r
Now check if this kernel works as expected; if not, consult the reference
section for further instructions.
[:ref:`details<recheckworking_bisref>`]
.. _introbisect_bissbs:
Segment 3: perform the bisection and validate the result
--------------------------------------------------------
With all the preparations and precaution builds taken care of, you are now ready
to begin the bisection. This will make you build quite a few kernels -- usually
about 15 in case you encountered a regression when updating to a newer series
(say from 6.0.11 to 6.1.3). But do not worry, due to the trimmed build
configuration created earlier this works a lot faster than many people assume:
overall on average it will often just take about 10 to 15 minutes to compile
each kernel on commodity x86 machines.
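The number of kernels to build grows only slowly with the size of the range, because every test roughly halves the number of remaining candidates. The arithmetic can be sketched as follows; ``bisect_steps`` is a hypothetical helper that computes the rounded-up base-2 logarithm. Git's own estimate can differ slightly, and skipped commits add further steps.

```shell
# Hypothetical helper: smallest number of halvings needed to narrow $1
# candidate commits down to a single one, i.e. ceil(log2($1)).
bisect_steps() (
    n=$1
    steps=0
    span=1
    # Double the covered span until it is at least as large as the range.
    while [ "$span" -lt "$n" ]; do
        span=$(( span * 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
)
```

For a hypothetical range of 16000 commits, ``bisect_steps 16000`` prints 14; for 675 remaining commits it prints 10.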
* In case your 'bad' version is a stable/longterm release (say v6.1.5), add its
stable branch, unless you already did so earlier::
cd ~/linux/
git remote set-branches --add stable linux-6.1.y
git fetch stable
.. _bisectstart_bissbs:
* Start the bisection and tell Git about the versions earlier established as
'good' (6.0 in the following example command) and 'bad' (6.1.5)::
cd ~/linux/
git bisect start
git bisect good v6.0
git bisect bad v6.1.5
[:ref:`details<bisectstart_bisref>`]
.. _bisectbuild_bissbs:
* Now use the code Git checked out to build, install, and boot a kernel using
the commands introduced earlier::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
# * Check if the free space suffices holding another kernel:
df -h /boot/ /lib/modules/
command -v installkernel && sudo make modules_install install
make -s kernelrelease | tee -a ~/kernels-built
reboot
If compilation fails for some reason, run ``git bisect skip`` and restart
executing the stack of commands from the beginning.
In case you skipped the "test latest codebase" step in the guide, check its
description as for why the 'df [...]' and 'make -s kernelrelease [...]'
commands are here.
Important note: the latter command from this point on will print release
identifiers that might look odd or wrong to you -- which they are not, as it's
totally normal to see release identifiers like '6.0-rc1-local-gcafec0cacaca0'
if you bisect between versions 6.1 and 6.2 for example.
[:ref:`details<bisectbuild_bisref>`]
.. _bisecttest_bissbs:
* Now check if the feature that regressed works in the kernel you just built.
You again might want to start by making sure the kernel you booted is the one
you just built::
cd ~/linux/
tail -n 1 ~/kernels-built
uname -r
Now verify if the feature that regressed works at this kernel bisection point.
If it does, run this::
git bisect good
If it does not, run this::
git bisect bad
Be sure about what you tell Git, as getting this wrong just once will send the
rest of the bisection totally off course.
While the bisection is ongoing, Git will use the information you provided to
find and check out another bisection point for you to test. While doing so, it
will print something like 'Bisecting: 675 revisions left to test after this
(roughly 10 steps)' to indicate how many further changes it expects to be
tested. Now build and install another kernel using the instructions from the
previous step; afterwards follow the instructions in this step again.
Repeat this again and again until you finish the bisection -- that's the case
when Git after tagging a change as 'good' or 'bad' prints something like
'cafecaca0c0dacafecaca0c0dacafecaca0c0da is the first bad commit'; right
afterwards it will show some details about the culprit including the patch
description of the change. The latter might fill your terminal screen, so you
might need to scroll up to see the message mentioning the culprit;
alternatively, run ``git bisect log > ~/bisection-log``.
[:ref:`details<bisecttest_bisref>`]
.. _bisectlog_bissbs:
* Store Git's bisection log and the current .config file in a safe place before
telling Git to reset the sources to the state before the bisection::
cd ~/linux/
git bisect log > ~/bisection-log
cp .config ~/bisection-config-culprit
git bisect reset
[:ref:`details<bisectlog_bisref>`]
.. _revert_bissbs:
* Try reverting the culprit on top of latest mainline to see if this fixes your
regression.
This is optional, as it might be impossible or hard to realize. The former is
the case, if the bisection determined a merge commit as the culprit; the
latter happens if other changes depend on the culprit. But if the revert
succeeds, it is worth building another kernel, as it validates the result of
a bisection, which can easily go astray; it furthermore will let kernel
developers know if they can resolve the regression with a quick revert.
Begin by checking out the latest codebase depending on the range you bisected:
* Did you face a regression within a stable/longterm series (say between
6.0.11 and 6.0.13) that does not happen in mainline? Then check out the
latest codebase for the affected series like this::
git fetch stable
git checkout --force --detach linux-6.0.y
* In all other cases check out latest mainline::
git fetch mainline
git checkout --force --detach mainline/master
If you bisected a regression within a stable/longterm series that also
happens in mainline, there is one more thing to do: look up the mainline
commit-id. To do so, use a command like ``git show abcdcafecabcd`` to
view the patch description of the culprit. There will be a line near
the top which looks like 'commit cafec0cacaca0 upstream.' or
'Upstream commit cafec0cacaca0'; use that commit-id in the next command
and not the one the bisection blamed.
Now try reverting the culprit by specifying its commit id::
git revert --no-edit cafec0cacaca0
If that fails, give up trying and move on to the next step. But if it works,
build a kernel again using the familiar command sequence::
cp ~/kernel-config-working .config
make olddefconfig
make -j $(nproc --all)
# * Check if the free space suffices holding another kernel:
df -h /boot/ /lib/modules/
command -v installkernel && sudo make modules_install install
make -s kernelrelease | tee -a ~/kernels-built
reboot
Now check one last time if the feature that made you perform a bisection works
with that kernel.
[:ref:`details<revert_bisref>`]
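Looking up the mainline commit-id in the description of a stable/longterm commit can be scripted. The following is a sketch that only covers the two line formats quoted in this step; ``upstream_commit_id`` is a hypothetical helper, and commit descriptions using other wording will yield no output.

```shell
# Hypothetical helper: extract the mainline commit-id from the description of
# a stable/longterm commit read from stdin. Recognizes lines like
# 'commit <id> upstream.' and 'Upstream commit <id>'; prints the first match.
upstream_commit_id() {
    sed -n -E \
        -e 's/^[[:space:]]*commit ([0-9a-f]{7,40}) upstream\.?[[:space:]]*$/\1/p' \
        -e 's/^[[:space:]]*\[?[[:space:]]*Upstream commit ([0-9a-f]{7,40}).*$/\1/p' |
        head -n 1
}
```

A possible invocation is ``git show -s --format=%B abcdcafecabcd | upstream_commit_id``, where 'abcdcafecabcd' stands for the commit the bisection blamed. If nothing is printed, read the patch description yourself.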
.. _introclosure_bissbs:
Supplementary tasks: cleanup during and after the bisection
-----------------------------------------------------------
During and after following this guide you might want or need to remove some of
the kernels you installed: the boot menu otherwise will become confusing or
space might run out.
.. _makeroom_bissbs:
* To remove one of the kernels you installed, look up its 'kernelrelease'
identifier. This guide stores them in '~/kernels-built', but the following
command will print them as well::
ls -ltr /lib/modules/*-local*
You in most situations want to remove the oldest kernels built during the
actual bisection (i.e. segment 3 of this guide). The two you created
beforehand (i.e. to test the latest codebase and the version considered
'good') might become handy to verify something later -- thus better keep them
around, unless you are really short on storage space.
To remove the modules of a kernel with the kernelrelease identifier
'*6.0-rc1-local-gcafec0cacaca0*', start by removing the directory holding its
modules::
sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0
Afterwards try the following command::
sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0
On quite a few distributions this will delete all other kernel files installed
while also removing the kernel's entry from the boot menu. But on some
distributions kernel-install does not exist or leaves boot-loader entries or
kernel image and related files behind; in that case remove them as described
in the reference section.
[:ref:`details<makeroom_bisref>`]
.. _finishingtouch_bissbs:
* Once you have finished the bisection, do not immediately remove anything you
set up, as you might need a few things again. What is safe to remove depends
on the outcome of the bisection:
* Could you initially reproduce the regression with the latest codebase and
after the bisection were able to fix the problem by reverting the culprit on
top of the latest codebase? Then you want to keep those two kernels around
for a while, but safely remove all others with a '-local' in the release
identifier.
* Did the bisection end on a merge-commit or seem questionable for other
reasons? Then you want to keep as many kernels as possible around for a few
days: it's pretty likely that you will be asked to recheck something.
* In other cases it likely is a good idea to keep the following kernels around
for some time: the one built from the latest codebase, the one created from
the version considered 'good', and the last three or four you compiled
during the actual bisection process.
[:ref:`details<finishingtouch_bisref>`]
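The last of the rules above can be approximated with the list this guide keeps in '~/kernels-built'. The following sketch rests on assumptions you need to verify for your case: the file holds one identifier per line in build order, and its first two lines are the kernels built from the latest codebase and from the 'good' version (which is not true if you also built a stable/longterm kernel in segment 1). ``removal_candidates`` is a hypothetical helper; it only prints identifiers and removes nothing.

```shell
# Hypothetical helper: print the identifiers in file $1 that are neither among
# the first two lines nor among the last $2 lines (default: 3).
removal_candidates() {
    awk -v keep="${2:-3}" '
        { line[NR] = $0 }
        END { for (i = 3; i <= NR - keep; i++) print line[i] }
    ' "$1"
}
```

Running ``removal_candidates ~/kernels-built 4`` lists the kernels you could remove while keeping the first two and the last four. Check the output against the rules above before deleting anything.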
.. _submit_improvements:
This concludes the step-by-step guide.
Did you run into trouble following any of the above steps not cleared up by the
reference section below? Did you spot errors? Or do you have ideas how to
improve the guide? Then please take a moment and let the maintainer of this
document know by email (Thorsten Leemhuis <linux@leemhuis.info>), ideally while
CCing the Linux docs mailing list (linux-doc@vger.kernel.org). Such feedback is
vital to improve this document further, which is in everybody's interest, as it
will enable more people to master the task described here -- and hopefully also
improve similar guides inspired by this one.
Reference section for the step-by-step guide
============================================
This section holds additional information for almost all the items in the above
step-by-step guide.
.. _backup_bisref:
Prepare for emergencies
-----------------------
*Create a fresh backup and put system repair and restore tools at hand.*
[:ref:`... <backup_bissbs>`]
Remember, you are dealing with computers, which sometimes do unexpected things
-- especially if you fiddle with crucial parts like the kernel of an operating
system. That's what you are about to do in this process. Hence, better prepare
for something going sideways, even if that should not happen.
[:ref:`back to step-by-step guide <backup_bissbs>`]
.. _vanilla_bisref:
Remove anything related to externally maintained kernel modules
---------------------------------------------------------------
*Remove all software that depends on externally developed kernel drivers or
builds them automatically.* [:ref:`...<vanilla_bissbs>`]
Externally developed kernel modules can easily cause trouble during a bisection.
But there is a more important reason why this guide contains this step: most
kernel developers will not care about reports about regressions occurring with
kernels that utilize such modules. That's because such kernels are not
considered 'vanilla' anymore, as Documentation/admin-guide/reporting-issues.rst
explains in more detail.
[:ref:`back to step-by-step guide <vanilla_bissbs>`]
.. _secureboot_bisref:
Deal with techniques like Secure Boot
-------------------------------------
*On platforms with 'Secure Boot' or similar techniques, prepare everything to
ensure the system will permit your self-compiled kernel to boot later.*
[:ref:`... <secureboot_bissbs>`]
Many modern systems allow only certain operating systems to start; that's why
they reject booting self-compiled kernels by default.
You ideally deal with this by making your platform trust your self-built kernels
with the help of a certificate. How to do that is not described
here, as it requires various steps that would take the text too far away from
its purpose; 'Documentation/admin-guide/module-signing.rst' and various
websites already explain everything needed in more detail.
Temporarily disabling solutions like Secure Boot is another way to make your own
Linux boot. On commodity x86 systems it is possible to do this in the BIOS Setup
utility; the required steps vary a lot between machines and therefore cannot be
described here.
On mainstream x86 Linux distributions there is a third and universal option:
disable all Secure Boot restrictions for your Linux environment. You can
initiate this process by running ``mokutil --disable-validation``; this will
tell you to create a one-time password, which is safe to write down. Now
restart; right after your BIOS performed all self-tests the bootloader Shim will
show a blue box with a message 'Press any key to perform MOK management'. Hit
some key before the countdown expires, which will open a menu. Choose 'Change
Secure Boot state'. Shim's 'MokManager' will now ask you to enter three
randomly chosen characters from the one-time password specified earlier. Once
you provided them, confirm you really want to disable the validation.
Afterwards, permit MokManager to reboot the machine.
[:ref:`back to step-by-step guide <secureboot_bissbs>`]
.. _bootworking_bisref:
Boot the last kernel that was working
-------------------------------------
*Boot into the last working kernel and briefly recheck if the feature that
regressed really works.* [:ref:`...<bootworking_bissbs>`]
This will make later steps that cover creating and trimming the configuration do
the right thing.
[:ref:`back to step-by-step guide <bootworking_bissbs>`]
.. _diskspace_bisref:
Space requirements
------------------
*Ensure to have enough free space for building Linux.*
[:ref:`... <diskspace_bissbs>`]
The numbers mentioned are rough estimates with a generous safety margin, so
often you will need less.
If you have space constraints, be sure to pay attention to the :ref:`step about
debug symbols <debugsymbols_bissbs>` and its :ref:`accompanying reference
section <debugsymbols_bisref>`, as disabling them will reduce the consumed disk
space by quite a few gigabytes.
[:ref:`back to step-by-step guide <diskspace_bissbs>`]
.. _rangecheck_bisref:
Bisection range
---------------
*Determine the kernel versions considered 'good' and 'bad' throughout this
guide.* [:ref:`...<rangecheck_bissbs>`]
Establishing the range of commits to be checked is mostly straightforward,
except when a regression occurred when switching from a release of one stable
series to a release of a later series (e.g. from 6.0.11 to 6.1.4). In that case
Git will need some hand holding, as there is no straight line of descent.
That's because with the release of 6.0 mainline carried on to 6.1 while the
stable series 6.0.y branched to the side. It's therefore theoretically possible
that the feature broken in 6.1.4 only worked in 6.0.11 because it was fixed by a
commit that went into one of the 6.0.y releases, but never hit mainline or the
6.1.y series. Thankfully that normally should not happen due to the way the
stable/longterm maintainers maintain the code. It's thus pretty safe to assume
6.0 as a 'good' kernel. That assumption will be tested anyway, as that kernel
will be built and tested in segment 2 of this guide; Git would force you
to do this as well, if you tried bisecting between 6.0.11 and 6.1.4.
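Deriving the mainline release a stable/longterm version is based on is a matter of dropping the third number. As a sketch, assuming the 'x.y.z' version scheme used in the examples of this guide (``mainline_tag_for`` is a hypothetical helper; input that does not match the scheme is printed unchanged):

```shell
# Hypothetical helper: map a stable/longterm version such as 6.0.11 to the
# Git tag of the mainline release it is based on (v6.0).
mainline_tag_for() {
    printf '%s\n' "$1" | sed -E 's/^v?([0-9]+\.[0-9]+)(\.[0-9]+)?$/v\1/'
}
```

``mainline_tag_for 6.0.11`` prints 'v6.0', which is the tag to pass to ``git bisect good`` in the scenario described above.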
[:ref:`back to step-by-step guide <rangecheck_bissbs>`]
.. _buildrequires_bisref:
Install build requirements
--------------------------
*Install all software required to build a Linux kernel.*
[:ref:`...<buildrequires_bissbs>`]
The kernel is pretty stand-alone, but besides tools like the compiler you will
sometimes need a few libraries to build one. How to install everything needed
depends on your Linux distribution and the configuration of the kernel you are
about to build.
Here are a few examples what you typically need on some mainstream
distributions:
* Arch Linux and derivatives::
sudo pacman --needed -S bc binutils bison flex gcc git kmod libelf openssl \
pahole perl zlib ncurses qt6-base
* Debian, Ubuntu, and derivatives::
sudo apt install bc binutils bison dwarves flex gcc git kmod libelf-dev \
libssl-dev make openssl pahole perl-base pkg-config zlib1g-dev \
libncurses-dev qt6-base-dev g++
* Fedora and derivatives::
sudo dnf install binutils \
/usr/bin/{bc,bison,flex,gcc,git,openssl,make,perl,pahole,rpmbuild} \
/usr/include/{libelf.h,openssl/pkcs7.h,zlib.h,ncurses.h,qt6/QtGui/QAction}
* openSUSE and derivatives::
sudo zypper install bc binutils bison dwarves flex gcc git \
kernel-install-tools libelf-devel make modutils openssl openssl-devel \
perl-base zlib-devel rpm-build ncurses-devel qt6-base-devel
These commands install a few packages that are often, but not always needed. You
for example might want to skip installing the development headers for ncurses,
which you will only need in case you later might want to adjust the kernel build
configuration using the make targets 'menuconfig' or 'nconfig'; likewise omit
the headers of Qt6 if you do not plan to adjust the .config using 'xconfig'.
You furthermore might need additional libraries and their development headers
for tasks not covered in this guide -- for example when building utilities from
the kernel's tools/ directory.
[:ref:`back to step-by-step guide <buildrequires_bissbs>`]
.. _sources_bisref:
Download the sources using Git
------------------------------
*Retrieve the Linux mainline sources.*
[:ref:`...<sources_bissbs>`]
The step-by-step guide outlines how to download the Linux sources using a full
Git clone of Linus' mainline repository. There is nothing more to say about
that -- but there are two alternative ways to retrieve the sources that might
work better for you:
* If you have an unreliable internet connection, consider
:ref:`using a 'Git bundle'<sources_bundle_bisref>`.
* If downloading the complete repository would take too long or requires too
much storage space, consider :ref:`using a 'shallow
clone'<sources_shallow_bisref>`.
.. _sources_bundle_bisref:
Downloading Linux mainline sources using a bundle
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use the following commands to retrieve the Linux mainline sources using a
bundle::
wget -c \
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/clone.bundle
git clone --no-checkout clone.bundle ~/linux/
cd ~/linux/
git remote remove origin
git remote add mainline \
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch mainline
git remote add -t master stable \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
In case the 'wget' command fails, just re-execute it; it will pick up where
it left off.
[:ref:`back to step-by-step guide <sources_bissbs>`]
[:ref:`back to section intro <sources_bisref>`]
.. _sources_shallow_bisref:
Downloading Linux mainline sources using a shallow clone
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, execute the following command to retrieve the latest mainline codebase::
git clone -o mainline --no-checkout --depth 1 -b master \
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/
cd ~/linux/
git remote add -t master stable \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Now deepen your clone's history to the second predecessor of the mainline
release of your 'good' version. In case the latter is 6.0 or 6.0.11, 5.19 would
be the first predecessor and 5.18 the second -- hence deepen the history up to
that version::
git fetch --shallow-exclude=v5.18 mainline
Afterwards add the stable Git repository as remote and all required stable
branches as explained in the step-by-step guide.
Note, shallow clones have a few peculiar characteristics:
* For bisections the history needs to be deepened a few mainline versions
farther than it seems necessary, as explained above already. That's because
Git otherwise will be unable to revert or describe most of the commits within
a range (say v6.1..v6.2), as they are internally based on earlier kernel
releases (like v6.0-rc2 or v5.19-rc3).
* This document in most places uses ``git fetch`` with ``--shallow-exclude=``
to specify the earliest version you care about (or to be precise: its git
tag). You alternatively can use the parameter ``--shallow-since=`` to specify
an absolute (say ``'2023-07-15'``) or relative (``'12 months'``) date to
define the depth of the history you want to download. When using them while
bisecting mainline, ensure to deepen the history to at least 7 months before
the release of the mainline release your 'good' kernel is based on.
* Be warned, when deepening your clone you might encounter an error like
'fatal: error in object: unshallow cafecaca0c0dacafecaca0c0dacafecaca0c0da'.
In that case run ``git repack -d`` and try again.
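When going the ``--shallow-since=`` route, the date can be calculated instead of guessed. The following sketch relies on the ``-d`` option of GNU date, so it will not work with other implementations; ``shallow_since_date`` is a hypothetical helper, and you have to look up the release date yourself.

```shell
# Hypothetical helper: print the date seven months before the release date
# given as $1 (format YYYY-MM-DD). Requires GNU date because of '-d'.
shallow_since_date() {
    date -u -d "$1 7 months ago" +%F
}
```

If the mainline release your 'good' kernel is based on came out on 2022-10-02 (a date used here purely as an example), you could deepen the history with ``git fetch --shallow-since="$(shallow_since_date 2022-10-02)" mainline``.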
[:ref:`back to step-by-step guide <sources_bissbs>`]
[:ref:`back to section intro <sources_bisref>`]
.. _oldconfig_bisref:
Start defining the build configuration for your kernel
------------------------------------------------------
*Start preparing a kernel build configuration (the '.config' file).*
[:ref:`... <oldconfig_bissbs>`]
*Note, this is the first of multiple steps in this guide that create or modify
build artifacts. The commands used in this guide store them right in the source
tree to keep things simple. In case you prefer storing the build artifacts
separately, create a directory like '~/linux-builddir/' and add the parameter
``O=~/linux-builddir/`` to all make calls used throughout this guide. You will
have to point other commands there as well -- among them the ``./scripts/config
[...]`` commands, which will require ``--file ~/linux-builddir/.config`` to
locate the right build configuration.*
Two things can easily go wrong when creating a .config file as advised:
* The olddefconfig target will use a .config file from your build directory, if
one is already present there (e.g. '~/linux/.config'). That's totally fine if
that's what you intend (see next step), but in all other cases you want to
delete it. This for example is important in case you followed this guide
further, but due to problems came back here to redo the configuration from
scratch.
* Sometimes olddefconfig is unable to locate the .config file for your running
kernel and will use defaults, as briefly outlined in the guide. In that case
check if your distribution ships the configuration somewhere and manually put
it in the right place (e.g. '~/linux/.config') if it does. On distributions
where /proc/config.gz exists this can be achieved using this command::
zcat /proc/config.gz > .config
Once you put it there, run ``make olddefconfig`` again to adjust it to the
needs of the kernel about to be built.
Note, the olddefconfig target will set any undefined build options to their
default value. If you prefer to set such configuration options manually, use
``make oldconfig`` instead. Then for each undefined configuration option you
will be asked how to proceed; in case you are unsure what to answer, simply hit
'enter' to apply the default value. Note though that for bisections you normally
want to go with the defaults, as you otherwise might enable a new feature that
causes problems which look like regressions (for example due to security
restrictions).
Occasionally odd things happen when trying to use a config file prepared for one
kernel (say 6.1) on an older mainline release -- especially if it is much older
(say v5.15). That's one of the reasons why the previous step in the guide told
you to boot the kernel where everything works. If you manually add a .config
file you thus want to ensure it's from the working kernel and not from one
that shows the regression.
In case you want to build kernels for another machine, locate its kernel build
configuration; usually ``ls /boot/config-$(uname -r)`` will print its name. Copy
that file to the build machine and store it as ~/linux/.config; afterwards run
``make olddefconfig`` to adjust it.
[:ref:`back to step-by-step guide <oldconfig_bissbs>`]
.. _localmodconfig_bisref:
Trim the build configuration for your kernel
--------------------------------------------
*Disable any kernel modules apparently superfluous for your setup.*
[:ref:`... <localmodconfig_bissbs>`]
As explained briefly in the step-by-step guide already: with localmodconfig it
can easily happen that your self-built kernels will lack modules for tasks you
did not perform at least once before utilizing this make target. That happens
when a task requires kernel modules which are only autoloaded when you execute
it for the first time. So when you never performed that task since starting your
kernel the modules will not have been loaded -- and thus look superfluous from
localmodconfig's point of view, which therefore disables them to reduce the
amount of code to be compiled.
You can try to avoid this by performing typical tasks that often will autoload
additional kernel modules: start a VM, establish VPN connections, loop-mount a
CD/DVD ISO, mount network shares (CIFS, NFS, ...), and connect all external
devices (2FA keys, headsets, webcams, ...) as well as storage devices with file
systems you otherwise do not utilize (btrfs, ext4, FAT, NTFS, XFS, ...). But it
is hard to think of everything that might be needed -- even kernel developers
often forget one thing or another at this point.
Do not let that risk bother you, especially when compiling a kernel only for
testing purposes: everything typically crucial will be there. And if you forget
something important you can turn on a missing feature manually later and quickly
run the commands again to compile and install a kernel that has everything you
need.
But if you plan to build and use self-built kernels regularly, you might want to
reduce the risk by recording which modules your system loads over the course of
a few weeks. You can automate this with `modprobed-db
<https://github.com/graysky2/modprobed-db>`_. Afterwards use ``LSMOD=<path>`` to
point localmodconfig to the list of modules modprobed-db noticed being used::
yes '' | make LSMOD='${HOME}'/.config/modprobed.db localmodconfig
That parameter also allows you to build trimmed kernels for another machine in
case you copied a suitable .config over to use as base (see previous step). Just
run ``lsmod > lsmod_foo-machine`` on that system and copy the generated file to
your build host's home directory. Then run this command instead of the one the
step-by-step guide mentions::
yes '' | make LSMOD=~/lsmod_foo-machine localmodconfig
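If you would rather not install an extra tool, several ``lsmod`` snapshots taken over time can be merged by hand. The following is only a sketch: the function name is made up, and it assumes localmodconfig merely evaluates the first column of each line in the file ``LSMOD=`` points to, which the modprobed-db example above suggests.

```shell
# Merge several 'lsmod' snapshots into one sorted list of module names.
# Usage: merge_lsmod snapshot1 [snapshot2 ...] > merged-list
merge_lsmod() {
    # Skip empty lines and each header line ("Module  Size  Used by"),
    # keep only the first column, then drop duplicates.
    awk 'NF > 0 && $1 != "Module" { print $1 }' "$@" | sort -u
}
```

You could for example run ``lsmod > ~/lsmod-$(date +%F)`` every now and then, later call ``merge_lsmod ~/lsmod-* > ~/lsmod_merged``, and finally pass ``LSMOD=~/lsmod_merged`` to localmodconfig.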
[:ref:`back to step-by-step guide <localmodconfig_bissbs>`]
.. _tagging_bisref:
Tag the kernels about to be built
---------------------------------
*Ensure all the kernels you will build are clearly identifiable using a
special tag and a unique version identifier.* [:ref:`... <tagging_bissbs>`]
This allows you to differentiate your distribution's kernels from those created
during this process, as the files or directories for the latter will contain
'-local' in the name; it also helps you pick the right entry in the boot menu
and not lose track of your kernels, as their version numbers will look slightly
confusing during the bisection.
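The tag makes it easy to tell your kernels apart later. As a rough sketch, a helper like the following (its name is hypothetical) lists all installed kernels carrying the '-local' tag by looking at the directory names in /lib/modules/:

```shell
# List installed kernels whose release name carries the '-local' tag.
# Usage: list_local_kernels [modules-directory]
list_local_kernels() {
    moddir="${1:-/lib/modules}"
    for dir in "${moddir}"/*-local*; do
        # Skip the unexpanded pattern in case nothing matched.
        [ -d "${dir}" ] && basename "${dir}"
    done
    return 0
}
```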
[:ref:`back to step-by-step guide <tagging_bissbs>`]
.. _debugsymbols_bisref:
Decide to enable or disable debug symbols
-----------------------------------------
*Decide how to handle debug symbols.* [:ref:`... <debugsymbols_bissbs>`]
Having debug symbols available can be important when your kernel throws a
'panic', 'Oops', 'warning', or 'BUG' later when running, as then you will be
able to find the exact place where the problem occurred in the code. But
collecting and embedding the needed debug information takes time and consumes
quite a bit of space: in late 2022 the build artifacts for a typical x86 kernel
trimmed with localmodconfig consumed around 5 Gigabyte of space with debug
symbols, but less than 1 when they were disabled. The resulting kernel image and
modules are bigger as well, which increases storage requirements for /boot/ and
load times.
In case you want a small kernel and are unlikely to decode a stack trace later,
you thus might want to disable debug symbols to avoid those downsides. If it
later turns out that you need them, just enable them as shown and rebuild the
kernel.
On the other hand, you definitely want to enable them for this process if there
is a decent chance that you need to decode a stack trace later. The section
'Decode failure messages' in Documentation/admin-guide/reporting-issues.rst
explains this process in more detail.
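If you are unsure which way your current configuration is set up, a quick look into the .config can tell. This sketch assumes that a line reading ``CONFIG_DEBUG_INFO=y`` is a reliable indicator for enabled debug symbols; the function name is made up.

```shell
# Report whether a kernel build configuration enables debug symbols.
# Usage: debug_symbols_state [config-file]
debug_symbols_state() {
    cfg="${1:-.config}"
    # The '=' right after the symbol name ensures that related options
    # like CONFIG_DEBUG_INFO_NONE do not match.
    if grep -q '^CONFIG_DEBUG_INFO=y' "${cfg}"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```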
[:ref:`back to step-by-step guide <debugsymbols_bissbs>`]
.. _configmods_bisref:
Adjust build configuration
--------------------------
*Check if you may want or need to adjust some other kernel configuration
options:*
Depending on your needs you at this point might want or have to adjust some
kernel configuration options.
.. _configmods_distros_bisref:
Distro specific adjustments
~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Are you running* [:ref:`... <configmods_bissbs>`]
The following sections help you to avoid build problems that are known to occur
when following this guide on a few commodity distributions.
**Debian:**
* Remove a stale reference to a certificate file that would cause your build to
fail::
./scripts/config --set-str SYSTEM_TRUSTED_KEYS ''
Alternatively, download the needed certificate and make that configuration
option point to it, as `the Debian handbook explains in more detail
<https://debian-handbook.info/browse/stable/sect.kernel-compilation.html>`_
-- or generate your own, as explained in
Documentation/admin-guide/module-signing.rst.
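To find out whether your configuration is affected at all, you can check if the referenced certificate file exists. A minimal sketch with a made-up function name; relative paths in the option are looked up from the current directory here, so call it from the top of the source tree.

```shell
# Print the file CONFIG_SYSTEM_TRUSTED_KEYS refers to in case that file
# does not exist; print nothing if the option is empty or the file is there.
# Usage: stale_trusted_keys [config-file]
stale_trusted_keys() {
    cfg="${1:-.config}"
    keyfile=$(sed -n 's/^CONFIG_SYSTEM_TRUSTED_KEYS="\(.*\)"$/\1/p' "${cfg}")
    if [ -n "${keyfile}" ] && [ ! -e "${keyfile}" ]; then
        echo "${keyfile}"
    fi
}
```

Any output indicates a stale reference, which the ``./scripts/config`` call shown above removes.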
[:ref:`back to step-by-step guide <configmods_bissbs>`]
.. _configmods_individual_bisref:
Individual adjustments
~~~~~~~~~~~~~~~~~~~~~~
*If you want to influence the other aspects of the configuration, do so
now.* [:ref:`... <configmods_bissbs>`]
You at this point can use a command like ``make menuconfig`` to enable or
disable certain features using a text-based user interface; to use a graphical
configuration utility, call the make target ``xconfig`` or ``gconfig`` instead.
All of them require development libraries from toolkits they are based on
(ncurses, Qt5, Gtk2); an error message will tell you if something required is
missing.
[:ref:`back to step-by-step guide <configmods_bissbs>`]
.. _saveconfig_bisref:
Put the .config file aside
--------------------------
*Reprocess the .config after the latest changes and store it in a safe place.*
[:ref:`... <saveconfig_bissbs>`]
Put the .config you prepared aside, as you want to copy it back to the build
directory every time during this guide before you start building another
kernel. That's because going back and forth between different versions can alter
.config files in odd ways; those occasionally cause side effects that could
confuse testing or in some cases render the result of your bisection
meaningless.
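The back and forth can be wrapped into two tiny helpers. This is merely a sketch: the function names are invented, while '~/kernel-config-working' is the file name this guide uses for the saved configuration.

```shell
# Hypothetical helpers around the file name used in this guide.
saved_config="${HOME}/kernel-config-working"

# Store the prepared .config in a safe place.
save_config() {
    cp "${1:-.config}" "${saved_config}"
}

# Copy the saved file back before building another kernel.
restore_config() {
    cp "${saved_config}" "${1:-.config}"
}
```

Call ``save_config`` once after preparing the configuration and ``restore_config`` from the top of the source tree every time before you build another kernel.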
[:ref:`back to step-by-step guide <saveconfig_bissbs>`]
.. _introlatestcheck_bisref:
Try to reproduce the regression
-----------------------------------------
*Verify the regression is not caused by some .config change and check if it
still occurs with the latest codebase.* [:ref:`... <introlatestcheck_bissbs>`]
For some readers it might seem unnecessary to check the latest codebase at this
point, especially if you did that already with a kernel prepared by your
distributor or face a regression within a stable/longterm series. But it's
highly recommended for these reasons:
* You will run into any problems caused by your setup before you actually begin
a bisection. That will make it a lot easier to differentiate between 'this
most likely is some problem in my setup' and 'this change needs to be skipped
during the bisection, as the kernel sources at that stage contain an unrelated
problem that causes building or booting to fail'.
* These steps will show whether your problem is caused by some change in the
build configuration between the 'working' and the 'broken' kernel. This for
example can happen when your distributor enabled an additional security
feature in the newer kernel which was disabled or not yet supported by the
older kernel. That security feature might get in the way of something you
do -- in which case your problem from the perspective of the Linux kernel
upstream developers is not a regression, as
Documentation/admin-guide/reporting-regressions.rst explains in more detail.
You thus would waste your time if you'd try to bisect this.
* If the cause for your regression was already fixed in the latest mainline
codebase, you'd perform the bisection for nothing. This holds true for a
regression you encountered with a stable/longterm release as well, as they are
often caused by problems in mainline changes that were backported -- in which
case the problem will have to be fixed in mainline first. Maybe it already was
fixed there and the fix is already in the process of being backported.
* For regressions within a stable/longterm series it's furthermore crucial to
know if the issue is specific to that series or also happens in the mainline
kernel, as the report needs to be sent to different people:
* Regressions specific to a stable/longterm series are the stable team's
responsibility; mainline Linux developers might or might not care.
* Regressions also happening in mainline are something the regular Linux
developers and maintainers have to handle; the stable team does not care
and does not need to be involved in the report, they just should be told
to backport the fix once it's ready.
Your report might be ignored if you send it to the wrong party -- and even
when you get a reply there is a decent chance that developers tell you to
evaluate which of the two cases it is before they take a closer look.
[:ref:`back to step-by-step guide <introlatestcheck_bissbs>`]
.. _checkoutmaster_bisref:
Check out the latest Linux codebase
-----------------------------------
*Check out the latest Linux codebase.*
[:ref:`... <introlatestcheck_bissbs>`]
In case you later want to recheck if an even newer codebase might fix the
problem, remember to rerun the ``git fetch --shallow-exclude [...]`` command
mentioned earlier to update your local Git repository.
[:ref:`back to step-by-step guide <introlatestcheck_bissbs>`]
.. _build_bisref:
Build your kernel
-----------------
*Build the image and the modules of your first kernel using the config file
you prepared.* [:ref:`... <build_bissbs>`]
A lot can go wrong at this stage, but the instructions below will help you help
yourself. Another subsection explains how to directly package your kernel up as
deb, rpm or tar file.
Dealing with build errors
~~~~~~~~~~~~~~~~~~~~~~~~~
When a build error occurs, it might be caused by some aspect of your machine's
setup that often can be fixed quickly; other times though the problem lies in
the code and can only be fixed by a developer. A close examination of the
failure messages coupled with some research on the internet will often tell you
which of the two it is. To perform such an investigation, restart the build
process like this::
make V=1
The ``V=1`` activates verbose output, which might be needed to see the actual
error. To make it easier to spot, this command also omits the ``-j $(nproc
--all)`` used earlier: utilizing every CPU core in the system speeds up the job,
but the parallelism also results in some clutter when failures occur.
After a few seconds the build process should run into the error again. Now try
to find the most crucial line describing the problem. Then search the internet
for the most important and non-generic section of that line (say 4 to 8 words);
avoid or remove anything that looks remotely system-specific, like your username
or local path names like ``/home/username/linux/``. First try your regular
internet search engine with that string, afterwards search Linux kernel mailing
lists via `lore.kernel.org/all/ <https://lore.kernel.org/all/>`_.
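Picking the line to search for can be partly automated. The following sketch assumes you stored the build output in a file, for example using ``make V=1 2>&1 | tee build.log``, and that the compiler marks errors with the string 'error:'; the function name is made up.

```shell
# Print the first compiler error from a build log with the local source
# directory prefix removed, as a starting point for an internet search.
# Usage: first_build_error <logfile> [source-directory]
first_build_error() {
    log="$1"
    srcdir="${2:-${HOME}/linux}"
    grep 'error:' "${log}" | head -n 1 | sed "s|${srcdir}/||g"
}
```

The output still might contain system-specific details, so take a close look before feeding it to a search engine.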
This most of the time will find something that will explain what is wrong; quite
often one of the hits will provide a solution for your problem, too. If you
do not find anything that matches your problem, try again from a different angle
by modifying your search terms or using another line from the error messages.
In the end, most trouble you run into has likely been encountered and
reported by others already. That includes issues where the cause is not your
system, but lies in the code. If you run into one of those, you might thus find
a solution (e.g. a patch) or workaround for your problem, too.
Package your kernel up
~~~~~~~~~~~~~~~~~~~~~~
The step-by-step guide uses the default make targets (e.g. 'bzImage' and
'modules' on x86) to build the image and the modules of your kernel, which later
steps of the guide then install. You can instead build everything and directly
package it up by using one of the following targets:
* ``make -j $(nproc --all) bindeb-pkg`` to generate a deb package
* ``make -j $(nproc --all) binrpm-pkg`` to generate an rpm package
* ``make -j $(nproc --all) tarbz2-pkg`` to generate a bz2 compressed tarball
This is just a selection of available make targets for this purpose, see
``make help`` for others. You can also use these targets after running
``make -j $(nproc --all)``, as they will pick up everything already built.
If you employ the targets to generate deb or rpm packages, ignore the
step-by-step guide's instructions on installing and removing your kernel;
instead install and remove the packages using the package utility for the format
(e.g. dpkg and rpm) or a package management utility built on top of them (apt,
aptitude, dnf/yum, zypper, ...). Be aware that the packages generated using
these two make targets are designed to work on various distributions utilizing
those formats; they thus will sometimes behave differently than your
distribution's kernel packages.
[:ref:`back to step-by-step guide <build_bissbs>`]
.. _install_bisref:
Put the kernel in place
-----------------------
*Install the kernel you just built.* [:ref:`... <install_bissbs>`]
What you need to do after executing the command in the step-by-step guide
depends on the existence and the implementation of an ``installkernel``
executable. Many commodity Linux distributions ship such a kernel installer in
'/sbin/' that does everything needed, hence there is nothing left for you
except rebooting. But some distributions contain an installkernel that does
only part of the job -- and a few lack it completely and leave all the work to
you.
If ``installkernel`` is found, the kernel's build system will delegate the
actual installation of your kernel's image and related files to this executable.
On almost all Linux distributions it will store the image as '/boot/vmlinuz-
<kernelrelease identifier>' and put a 'System.map-<kernelrelease
identifier>' alongside it. Your kernel will thus be installed in parallel to any
existing ones, unless you already have one with exactly the same release name.
Installkernel on many distributions will afterwards generate an 'initramfs'
(often also called 'initrd'), which commodity distributions rely on for booting;
hence be sure to keep the order of the two make targets used in the step-by-step
guide, as things will go sideways if you install your kernel's image before its
modules. Often installkernel will then add your kernel to the bootloader
configuration, too. You have to take care of one or both of these tasks
yourself, if your distribution's installkernel doesn't handle them.
A few distributions like Arch Linux and its derivatives totally lack an
installkernel executable. On those just install the modules using the kernel's
build system and then install the image and the System.map file manually::
sudo make modules_install
sudo install -m 0600 $(make -s image_name) /boot/vmlinuz-$(make -s kernelrelease)
sudo install -m 0600 System.map /boot/System.map-$(make -s kernelrelease)
If your distribution boots with the help of an initramfs, now generate one for
your kernel using the tools your distribution provides for this process.
Afterwards add your kernel to your bootloader configuration and reboot.
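To find out beforehand which of the cases applies to your system, check if an installkernel executable is present. A rough sketch with a hypothetical function name; the default directories searched here ('~/bin/' and '/sbin/') are an assumption and might not match what the kernel's build system looks at on your release.

```shell
# Print the first executable 'installkernel' found in the given
# directories; return 1 if there is none.
# Usage: find_installkernel [directory ...]
find_installkernel() {
    # Assumed default locations in case no directories were specified.
    [ "$#" -gt 0 ] || set -- "${HOME}/bin" /sbin
    for dir in "$@"; do
        if [ -x "${dir}/installkernel" ]; then
            echo "${dir}/installkernel"
            return 0
        fi
    done
    return 1
}
```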
[:ref:`back to step-by-step guide <install_bissbs>`]
.. _storagespace_bisref:
Storage requirements per kernel
-------------------------------
*Check how much storage space the kernel, its modules, and other related files
like the initramfs consume.* [:ref:`... <storagespace_bissbs>`]
The kernels built during a bisection consume quite a bit of space in /boot/ and
/lib/modules/, especially if you enabled debug symbols. That makes it easy to
fill up volumes during a bisection -- and due to that even kernels which used to
work earlier might fail to boot. To prevent that you will need to know how much
space each installed kernel typically requires.
Note, most of the time the pattern '/boot/*$(make -s kernelrelease)*' used in
the guide will match all files needed to boot your kernel -- but neither the
path nor the naming scheme are mandatory. On some distributions you thus will
need to look in different places.
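The two places can be summed up in one go. This sketch uses the same naming pattern as the guide and thus shares its limitation; the function name and its optional directory arguments are made up for this example.

```shell
# Print the space in KiB consumed by one kernel's files in /boot/ and by
# its modules directory.
# Usage: kernel_space <kernelrelease> [boot-directory] [modules-directory]
kernel_space() {
    release="$1"
    bootdir="${2:-/boot}"
    moddir="${3:-/lib/modules}"
    # 'du -c' appends a grand total; the last line of the output holds it.
    du -ck "${bootdir}"/*"${release}"* "${moddir}/${release}" 2>/dev/null |
        tail -n 1 | cut -f 1
}
```

Inside the source tree you could call it as ``kernel_space "$(make -s kernelrelease)"`` and compare the result with the free space ``df -k /boot/ /lib/modules/`` reports.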
[:ref:`back to step-by-step guide <storagespace_bissbs>`]
.. _recheckbroken_bisref:
Check the kernel built from the latest codebase
-----------------------------------------------
*Reboot into the kernel you just built and check if the feature that regressed
is really broken there.* [:ref:`... <recheckbroken_bissbs>`]
There are a couple of reasons why the regression you face might not show up with
your own kernel built from the latest codebase. These are the most frequent:
* The cause for the regression was fixed meanwhile.
* The regression with the broken kernel was caused by a change in the build
configuration the provider of your kernel carried out.
* Your problem might be a race condition that does not show up with your kernel;
the trimmed build configuration, a different setting for debug symbols, the
compiler used, and various other things can cause this.
* In case you encountered the regression with a stable/longterm kernel it might
be a problem that is specific to that series; the next step in this guide will
check this.
[:ref:`back to step-by-step guide <recheckbroken_bissbs>`]
.. _recheckstablebroken_bisref:
Check the kernel built from the latest stable/longterm codebase
---------------------------------------------------------------
*Are you facing a regression within a stable/longterm release, but failed to
reproduce it with the kernel you just built using the latest mainline sources?
Then check if the latest codebase for the particular series might already fix
the problem.* [:ref:`... <recheckstablebroken_bissbs>`]
If this kernel does not show the regression either, there most likely is no need
for a bisection.
[:ref:`back to step-by-step guide <recheckstablebroken_bissbs>`]
.. _introworkingcheck_bisref:
Ensure the 'good' version is really working well
------------------------------------------------
*Check if the kernels you build work fine.*
[:ref:`... <introworkingcheck_bissbs>`]
This section will reestablish a known working base. Skipping it might be
appealing, but is usually a bad idea, as it does something important:
It will ensure the .config file you prepared earlier actually works as expected.
That is in your own interest, as trimming the configuration is not foolproof --
and you might be building and testing ten or more kernels for nothing before
starting to suspect something might be wrong with the build configuration.
That alone is reason enough to spend the time on this, but not the only reason.
Many readers of this guide normally run kernels that are patched, use add-on
modules, or both. Those kernels thus are not considered 'vanilla' -- therefore
it's possible that the thing that regressed might never have worked in vanilla
builds of the 'good' version in the first place.
There is a third reason for those that noticed a regression between
stable/longterm kernels of different series (e.g. v6.0.13..v6.1.5): it will
ensure the kernel version you assumed to be 'good' earlier in the process (e.g.
v6.0) actually is working.
[:ref:`back to step-by-step guide <introworkingcheck_bissbs>`]
.. _recheckworking_bisref:
Build your own version of the 'good' kernel
-------------------------------------------
*Build your own variant of the working kernel and check if the feature that
regressed works as expected with it.* [:ref:`... <recheckworking_bissbs>`]
In case the feature that broke with newer kernels does not work with your first
self-built kernel, find and resolve the cause before moving on. There are a
multitude of reasons why this might happen. Some ideas where to look:
* Maybe localmodconfig did something odd and disabled the module required to
test the feature? Then you might want to recreate a .config file based on the
one from the last working kernel and skip trimming it down; manually disabling
some features in the .config might work as well to reduce the build time.
* Maybe it's not a kernel regression, but something caused by some fluke, a
broken initramfs (also known as initrd), new firmware files, or updated
userland software?
* Maybe it was a feature added to your distributor's kernel which vanilla Linux
at that point never supported?
Note, if you found and fixed problems with the .config file, you want to use it
to build another kernel from the latest codebase, as your earlier tests with
mainline and the latest version from an affected stable/longterm series most
likely have been flawed.
[:ref:`back to step-by-step guide <recheckworking_bissbs>`]
.. _bisectstart_bisref:
Start the bisection
-------------------
*Start the bisection and tell Git about the versions earlier established as
'good' and 'bad'.* [:ref:`... <bisectstart_bissbs>`]
This will start the bisection process; the last of the commands will make Git
check out a commit roughly half-way between the 'good' and the 'bad' changes
for you to test.
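If you have never bisected before, a dry run in a throwaway repository shows how the process works without touching your kernel sources. Everything in this sketch is made up for illustration: the repository, its eight commits, and the 'regression' (a file named 'broken' that appears with the fifth commit). To keep the demo self-contained it lets ``git bisect run`` do the testing, whereas with a kernel you build and test each checkpoint yourself.

```shell
# Bisect a throwaway repository in which commit 5 introduces a 'bug'.
bisect_demo() (
    repo=$(mktemp -d) || exit 1
    cd "${repo}" || exit 1
    git init -q .
    for i in 1 2 3 4 5 6 7 8; do
        echo "${i}" > version
        [ "${i}" -eq 5 ] && touch broken
        git add .
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q -m "commit ${i}"
        git tag "v${i}"
    done

    # Start the bisection and tell Git what is known to be good and bad.
    git bisect start
    git bisect good v1
    git bisect bad v8
    # Git now has checked out a commit roughly half-way between v1 and v8.

    # Let Git test each checkpoint itself: exit code 0 means 'good'.
    git bisect run sh -c '! test -e broken'
    git log -1 --format='first bad: %s' refs/bisect/bad
)
```

Calling ``bisect_demo`` prints Git's progress messages followed by a line naming the commit that introduced the file.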
[:ref:`back to step-by-step guide <bisectstart_bissbs>`]
.. _bisectbuild_bisref:
Build a kernel from the bisection point
---------------------------------------
*Build, install, and boot a kernel from the code Git checked out using the
same commands you used earlier.* [:ref:`... <bisectbuild_bissbs>`]
There are two things worth noting here:
* Occasionally building the kernel will fail or it might not boot due to some
problem in the code at the bisection point. In that case run this command::
git bisect skip
Git will then check out another commit nearby which with a bit of luck should
work better. Afterwards restart executing this step.
* Those slightly odd looking version identifiers can happen during bisections,
because the Linux kernel subsystems prepare their changes for a new mainline
release (say 6.2) before its predecessor (e.g. 6.1) is finished. They thus
base them on a somewhat earlier point like v6.1-rc1 or even v6.0 -- and the
changes then get merged for 6.2 without being rebased or squashed once 6.1 is
out. A kernel built from such a commit therefore identifies itself with the
version of that older base.
[:ref:`back to step-by-step guide <bisectbuild_bissbs>`]
.. _bisecttest_bisref:
Bisection checkpoint
--------------------
*Check if the feature that regressed works in the kernel you just built.*
[:ref:`... <bisecttest_bissbs>`]
Ensure what you tell Git is accurate: getting it wrong just one time will bring
the rest of the bisection totally off course, hence all testing after that point
will be for nothing.
[:ref:`back to step-by-step guide <bisecttest_bissbs>`]
.. _bisectlog_bisref:
Put the bisection log away
--------------------------
*Store Git's bisection log and the current .config file in a safe place.*
[:ref:`... <bisectlog_bissbs>`]
As indicated above: declaring just one kernel wrongly as 'good' or 'bad' will
render the end result of a bisection useless. In that case you'd normally have
to restart the bisection from scratch. The log can prevent that, as it might
allow someone to point out where a bisection likely went sideways -- and then
instead of testing ten or more kernels you might only have to build a few to
resolve things.
The .config file is put aside, as there is a decent chance that developers might
ask for it after you reported the regression.
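How a stored log can save work is easiest to see in a throwaway repository. In the following sketch the repository, the function name, and the 'mistake' are all made up; it removes a wrong verdict from the log and uses ``git bisect replay`` to return to the point where things went sideways.

```shell
# Record a bisection log, drop a wrong verdict from it, and replay it.
replay_demo() (
    repo=$(mktemp -d) || exit 1
    log=$(mktemp) || exit 1
    cd "${repo}" || exit 1
    git init -q .
    for i in 1 2 3 4 5 6 7 8; do
        echo "${i}" > version
        git add version
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q -m "commit ${i}"
        git tag "v${i}"
    done

    git bisect start
    git bisect good v1
    git bisect bad v8
    checkpoint=$(git rev-parse HEAD)    # the commit Git wants tested first
    git bisect good                     # pretend this verdict was a mistake
    git bisect log > "${log}"

    # Remove the last line, which holds the mistaken 'git bisect good'.
    sed '$d' "${log}" > "${log}.fixed"

    # Replay the corrected log: Git returns to the first checkpoint.
    git bisect replay "${log}.fixed"
    if [ "$(git rev-parse HEAD)" = "${checkpoint}" ]; then
        echo "back at the commit with the wrong verdict"
    fi
)
```

In a real bisection you would only have to rebuild and retest from that checkpoint onwards instead of starting from scratch.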
[:ref:`back to step-by-step guide <bisectlog_bissbs>`]
.. _revert_bisref:
Try reverting the culprit
-------------------------
*Try reverting the culprit on top of the latest codebase to see if this fixes
your regression.* [:ref:`... <revert_bissbs>`]
This is an optional step, but whenever possible one you should try: there is a
decent chance that developers will ask you to perform this step when you bring
the bisection result up. So give it a try: you are in the flow already, so
building one more kernel shouldn't be a big deal at this point.
The step-by-step guide covers everything relevant already except one slightly
rare thing: did you bisect a regression that also happened with mainline using
a stable/longterm series, but Git failed to revert the commit in mainline? Then
try to revert the culprit in the affected stable/longterm series -- and if that
succeeds, test that kernel version instead.
[:ref:`back to step-by-step guide <revert_bissbs>`]
Supplementary tasks: cleanup during and after the bisection
-----------------------------------------------------------
.. _makeroom_bisref:
Cleaning up during the bisection
--------------------------------
*To remove one of the kernels you installed, look up its 'kernelrelease'
identifier.* [:ref:`... <makeroom_bissbs>`]
The kernels you install during this process are easy to remove later, as their
parts are only stored in two places and clearly identifiable. You thus do not
need to worry about messing up your machine when you install a kernel manually
(and thus bypass your distribution's packaging system): all parts of your
kernels are relatively easy to remove later.
One of the two places is a directory in /lib/modules/, which holds the modules
for each installed kernel. This directory is named after the kernel's release
identifier; hence, to remove all modules for one of the kernels you built,
simply remove its modules directory in /lib/modules/.
The other place is /boot/, where typically two to five files will be placed
during installation of a kernel. All of them usually contain the release name in
their file name, but how many files there are and their exact names depend
somewhat on your distribution's installkernel executable and its initramfs
generator. On
some distributions the ``kernel-install remove...`` command mentioned in the
step-by-step guide will delete all of these files for you while also removing
the menu entry for the kernel from your bootloader configuration. On others you
have to take care of these two tasks yourself. The following command should
interactively remove the three main files of a kernel with the release name
'6.0-rc1-local-gcafec0cacaca0'::
rm -i /boot/{System.map,vmlinuz,initr}-6.0-rc1-local-gcafec0cacaca0
Afterwards check for other files in /boot/ that have
'6.0-rc1-local-gcafec0cacaca0' in their name and consider deleting them as well.
Now remove the boot entry for the kernel from your bootloader's configuration;
the steps to do that vary quite a bit between Linux distributions.
Note, be careful with wildcards like '*' when deleting files or directories
for kernels manually: you might accidentally remove files of a 6.0.11 kernel
when all you want is to remove 6.0 or 6.0.1.
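The wildcard pitfall can be avoided by checking what follows the release name in each file name. A minimal sketch with a made-up function name; it assumes that files belonging to a kernel either end with the release name or continue with a suffix like '.img' that does not start with a digit.

```shell
# List the files in /boot/ that belong to exactly one kernel release.
# Usage: list_kernel_files <kernelrelease> [boot-directory]
list_kernel_files() {
    release="$1"
    bootdir="${2:-/boot}"
    for file in "${bootdir}"/*-"${release}"*; do
        [ -e "${file}" ] || continue
        # Text following the release name, for example '' or '.img'.
        rest=${file##*-"${release}"}
        case "${rest}" in
            "" | .[!0-9]*) echo "${file}" ;;    # belongs to this release
            *) ;;                               # e.g. 6.0.11 when asked for 6.0.1
        esac
    done
}
```

Review the list it prints before deleting anything; the function itself removes nothing.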
[:ref:`back to step-by-step guide <makeroom_bissbs>`]
Cleaning up after the bisection
-------------------------------
.. _finishingtouch_bisref:
*Once you have finished the bisection, do not immediately remove anything
you set up, as you might need a few things again.*
[:ref:`... <finishingtouch_bissbs>`]
When you are really short of storage space removing the kernels as described in
the step-by-step guide might not free as much space as you would like. In that
case consider running ``rm -rf ~/linux/*`` as well now. This will remove the
build artifacts and the Linux sources, but will leave the Git repository
(~/linux/.git/) behind -- a simple ``git reset --hard`` thus will bring the
sources back.
Removing the repository as well would likely be unwise at this point: there is a
decent chance developers will ask you to build another kernel to perform
additional tests. This is often required to debug an issue or check proposed
fixes. Before doing so you want to run the ``git fetch mainline`` command again
followed by ``git checkout mainline/master`` to bring your clone up to date and
check out the latest codebase. Then apply the patch using ``git apply
<filename>`` or ``git am <filename>`` and build yet another kernel using the
familiar commands.
Additional tests are also the reason why you want to keep the
~/kernel-config-working file around for a few weeks.
[:ref:`back to step-by-step guide <finishingtouch_bissbs>`]
Additional reading material
===========================
Further sources
---------------
* The `man page for 'git bisect' <https://git-scm.com/docs/git-bisect>`_ and
`fighting regressions with 'git bisect' <https://git-scm.com/docs/git-bisect-lk2009.html>`_
in the Git documentation.
* `Working with git bisect <https://nathanchance.dev/posts/working-with-git-bisect/>`_
from kernel developer Nathan Chancellor.
* `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_.
* `Fully automated bisecting with 'git bisect run' <https://lwn.net/Articles/317154>`_.
..
end-of-content
..
This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If
you spot a typo or small mistake, feel free to let him know directly and
he'll fix it. You are free to do the same in a mostly informal way if you
want to contribute changes to the text -- but for copyright reasons please CC
linux-doc@vger.kernel.org and 'sign-off' your contribution as
Documentation/process/submitting-patches.rst explains in the section 'Sign
your work - the Developer's Certificate of Origin'.
..
This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
of the file. If you want to distribute this text under CC-BY-4.0 only,
please use 'The Linux kernel development community' for author attribution
and link this as source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
..
Note: Only the content of this RST file as found in the Linux kernel sources
is available under CC-BY-4.0, as versions of this text that were processed
(for example by the kernel's build system) might contain content taken from
files which use a more restrictive license.
@@ -346,9 +346,9 @@ sys.stderr.write("Using %s theme\n" % html_theme)
 html_static_path = ['sphinx-static']
 # If true, Docutils "smart quotes" will be used to convert quotes and dashes
-# to typographically correct entities. This will convert "--" to "—",
-# which is not always what we want, so disable it.
-smartquotes = False
+# to typographically correct entities. However, conversion of "--" to "—"
+# is not always what we want, so enable only quotes.
+smartquotes_action = 'q'
 # Custom sidebar templates, maps document names to template names.
 # Note that the RTD theme ignores this
......
...@@ -168,7 +168,7 @@ Available options: ...@@ -168,7 +168,7 @@ Available options:
- --fix - --fix
This is an EXPERIMENTAL feature. If correctable errors exists, a file This is an EXPERIMENTAL feature. If correctable errors exist, a file
<inputfile>.EXPERIMENTAL-checkpatch-fixes is created which has the <inputfile>.EXPERIMENTAL-checkpatch-fixes is created which has the
automatically fixable errors corrected. automatically fixable errors corrected.
...@@ -181,7 +181,7 @@ Available options: ...@@ -181,7 +181,7 @@ Available options:
- --ignore-perl-version - --ignore-perl-version
Override checking of perl version. Runtime errors maybe encountered after Override checking of perl version. Runtime errors may be encountered after
enabling this flag if the perl version does not meet the minimum specified. enabling this flag if the perl version does not meet the minimum specified.
- --codespell - --codespell
......
...@@ -277,6 +277,27 @@ traces point to places in code that interacted with the object but that are not ...@@ -277,6 +277,27 @@ traces point to places in code that interacted with the object but that are not
directly present in the bad access stack trace. Currently, this includes directly present in the bad access stack trace. Currently, this includes
call_rcu() and workqueue queuing. call_rcu() and workqueue queuing.
CONFIG_KASAN_EXTRA_INFO
~~~~~~~~~~~~~~~~~~~~~~~
Enabling CONFIG_KASAN_EXTRA_INFO allows KASAN to record and report more
information. The extra information currently supported is the CPU number and
timestamp at allocation and free. More information can help find the cause of
the bug and correlate the error with other system events, at the cost of using
extra memory to record more information (more cost details in the help text of
CONFIG_KASAN_EXTRA_INFO).
Here is the report with CONFIG_KASAN_EXTRA_INFO enabled (only the
different parts are shown)::
==================================================================
...
Allocated by task 134 on cpu 5 at 229.133855s:
...
Freed by task 136 on cpu 3 at 230.199335s:
...
==================================================================
Implementation details Implementation details
---------------------- ----------------------
......
...@@ -341,6 +341,51 @@ Typedefs with function prototypes can also be documented:: ...@@ -341,6 +341,51 @@ Typedefs with function prototypes can also be documented::
*/ */
typedef void (*type_name)(struct v4l2_ctrl *arg1, void *arg2); typedef void (*type_name)(struct v4l2_ctrl *arg1, void *arg2);
Object-like macro documentation
-------------------------------
Object-like macros are distinct from function-like macros. They are
differentiated by whether the macro name is immediately followed by a
left parenthesis ('(') for function-like macros or not followed by one
for object-like macros.
Function-like macros are handled like functions by ``scripts/kernel-doc``.
They may have a parameter list. Object-like macros do not have a
parameter list.
The general format of an object-like macro kernel-doc comment is::
/**
* define object_name - Brief description.
*
* Description of the object.
*/
Example::
/**
* define MAX_ERRNO - maximum errno value that is supported
*
* Kernel pointers have redundant information, so we can use a
* scheme where we can return either an error code or a normal
* pointer with the same return value.
*/
#define MAX_ERRNO 4095
Example::
/**
* define DRM_GEM_VRAM_PLANE_HELPER_FUNCS - \
* Initializes struct drm_plane_helper_funcs for VRAM handling
*
* This macro initializes struct drm_plane_helper_funcs to use the
* respective helper functions.
*/
#define DRM_GEM_VRAM_PLANE_HELPER_FUNCS \
.prepare_fb = drm_gem_vram_plane_helper_prepare_fb, \
.cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb
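As a further illustration of the object-like macro format, here is a minimal sketch that documents a hypothetical macro next to a small function-like helper that uses it. ``EXAMPLE_RETRY_LIMIT`` and ``example_may_retry()`` are assumptions made up for this example; they do not exist in the kernel.

```c
#include <stdbool.h>

/**
 * define EXAMPLE_RETRY_LIMIT - maximum number of retries (hypothetical)
 *
 * Not a real kernel macro. The name is followed by whitespace rather
 * than '(', so this is an object-like macro without a parameter list.
 */
#define EXAMPLE_RETRY_LIMIT 3

/**
 * example_may_retry() - check whether another attempt is allowed
 * @attempts: number of attempts made so far
 *
 * Return: true if @attempts is below EXAMPLE_RETRY_LIMIT.
 */
static inline bool example_may_retry(unsigned int attempts)
{
	return attempts < EXAMPLE_RETRY_LIMIT;
}
```

The comment for the macro starts with ``define`` followed by the macro name, while the helper is documented like any other function, with a parameter description and a ``Return:`` section.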
Highlights and cross-references Highlights and cross-references
------------------------------- -------------------------------
......
...@@ -27,6 +27,13 @@ documentation and ensure that no new errors or warnings have been ...@@ -27,6 +27,13 @@ documentation and ensure that no new errors or warnings have been
introduced. Generating HTML documents and looking at the result will help introduced. Generating HTML documents and looking at the result will help
to avoid unsightly misunderstandings about how things will be rendered. to avoid unsightly misunderstandings about how things will be rendered.
All new documentation (including additions to existing documents) should
ideally justify who the intended target audience is somewhere in the
changelog; this way, we ensure that the documentation ends up in the correct
place. Some possible categories are: kernel developers (experts or
beginners), userspace programmers, end users and/or system administrators,
and distributors.
Key cycle dates Key cycle dates
--------------- ---------------
......
...@@ -48,13 +48,14 @@ or ``virtualenv``, depending on how your distribution packaged Python 3. ...@@ -48,13 +48,14 @@ or ``virtualenv``, depending on how your distribution packaged Python 3.
on the Sphinx version, it should be installed separately, on the Sphinx version, it should be installed separately,
with ``pip install sphinx_rtd_theme``. with ``pip install sphinx_rtd_theme``.
In summary, if you want to install Sphinx version 2.4.4, you should do:: In summary, if you want to install the latest version of Sphinx, you
should do::
$ virtualenv sphinx_2.4.4 $ virtualenv sphinx_latest
$ . sphinx_2.4.4/bin/activate $ . sphinx_latest/bin/activate
(sphinx_2.4.4) $ pip install -r Documentation/sphinx/requirements.txt (sphinx_latest) $ pip install -r Documentation/sphinx/requirements.txt
After running ``. sphinx_2.4.4/bin/activate``, the prompt will change, After running ``. sphinx_latest/bin/activate``, the prompt will change,
in order to indicate that you're using the new environment. If you in order to indicate that you're using the new environment. If you
open a new shell, you need to rerun this command to enter again at open a new shell, you need to rerun this command to enter again at
the virtual environment before building the documentation. the virtual environment before building the documentation.
...@@ -63,8 +64,7 @@ Image output ...@@ -63,8 +64,7 @@ Image output
------------ ------------
The kernel documentation build system contains an extension that The kernel documentation build system contains an extension that
handles images on both GraphViz and SVG formats (see handles images in both GraphViz and SVG formats (see :ref:`sphinx_kfigure`).
:ref:`sphinx_kfigure`).
For it to work, you need to install both GraphViz and ImageMagick For it to work, you need to install both GraphViz and ImageMagick
packages. If those packages are not installed, the build system will packages. If those packages are not installed, the build system will
...@@ -108,7 +108,7 @@ further info. ...@@ -108,7 +108,7 @@ further info.
Checking for Sphinx dependencies Checking for Sphinx dependencies
-------------------------------- --------------------------------
There's a script that automatically check for Sphinx dependencies. If it can There's a script that automatically checks for Sphinx dependencies. If it can
recognize your distribution, it will also give a hint about the install recognize your distribution, it will also give a hint about the install
command line options for your distro:: command line options for your distro::
...@@ -283,7 +283,7 @@ Here are some specific guidelines for the kernel documentation: ...@@ -283,7 +283,7 @@ Here are some specific guidelines for the kernel documentation:
from highlighting. For a short snippet of code embedded in the text, use \`\`. from highlighting. For a short snippet of code embedded in the text, use \`\`.
the C domain The C domain
------------ ------------
The **Sphinx C Domain** (name c) is suited for documentation of C API. E.g. a The **Sphinx C Domain** (name c) is suited for documentation of C API. E.g. a
......
...@@ -9,110 +9,141 @@ of device drivers. This document is an only somewhat organized collection ...@@ -9,110 +9,141 @@ of device drivers. This document is an only somewhat organized collection
of some of those interfaces — it will hopefully get better over time! The of some of those interfaces — it will hopefully get better over time! The
available subsections can be seen below. available subsections can be seen below.
General information for driver authors
======================================
This section contains documentation that should, at some point or other, be
of interest to most developers working on device drivers.
.. toctree:: .. toctree::
:caption: Table of contents :maxdepth: 1
:maxdepth: 2
driver-model/index
basics basics
driver-model/index
device_link
infrastructure infrastructure
ioctl ioctl
early-userspace/index
pm/index pm/index
clk
Useful support libraries
========================
This section contains documentation that should, at some point or other, be
of interest to most developers working on device drivers.
.. toctree::
:maxdepth: 1
early-userspace/index
connector
device-io device-io
devfreq
dma-buf dma-buf
device_link
component component
message-based io-mapping
infiniband io_ordering
aperture uio-howto
frame-buffer vfio-mediated-device
regulator vfio
reset vfio-pci-device-specific-driver-acceptance
iio/index
input Bus-level documentation
usb/index =======================
firewire
pci/index .. toctree::
:maxdepth: 1
auxiliary_bus
cxl/index cxl/index
spi eisa
i2c firewire
ipmb
ipmi
i3c/index i3c/index
interconnect isa
devfreq men-chameleon-bus
hsi pci/index
edac
scsi
libata
target
mailbox
mtdnand
miscellaneous
mei/index
mtd/index
mmc/index
nvdimm/index
w1
rapidio/index rapidio/index
s390-drivers slimbus
usb/index
virtio/index
vme vme
w1
xillybus
Subsystem-specific APIs
=======================
.. toctree::
:maxdepth: 1
80211/index 80211/index
uio-howto acpi/index
backlight/lp855x-driver.rst
clk
console
crypto/index
dmaengine/index
dpll
edac
firmware/index firmware/index
pin-control fpga/index
frame-buffer
aperture
generic-counter
gpio/index gpio/index
hsi
hte/index
i2c
iio/index
infiniband
input
interconnect
ipmb
ipmi
libata
mailbox
md/index md/index
media/index media/index
mei/index
memory-devices/index
message-based
misc_devices misc_devices
miscellaneous
mmc/index
mtd/index
mtdnand
nfc/index nfc/index
dmaengine/index
slimbus
soundwire/index
thermal/index
fpga/index
acpi/index
auxiliary_bus
backlight/lp855x-driver.rst
connector
console
eisa
isa
io-mapping
io_ordering
generic-counter
memory-devices/index
men-chameleon-bus
ntb ntb
nvdimm/index
nvmem nvmem
parport-lowlevel parport-lowlevel
phy/index
pin-control
pldmfw/index
pps pps
ptp ptp
phy/index
pwm pwm
pldmfw/index regulator
reset
rfkill rfkill
s390-drivers
scsi
serial/index serial/index
sm501 sm501
soundwire/index
spi
surface_aggregator/index surface_aggregator/index
switchtec switchtec
sync_file sync_file
target
tee
thermal/index
tty/index tty/index
vfio-mediated-device wbrf
vfio wmi
vfio-pci-device-specific-driver-acceptance
virtio/index
xilinx/index xilinx/index
xillybus
zorro zorro
hte/index
wmi
dpll
wbrf
crypto/index
tee
.. only:: subproject and html .. only:: subproject and html
......
.. SPDX-License-Identifier: GPL-2.0 .. SPDX-License-Identifier: GPL-2.0
=============== ===============
fault-injection Fault-injection
=============== ===============
.. toctree:: .. toctree::
......
...@@ -1899,8 +1899,8 @@ For more information on mount propagation see: ...@@ -1899,8 +1899,8 @@ For more information on mount propagation see:
These files provide a method to access a task's comm value. It also allows for These files provide a method to access a task's comm value. It also allows for
a task to set its own or one of its thread siblings comm value. The comm value a task to set its own or one of its thread siblings comm value. The comm value
is limited in size compared to the cmdline value, so writing anything longer is limited in size compared to the cmdline value, so writing anything longer
then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated then the kernel's TASK_COMM_LEN (currently 16 chars, including the NUL
comm value. terminator) will result in a truncated comm value.
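The effect of the 16-byte limit can be sketched in plain userspace C. This is only an illustration of the visible result, not the kernel's implementation: ``EXAMPLE_COMM_LEN`` and ``example_set_comm()`` are hypothetical names, and the buffer size simply mirrors the TASK_COMM_LEN value quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the 16-byte TASK_COMM_LEN quoted above, NUL terminator included. */
#define EXAMPLE_COMM_LEN 16

/*
 * Hypothetical userspace helper, not a kernel function: copy @name into
 * @comm, keeping at most EXAMPLE_COMM_LEN - 1 characters plus the NUL.
 */
static void example_set_comm(char comm[EXAMPLE_COMM_LEN], const char *name)
{
	/* snprintf() never writes more than EXAMPLE_COMM_LEN bytes. */
	snprintf(comm, EXAMPLE_COMM_LEN, "%s", name);
}
```

Passing ``"a-very-long-thread-name"`` to this helper leaves ``"a-very-long-thr"`` in the buffer: 15 visible characters, with the 16th byte taken by the NUL terminator.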
3.7 /proc/<pid>/task/<tid>/children - Information about task children 3.7 /proc/<pid>/task/<tid>/children - Information about task children
......
...@@ -22,10 +22,10 @@ community and getting your work upstream. ...@@ -22,10 +22,10 @@ community and getting your work upstream.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
process/development-process Development process <process/development-process>
process/submitting-patches Submitting patches <process/submitting-patches>
Code of conduct <process/code-of-conduct> Code of conduct <process/code-of-conduct>
maintainer/index Maintainer handbook <maintainer/index>
All development-process docs <process/index> All development-process docs <process/index>
...@@ -38,10 +38,10 @@ kernel. ...@@ -38,10 +38,10 @@ kernel.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
core-api/index Core API <core-api/index>
driver-api/index Driver APIs <driver-api/index>
subsystem-apis Subsystems <subsystem-apis>
Locking in the kernel <locking/index> Locking <locking/index>
Development tools and processes Development tools and processes
=============================== ===============================
...@@ -51,15 +51,15 @@ Various other manuals with useful information for all kernel developers. ...@@ -51,15 +51,15 @@ Various other manuals with useful information for all kernel developers.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
process/license-rules Licensing rules <process/license-rules>
doc-guide/index Writing documentation <doc-guide/index>
dev-tools/index Development tools <dev-tools/index>
dev-tools/testing-overview Testing guide <dev-tools/testing-overview>
kernel-hacking/index Hacking guide <kernel-hacking/index>
trace/index Tracing <trace/index>
fault-injection/index Fault injection <fault-injection/index>
livepatch/index Livepatching <livepatch/index>
rust/index Rust <rust/index>
User-oriented documentation User-oriented documentation
...@@ -72,11 +72,11 @@ developers seeking information on the kernel's user-space APIs. ...@@ -72,11 +72,11 @@ developers seeking information on the kernel's user-space APIs.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
admin-guide/index Administration <admin-guide/index>
The kernel build system <kbuild/index> Build system <kbuild/index>
admin-guide/reporting-issues.rst Reporting issues <admin-guide/reporting-issues.rst>
User-space tools <tools/index> Userspace tools <tools/index>
userspace-api/index Userspace API <userspace-api/index>
See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_, See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
which are kept separately from the kernel's own documentation. which are kept separately from the kernel's own documentation.
...@@ -89,8 +89,8 @@ platform firmwares. ...@@ -89,8 +89,8 @@ platform firmwares.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
firmware-guide/index Firmware <firmware-guide/index>
devicetree/index Firmware and Devicetree <devicetree/index>
Architecture-specific documentation Architecture-specific documentation
...@@ -99,7 +99,7 @@ Architecture-specific documentation ...@@ -99,7 +99,7 @@ Architecture-specific documentation
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
arch/index CPU architectures <arch/index>
Other documentation Other documentation
...@@ -112,7 +112,7 @@ to ReStructured Text format, or are simply too old. ...@@ -112,7 +112,7 @@ to ReStructured Text format, or are simply too old.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
staging/index Unsorted documentation <staging/index>
Translations Translations
...@@ -121,7 +121,7 @@ Translations ...@@ -121,7 +121,7 @@ Translations
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
translations/index Translations <translations/index>
Indices and tables Indices and tables
================== ==================
......
...@@ -102,7 +102,10 @@ to do something different in the near future. ...@@ -102,7 +102,10 @@ to do something different in the near future.
../doc-guide/maintainer-profile ../doc-guide/maintainer-profile
../nvdimm/maintainer-entry-profile ../nvdimm/maintainer-entry-profile
../arch/riscv/patch-acceptance ../arch/riscv/patch-acceptance
../process/maintainer-soc
../process/maintainer-soc-clean-dts
../driver-api/media/maintainer-entry-profile ../driver-api/media/maintainer-entry-profile
../process/maintainer-netdev
../driver-api/vfio-pci-device-specific-driver-acceptance ../driver-api/vfio-pci-device-specific-driver-acceptance
../nvme/feature-and-quirk-policy ../nvme/feature-and-quirk-policy
../filesystems/xfs/xfs-maintainer-entry-profile ../filesystems/xfs/xfs-maintainer-entry-profile
...@@ -324,7 +324,7 @@ Contact Info ...@@ -324,7 +324,7 @@ Contact Info
The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and
Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements
are discussed on the linux-netdev mailing list netdev@vger.kernel.org and are discussed on the linux-netdev mailing list netdev@vger.kernel.org and
bridge@lists.linux-foundation.org. bridge@lists.linux.dev.
The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev
......
...@@ -144,8 +144,8 @@ Bison ...@@ -144,8 +144,8 @@ Bison
Since Linux 4.16, the build system generates parsers Since Linux 4.16, the build system generates parsers
during build. This requires bison 2.0 or later. during build. This requires bison 2.0 or later.
pahole: pahole
------- ------
Since Linux 5.2, if CONFIG_DEBUG_INFO_BTF is selected, the build system Since Linux 5.2, if CONFIG_DEBUG_INFO_BTF is selected, the build system
generates BTF (BPF Type Format) from DWARF in vmlinux, a bit later from kernel generates BTF (BPF Type Format) from DWARF in vmlinux, a bit later from kernel
......
...@@ -203,7 +203,7 @@ Do not unnecessarily use braces where a single statement will do. ...@@ -203,7 +203,7 @@ Do not unnecessarily use braces where a single statement will do.
and and
.. code-block:: none .. code-block:: c
if (condition) if (condition)
do_this(); do_this();
...@@ -660,7 +660,7 @@ make a good program). ...@@ -660,7 +660,7 @@ make a good program).
So, you can either get rid of GNU emacs, or change it to use saner So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file: values. To do the latter, you can stick the following in your .emacs file:
.. code-block:: none .. code-block:: elisp
(defun c-lineup-arglist-tabs-only (ignored) (defun c-lineup-arglist-tabs-only (ignored)
"Line up argument lists by tabs, not spaces" "Line up argument lists by tabs, not spaces"
...@@ -899,7 +899,8 @@ which you should use to make sure messages are matched to the right device ...@@ -899,7 +899,8 @@ which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level: dev_err(), dev_warn(), and driver, and are tagged with the right level: dev_err(), dev_warn(),
dev_info(), and so forth. For messages that aren't associated with a dev_info(), and so forth. For messages that aren't associated with a
particular device, <linux/printk.h> defines pr_notice(), pr_info(), particular device, <linux/printk.h> defines pr_notice(), pr_info(),
pr_warn(), pr_err(), etc. pr_warn(), pr_err(), etc. When drivers are working properly they are quiet,
so prefer to use dev_dbg/pr_debug unless something is wrong.
Coming up with good debugging messages can be quite a challenge; and once Coming up with good debugging messages can be quite a challenge; and once
you have them, they can be a huge help for remote troubleshooting. However you have them, they can be a huge help for remote troubleshooting. However
......
...@@ -255,7 +255,7 @@ an involved disclosed party. The current ambassadors list: ...@@ -255,7 +255,7 @@ an involved disclosed party. The current ambassadors list:
IBM Power Anton Blanchard <anton@linux.ibm.com> IBM Power Anton Blanchard <anton@linux.ibm.com>
IBM Z Christian Borntraeger <borntraeger@de.ibm.com> IBM Z Christian Borntraeger <borntraeger@de.ibm.com>
Intel Tony Luck <tony.luck@intel.com> Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org> Qualcomm Trilok Soni <quic_tsoni@quicinc.com>
RISC-V Palmer Dabbelt <palmer@dabbelt.com> RISC-V Palmer Dabbelt <palmer@dabbelt.com>
Samsung Javier González <javier.gonz@samsung.com> Samsung Javier González <javier.gonz@samsung.com>
......
...@@ -351,8 +351,8 @@ Managing bug reports ...@@ -351,8 +351,8 @@ Managing bug reports
-------------------- --------------------
One of the best ways to put into practice your hacking skills is by fixing One of the best ways to put into practice your hacking skills is by fixing
bugs reported by other people. Not only you will help to make the kernel bugs reported by other people. Not only will you help to make the kernel
more stable, but you'll also learn to fix real world problems and you will more stable, but you'll also learn to fix real-world problems and you will
improve your skills, and other developers will be aware of your presence. improve your skills, and other developers will be aware of your presence.
Fixing bugs is one of the best ways to get merits among other developers, Fixing bugs is one of the best ways to get merits among other developers,
because not many people like wasting time fixing other people's bugs. because not many people like wasting time fixing other people's bugs.
......
...@@ -167,4 +167,4 @@ If no one can be found to internally review patches and you need ...@@ -167,4 +167,4 @@ If no one can be found to internally review patches and you need
help finding such a person, or if you have any other questions help finding such a person, or if you have any other questions
related to this document and the developer community's expectations, related to this document and the developer community's expectations,
please reach out to the private Technical Advisory Board mailing list: please reach out to the private Technical Advisory Board mailing list:
<tech-board@lists.linux-foundation.org>. <tech-board@groups.linuxfoundation.org>.
.. _submitchecklist: .. _submitchecklist:
=======================================
Linux Kernel patch submission checklist Linux Kernel patch submission checklist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ =======================================
Here are some basic things that developers should do if they want to see their Here are some basic things that developers should do if they want to see their
kernel patch submissions accepted more quickly. kernel patch submissions accepted more quickly.
...@@ -10,111 +11,123 @@ These are all above and beyond the documentation that is provided in ...@@ -10,111 +11,123 @@ These are all above and beyond the documentation that is provided in
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` :ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
and elsewhere regarding submitting Linux kernel patches. and elsewhere regarding submitting Linux kernel patches.
Review your code
================
1) If you use a facility then #include the file that defines/declares 1) If you use a facility then #include the file that defines/declares
that facility. Don't depend on other header files pulling in ones that facility. Don't depend on other header files pulling in ones
that you use. that you use.
2) Builds cleanly: 2) Check your patch for general style as detailed in
:ref:`Documentation/process/coding-style.rst <codingstyle>`.
a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and
``=n``. No ``gcc`` warnings/errors, no linker warnings/errors.
b) Passes ``allnoconfig``, ``allmodconfig``
c) Builds successfully when using ``O=builddir``
d) Any Documentation/ changes build successfully without new warnings/errors.
Use ``make htmldocs`` or ``make pdfdocs`` to check the build and
fix any issues.
3) Builds on multiple CPU architectures by using local cross-compile tools 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a
or some other build farm. comment in the source code that explains the logic of what they are doing
and why.
4) ppc64 is a good architecture for cross-compilation checking because it Review Kconfig changes
tends to use ``unsigned long`` for 64-bit quantities. ======================
5) Check your patch for general style as detailed in 1) Any new or modified ``CONFIG`` options do not muck up the config menu and
:ref:`Documentation/process/coding-style.rst <codingstyle>`.
Check for trivial violations with the patch style checker prior to
submission (``scripts/checkpatch.pl``).
You should be able to justify all violations that remain in
your patch.
6) Any new or modified ``CONFIG`` options do not muck up the config menu and
default to off unless they meet the exception criteria documented in default to off unless they meet the exception criteria documented in
``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value. ``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value.
7) All new ``Kconfig`` options have help text. 2) All new ``Kconfig`` options have help text.
8) Has been carefully reviewed with respect to relevant ``Kconfig`` 3) Has been carefully reviewed with respect to relevant ``Kconfig``
combinations. This is very hard to get right with testing -- brainpower combinations. This is very hard to get right with testing---brainpower
pays off here. pays off here.
9) Check cleanly with sparse. Provide documentation
=====================
10) Use ``make checkstack`` and fix any problems that it finds. 1) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs.
(Not required for static functions, but OK there also.)
.. note:: 2) All new ``/proc`` entries are documented under ``Documentation/``
``checkstack`` does not point out problems explicitly, 3) All new kernel boot parameters are documented in
but any one function that uses more than 512 bytes on the stack is a ``Documentation/admin-guide/kernel-parameters.rst``.
candidate for change.
11) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs. 4) All new module parameters are documented with ``MODULE_PARM_DESC()``
(Not required for static functions, but OK there also.) Use
``make htmldocs`` or ``make pdfdocs`` to check the
:ref:`kernel-doc <kernel_doc>` and fix any issues.
12) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, 5) All new userspace interfaces are documented in ``Documentation/ABI/``.
``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, See ``Documentation/ABI/README`` for more information.
``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, Patches that change userspace interfaces should be CCed to
``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all linux-api@vger.kernel.org.
simultaneously enabled.
13) Has been build- and runtime tested with and without ``CONFIG_SMP`` and 6) If any ioctl's are added by the patch, then also update
``CONFIG_PREEMPT.`` ``Documentation/userspace-api/ioctl/ioctl-number.rst``.
Check your code with tools
==========================
14) All codepaths have been exercised with all lockdep features enabled. 1) Check for trivial violations with the patch style checker prior to
submission (``scripts/checkpatch.pl``).
You should be able to justify all violations that remain in
your patch.
15) All new ``/proc`` entries are documented under ``Documentation/`` 2) Check cleanly with sparse.
16) All new kernel boot parameters are documented in 3) Use ``make checkstack`` and fix any problems that it finds.
``Documentation/admin-guide/kernel-parameters.rst``. Note that ``checkstack`` does not point out problems explicitly,
but any one function that uses more than 512 bytes on the stack is a
candidate for change.
17) All new module parameters are documented with ``MODULE_PARM_DESC()`` Build your code
===============
18) All new userspace interfaces are documented in ``Documentation/ABI/``. 1) Builds cleanly:
See ``Documentation/ABI/README`` for more information.
Patches that change userspace interfaces should be CCed to
linux-api@vger.kernel.org.
19) Has been checked with injection of at least slab and page-allocation a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and
failures. See ``Documentation/fault-injection/``. ``=n``. No ``gcc`` warnings/errors, no linker warnings/errors.
If the new code is substantial, addition of subsystem-specific fault b) Passes ``allnoconfig``, ``allmodconfig``
injection might be appropriate.
20) Newly-added code has been compiled with ``gcc -W`` (use c) Builds successfully when using ``O=builddir``
``make KCFLAGS=-W``). This will generate lots of noise, but is good
for finding bugs like "warning: comparison between signed and unsigned".
21) Tested after it has been merged into the -mm patchset to make sure d) Any Documentation/ changes build successfully without new warnings/errors.
that it still works with all of the other queued patches and various Use ``make htmldocs`` or ``make pdfdocs`` to check the build and
changes in the VM, VFS, and other subsystems. fix any issues.
22) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a 2) Builds on multiple CPU architectures by using local cross-compile tools
comment in the source code that explains the logic of what they are doing or some other build farm. Note that ppc64 is a good architecture for
and why. cross-compilation checking because it tends to use ``unsigned long`` for
64-bit quantities.
23) If any ioctl's are added by the patch, then also update 3) Newly-added code has been compiled with ``gcc -W`` (use
``Documentation/userspace-api/ioctl/ioctl-number.rst``. ``make KCFLAGS=-W``). This will generate lots of noise, but is good
for finding bugs like "warning: comparison between signed and unsigned".
24) If your modified source code depends on or uses any of the kernel 4) If your modified source code depends on or uses any of the kernel
APIs or features that are related to the following ``Kconfig`` symbols, APIs or features that are related to the following ``Kconfig`` symbols,
then test multiple builds with the related ``Kconfig`` symbols disabled then test multiple builds with the related ``Kconfig`` symbols disabled
and/or ``=m`` (if that option is available) [not all of these at the and/or ``=m`` (if that option is available) [not all of these at the
same time, just various/random combinations of them]: same time, just various/random combinations of them]:
``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``,
``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``,
``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``). ``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``).
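The signed/unsigned comparison warning mentioned in the ``gcc -W`` item is worth the noise. The following sketch shows the kind of bug it points at; both function names are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/*
 * Buggy on purpose: @value is converted to unsigned int before the
 * comparison, so a negative @value turns into a huge positive number
 * and the function wrongly reports that it is not below @limit.
 */
static bool example_below_limit_buggy(int value, unsigned int limit)
{
	return value < limit;	/* gcc -W warns about this comparison */
}

/* Fixed: handle negative values first, then compare as unsigned. */
static bool example_below_limit(int value, unsigned int limit)
{
	return value < 0 || (unsigned int)value < limit;
}
```

With ``value = -1`` and ``limit = 10``, the buggy variant returns false even though -1 is mathematically below 10, while the fixed variant returns true.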
Test your code
==============
1) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``,
``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all
simultaneously enabled.
2) Has been build- and runtime tested with and without ``CONFIG_SMP`` and
``CONFIG_PREEMPT.``
3) All codepaths have been exercised with all lockdep features enabled.
4) Has been checked with injection of at least slab and page-allocation
failures. See ``Documentation/fault-injection/``.
If the new code is substantial, addition of subsystem-specific fault
injection might be appropriate.
5) Tested with the most recent tag of linux-next to make sure that it still
works with all of the other queued patches and various changes in the VM,
VFS, and other subsystems.
@@ -54,9 +54,7 @@
 \renewcommand*\l@section{\@dottedtocline{1}{2.4em}{3.2em}}
 \renewcommand*\l@subsection{\@dottedtocline{2}{5.6em}{4.3em}}
 \makeatother
-%% Sphinx < 1.8 doesn't have \sphinxtableofcontentshook
-\providecommand{\sphinxtableofcontentshook}{}
-%% Undefine it for compatibility with Sphinx 1.7.9
+%% Prevent default \sphinxtableofcontentshook from overwriting above tweaks.
 \renewcommand{\sphinxtableofcontentshook}{} % Empty the hook
 % Prevent column squeezing of tabulary. \tymin is set by Sphinx as:
@@ -136,9 +134,6 @@
 }
 \newCJKfontfamily[JPsans]\jpsans{Noto Sans CJK JP}[AutoFakeSlant]
 \newCJKfontfamily[JPmono]\jpmono{Noto Sans Mono CJK JP}[AutoFakeSlant]
-% Dummy commands for Sphinx < 2.3 (no 'extrapackages' support)
-\providecommand{\onehalfspacing}{}
-\providecommand{\singlespacing}{}
 % Define custom macros to on/off CJK
 %% One and half spacing for CJK contents
 \newcommand{\kerneldocCJKon}{\makexeCJKactive\onehalfspacing}
......
-# jinja2>=3.1 is not compatible with Sphinx<4.0
-jinja2<3.1
-# alabaster>=0.7.14 is not compatible with Sphinx<=3.3
-alabaster<0.7.14
-Sphinx==2.4.4
+alabaster
+Sphinx
 pyyaml
@@ -157,7 +157,7 @@ Returns 0 on success and an appropriate error value on failure.
 int rpmsg_trysendto(struct rpmsg_endpoint *ept, void *data, int len, u32 dst)
-sends a message across to the remote processor from a given endoint,
+sends a message across to the remote processor from a given endpoint,
 to a destination address provided by the user.
 The user should specify the channel, the data it wants to send,
......
@@ -61,6 +61,8 @@ Storage interfaces
 scsi/index
 target/index
+Other subsystems
+----------------
 **Fixme**: much more organizational work is needed here.
 .. toctree::
......
.. SPDX-License-Identifier: GPL-2.0
.. _it_rcu_concepts:
============
RCU concepts
============
.. toctree::
:maxdepth: 3
torture
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
===========================
RCU Torture Test Operations
===========================
CONFIG_RCU_TORTURE_TEST
=======================
The CONFIG_RCU_TORTURE_TEST option is available for all RCU implementations.
It creates an rcutorture module that you can load to start the tests. The test
uses printk() to report its status, so you can view it with dmesg (perhaps
using grep to filter for "torture"). The tests start when the module is loaded
and stop when it is removed.
All module parameters carry the "rcutorture." prefix; see
Documentation/admin-guide/kernel-parameters.txt.
Output
======
The test report looks like this::
rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0
rcu-torture: Reader Batch: 727877838 17003 0 0 0 0 0 0 0 0 0
rcu-torture: Free-Block Circulation: 155440 155440 155440 155440 155440 155440 155440 155440 155440 155440 0
rcu-torture:--- End of test: SUCCESS: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
On most systems this report can be produced with the command "dmesg | grep
torture:". On more esoteric configurations, other commands may be needed to
view printk() messages. The printk() calls use KERN_ALERT, so the messages
should be clearly visible. ;-)
The first and last lines show the rcutorture module parameters, and only the
last line carries the final result of the tests, which is either "SUCCESS" or
"FAILURE".
The entries are as follows:
* "rtc": The hexadecimal address of the structure currently visible to
readers.
* "ver": The number of times since boot that the RCU writer task has changed
the structure visible to readers.
* "tfle": If non-zero, indicates that the "torture freelist" of structures to
be placed into "rtc" is empty. This condition is important because it can
fool you into thinking that RCU is working when it is not. :-/
* "rta": The number of structures allocated from the "torture freelist".
* "rtaf": The number of allocations from the "torture freelist" that failed
because the list was empty. It is not unusual for this to be non-zero, but it
is a bad sign if this number is too large a fraction of "rta".
* "rtf": The number of frees into the "torture freelist".
* "rtmbe": A non-zero value indicates that rcutorture believes that
rcu_assign_pointer() and rcu_dereference() are not working correctly. This
value should be zero.
* "rtbe": A non-zero value indicates that the rcu_barrier() family of
functions is not working correctly.
* "rtbke": rcutorture was unable to create the real-time kthreads used to
force RCU priority inversion. This value should be zero.
* "rtbre": Although rcutorture created the kthreads used to force priority
inversion, it was unable to set them to real-time priority level 1. This
value should be zero.
* "rtbf": The number of times that priority boosting failed to resolve a
priority inversion.
* "rtb": The number of times that rcutorture attempted to force a priority
inversion. This value should be non-zero if you are testing priority boosting
via the "test_boost" parameter.
* "nt": The number of times rcutorture ran read-side code from within a timer
handler. This value should be non-zero if you specified the "irqreader"
parameter.
* "Reader Pipe": A histogram of the ages of structures seen by readers. RCU is
broken if any entry from the third onward is non-zero. If that happens,
rcutorture prints the string "!!!" to make sure you notice. The age of a newly
allocated structure is zero; it becomes one when the structure is removed from
reader visibility, and is incremented once per grace period after that. The
structure is freed after passing through (RCU_TORTURE_PIPE_LEN-2) grace
periods.
The output shown above was taken from a correctly working RCU. If you want to
see what it looks like when broken, break it yourself. ;-)
* "Reader Batch": Another histogram of the ages of structures seen by readers,
but counted in batches rather than grace periods. Here too, every entry from
the third onward must be zero. This separate view exists because it is
sometimes easier to get a non-zero third entry to show up here than in the
"Reader Pipe" list.
* "Free-Block Circulation": The number of torture structures that have reached
a given point in the pipeline. The first number should closely correspond to
the number of structures allocated, and the second counts those removed from
reader view. Except for the last value, the remaining ones correspond to the
number of passes through a grace period. The last value should be zero,
because it is only incremented if a torture structure's counter somehow gets
incremented farther than it should.
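The rule for the "Reader Pipe" and "Reader Batch" histograms (no non-zero entry past the first two) lends itself to an automated check. The following is a minimal Python sketch and not part of rcutorture: the function name ``histogram_ok`` is hypothetical, and the line format is assumed to match the sample report shown earlier.

```python
def histogram_ok(line: str) -> bool:
    """Return True if every histogram entry past the first two is zero.

    Assumes a line shaped like the sample report, for example
    "rcu-torture: Reader Pipe: 727860534 34213 0 0 ...".
    """
    # The counters follow the last colon on the line.
    _, _, tail = line.rpartition(":")
    # Keep only numeric tokens so stray markers do not break the parsing.
    counts = [int(tok) for tok in tail.split() if tok.isdigit()]
    return all(count == 0 for count in counts[2:])


sample = "rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0"
broken = "rcu-torture: Reader Pipe: 1000 20 3 0 0 0 0 0 0 0 0"
print(histogram_ok(sample), histogram_ok(broken))
```

The ``broken`` line is invented for illustration; it is not output from a real run.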
Different RCU implementations may provide additional information. For
example, *Tree SRCU* also provides the following line::
srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6)
This line shows the per-CPU counter state, in this case for *Tree SRCU* using
a dynamically allocated srcu_struct (hence "srcud-" rather than "srcu-"). The
numbers in parentheses are the values of the "old" and "current" counters for
each CPU. The "idx" value maps these two values to the underlying array, and
is useful for debugging. The final "T" entry contains the totals of the
counters.
Usage on Specific Kernel Builds
===============================
It is sometimes useful to run RCU torture tests on a specific kernel build,
for example when preparing to put that build into production. In that case,
the kernel must be built with CONFIG_RCU_TORTURE_TEST=m so that the tests can
be started using modprobe and terminated using rmmod.
For example, you could use this script::
#!/bin/sh
modprobe rcutorture
sleep 3600
rmmod rcutorture
dmesg | grep torture:
The output can be inspected manually for the "!!!" error flag. You are of
course free to write a more elaborate script that detects such errors
automatically. The "rmmod" command forces "SUCCESS", "FAILURE", or
"RCU_HOTPLUG" to be printed. The first two are self-explanatory, while the
last indicates that no RCU failures were found but that CPU-hotplug problems
were detected.
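A more elaborate check than reading the output by eye could look like the following Python sketch. It is only an illustration under stated assumptions: the function name ``summarize`` is hypothetical, and the verdict is assumed to follow the text "End of test:" as in the sample report shown earlier.

```python
import re

# Assumed from the sample report: the verdict follows "End of test:".
VERDICT_RE = re.compile(r"End of test: (SUCCESS|FAILURE|RCU_HOTPLUG)")


def summarize(log_text: str):
    """Return (verdict, flagged_lines) for rcutorture console output."""
    verdict = None
    flagged = []
    for line in log_text.splitlines():
        if "torture" not in line:
            continue  # same filtering idea as `dmesg | grep torture:`
        if "!!!" in line:
            flagged.append(line)  # error flag printed by rcutorture
        match = VERDICT_RE.search(line)
        if match:
            verdict = match.group(1)
    return verdict, flagged


log = """\
rcu-torture:--- Start of test: nreaders=16
rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0
rcu-torture:--- End of test: SUCCESS: nreaders=16
"""
print(summarize(log))
```

A verdict of ``None`` means that no end-of-test line was seen, which such a script would likely want to treat as a failure as well.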
Usage on Mainline Kernels
=========================
When using rcutorture to test changes to RCU itself, it is often necessary to
build a number of kernels using different configurations and different boot
parameters. In these cases, using modprobe and rmmod can be quite
time-consuming and error-prone.
Therefore, the tools/testing/selftests/rcutorture/bin/kvm.sh script is
available for the x86, arm64, and powerpc architectures. By default, it runs
the series of tests listed in
tools/testing/selftests/rcutorture/configs/rcu/CFLIST. Each of them runs for
30 minutes in a virtual machine with a minimal userspace supplied by an
automatically generated initrd. On completion, the resulting build products
and console output are analyzed for errors, and the results of the runs are
summarized in a report.
On large systems, rcutorture testing can be sped up by passing the --cpus
argument to kvm.sh. For example, on a 64-CPU system, "--cpus 43" uses up to
43 CPUs to run tests concurrently. On a v5.4 kernel this completes all
scenarios in two batches, reducing the run time from eight hours to one hour
(not counting the time needed to build sixteen kernels). The "--dryrun sched"
argument does not run any tests, but instead tells you how the tests would be
scheduled into batches. This can be useful for working out how many CPUs to
reserve for the tests with --cpus.
Not every change requires running all scenarios. For example, for a change to
Tree SRCU you can run only the SRCU-N and SRCU-P scenarios. To do so, use the
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'". Large
systems can run multiple copies of the same scenarios; for example, a system
with 448 hardware threads can run 5 complete instances concurrently. To do
so::
kvm.sh --cpus 448 --configs '5*CFLIST'
Alternatively, the same system can run 56 concurrent instances of a single
eight-CPU scenario::
kvm.sh --cpus 448 --configs '56*TREE04'
Or 28 concurrent instances of each of two eight-CPU scenarios::
kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04'
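The arithmetic behind these --configs examples can be checked with a short Python sketch. This is not how kvm.sh itself parses its arguments; the helper names ``parse_configs`` and ``cpus_needed`` are hypothetical, the eight-CPUs-per-scenario figure is the one quoted in the text, and expansion of CFLIST into individual scenarios is deliberately left out.

```python
def parse_configs(spec: str) -> dict:
    """Expand a string such as "28*TREE03 28*TREE04" into {scenario: count}."""
    counts = {}
    for item in spec.split():
        reps, star, name = item.partition("*")
        if not star:
            # No repeat prefix: a bare scenario name counts once.
            reps, name = "1", item
        counts[name] = counts.get(name, 0) + int(reps)
    return counts


def cpus_needed(spec: str, cpus_per_scenario: dict) -> int:
    """Total CPUs required to run every instance at the same time."""
    return sum(count * cpus_per_scenario[name]
               for name, count in parse_configs(spec).items())


# Eight CPUs per scenario is the figure quoted in the text.
eight = {"TREE03": 8, "TREE04": 8}
print(cpus_needed("56*TREE04", eight))
print(cpus_needed("28*TREE03 28*TREE04", eight))
```

Both calls come out at 448, which matches the --cpus value used in the commands.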
Of course, each run uses memory. You can limit it with the --memory argument,
which defaults to 512M. Using small values requires disabling the
*callback-flooding* tests via the --bootargs parameter discussed below.
Sometimes additional debugging information is useful, in which case you can
use the --kconfig parameter, for example ``--kconfig
'CONFIG_RCU_EQS_DEBUG=y'``. In addition, there are the --gdb, --kasan, and
--kcsan parameters. Note that --gdb limits you to one scenario per kvm.sh run
and requires another open window from which to run ``gdb`` as instructed by
the script.
You can also pass kernel boot parameters, for example to control the
rcutorture module parameters. For instance, to test changes to the RCU CPU
stall-warning code, use ``--bootargs 'rcutorture.stall_cpu=30'``. The script
will of course report a failure, namely the resulting RCU CPU stall warning.
As noted above, reducing memory requires disabling the *callback-flooding*
tests::
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
--bootargs 'rcutorture.fwd_progress=0'
Sometimes all that is needed is a full set of kernel builds. The --buildonly
parameter does exactly that.
The --duration parameter overrides the default run time of 30 minutes. For
example, ``--duration 2d`` runs for two days, ``--duration 5min`` for five
minutes, and ``--duration 45s`` for 45 seconds. The last one can be useful
for tracking down rare boot-time failures.
Finally, the --trust-make parameter allows each kernel build to reuse
whatever it can from the previous builds. Note that without the --trust-make
parameter, your *tags* files may be destroyed.
There are additional, more arcane parameters that are documented in the
source code of the kvm.sh script.
If a run contains failures, the number of build-time and run-time failures is
listed at the end of the kvm.sh output (which we strongly recommend
redirecting to a file). The build products and console output are saved in
timestamped directories under tools/testing/selftests/rcutorture/res. Any one
of these directories can be passed to kvm-find-errors.sh to extract the
errors. For example::
tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23
However, it is often more convenient to open the files directly. Files that
pertain to all scenarios in a run reside in the top-level directory
(2020.01.20-15.54.23 in the example above), while per-scenario files reside
in subdirectories named after the scenario (for example, "TREE04"). If a
given scenario runs more than once (as with "--configs '56*TREE04'" above),
the directories for the second and subsequent runs include a sequence number,
for example "TREE04.2", "TREE04.3", and so on.
The most frequently used file in the top-level directory is testid.txt. If
the test ran in a git repository, this file contains the *commit* that was
tested, and any uncommitted changes are shown in diff format.
The most frequently used files in the per-scenario directories are:
.config
This file contains the Kconfig options.
Make.out
This file contains the build output for a specific scenario.
console.log
This file contains the console output for a specific scenario.
It can be examined once the kernel has booted,
but it might not exist if the build failed.
vmlinux
This file contains the kernel, and can be useful to examine with
tools such as objdump and gdb.
There are other files, but they are used less often. Many are intended for
debugging rcutorture itself or its scripts.
As of v5.4, on a 12-CPU system, a successful run with the default scenarios
produces the following summary::
SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
CPU count limited from 16 to 12
TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
CPU count limited from 16 to 12
TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
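For scripted comparisons between runs, the summary lines can be turned into records. The Python sketch below is an illustration only: the regular expression is inferred from the sample summary above rather than from the kvm.sh sources, the names ``SUMMARY_RE`` and ``parse_summary`` are hypothetical, and the ``g``/``f`` fields are kept as raw numbers without interpreting them.

```python
import re

# Pattern inferred from the sample summary; other output shapes are ignored.
SUMMARY_RE = re.compile(
    r"^(?P<scenario>\S+) -+ (?P<gps>\d+) GPs \((?P<rate>[\d.]+)/s\)"
    r" \[(?P<flavor>\w+): g(?P<g>\d+) f(?P<f>0x[0-9a-fA-F]+)\s*\]"
)


def parse_summary(text: str) -> list:
    """Return one dict per scenario line, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue  # e.g. "CPU count limited from 16 to 12"
        rows.append({
            "scenario": match["scenario"],
            "gps": int(match["gps"]),
            "rate": float(match["rate"]),
            "flavor": match["flavor"],
            "g": int(match["g"]),
            "f": int(match["f"], 16),
        })
    return rows


sample = """\
SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
CPU count limited from 16 to 12
TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
"""
for row in parse_summary(sample):
    print(row["scenario"], row["gps"], row["rate"])
```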
Repeated Runs
=============
Suppose that you are chasing a rare boot-time failure. You could use kvm.sh,
but doing so rebuilds the kernel on every run. If you need (say) 1000 runs to
be confident that you have fixed the problem, these pointless rebuilds can
become extremely annoying.
This is why kvm-again.sh exists.
Suppose that a previous kvm.sh run left its output in this directory::
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
This run can then be repeated without rebuilding::
kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
Some of the original kvm.sh parameters can be overridden, most notably
--duration and --bootargs. For example::
kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 \
--duration 45s
would re-run the previous test, but for only 45 seconds, thus helping to
track down the rare boot-time failure mentioned above.
Distributed Runs
================
Although kvm.sh is useful, its testing is confined to a single system. It is
not all that hard to use your favorite framework to run (say) 5 instances of
kvm.sh on as many systems, but this would trigger unnecessary kernel
rebuilds. In addition, distributing the rcutorture scenarios across the
available systems by hand is painstaking and error-prone.
This is why kvm-remote.sh exists.
If the following command works::
ssh system0 date
and it also works for system1, system2, system3, system4, and system5, and
all of these systems have 64 CPUs, then you can run::
kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
--cpus 64 --duration 8h --configs "5*CFLIST"
This builds the default scenarios on the local system, then distributes them
to the systems listed in the arguments and runs each scenario for eight
hours. At the end of the runs, the results are gathered, recorded, and
printed. Most kvm.sh parameters can be used with kvm-remote.sh, but the list
of systems must always come first.
The kvm.sh ``--dryrun scenarios`` argument can be useful for working out how
many scenarios can be run in one batch across a group of systems.
You can also re-run a previous remote run, in the same way as shown for
kvm.sh::
kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
--duration 24h
In this case, most kvm-again.sh parameters can be supplied after the path to
the directory containing the results of the run to be repeated.
@@ -10,6 +10,18 @@ Utilità di base
 symbol-namespaces
+Synchronization primitives
+==========================
+How Linux keeps everything from happening at the same time. See
+Documentation/translations/it_IT/locking/index.rst for more information
+on the topic.
+.. toctree::
+:maxdepth: 1
+../RCU/index
 .. only:: subproject and html
 Indices
......
================
The I2C Protocol
================
This document is an overview of the basic I2C transactions and of the kernel
APIs used to perform them.
Key to symbols
==============
=============== ===========================================================
S               Start condition
P               Stop condition
Rd/Wr (1 bit)   Read/Write bit. Rd equals 1, Wr equals 0.
A, NA (1 bit)   Acknowledge (ACK) and Not Acknowledge (NACK) bit.
Addr (7 bits)   7-bit I2C address. Note that this can be expanded
                to get a 10-bit I2C address.
Data (8 bits)   A plain data byte.
[..]            Square brackets mark data sent by I2C devices,
                as opposed to data sent by the master.
=============== ===========================================================
Simple send transaction
=======================
Implemented by i2c_master_send()::
S Addr Wr [A] Data [A] Data [A] ... [A] Data [A] P
Simple receive transaction
==========================
Implemented by i2c_master_recv()::
S Addr Rd [A] [Data] A [Data] A ... A [Data] NA P
Combined transactions
=====================
Implemented by i2c_transfer().
These are just like the transactions above, but instead of a stop condition P
a start condition S is sent and the transaction continues.
An example of a byte read, followed by a byte write::
S Addr Rd [A] [Data] NA S Addr Wr [A] Data [A] P
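The symbol notation used for these transactions can be generated mechanically, which makes the structure of a combined transaction easier to see. The following Python sketch is purely illustrative and involves no kernel API: the function ``render`` and its ``(read, nbytes)`` message tuples are hypothetical, and "Data" stands for one data byte.

```python
def render(messages) -> str:
    """Render I2C messages in the S/Addr/Rd/Wr/[A]/P notation.

    `messages` is a list of (read, nbytes) tuples. Items in square
    brackets are driven by the device rather than by the master.
    """
    parts = []
    for read, nbytes in messages:
        # Every message begins with a (repeated) start and the address phase.
        parts += ["S", "Addr", "Rd" if read else "Wr", "[A]"]
        for index in range(nbytes):
            last = index == nbytes - 1
            if read:
                # The master acknowledges every byte except the last one.
                parts += ["[Data]", "NA" if last else "A"]
            else:
                parts += ["Data", "[A]"]
    parts.append("P")  # a single stop condition ends the whole transfer
    return " ".join(parts)


print(render([(False, 2)]))            # simple send of two bytes
print(render([(True, 1), (False, 1)]))  # one-byte read, then one-byte write
```

Only the final message is followed by P; every earlier message is followed directly by the next S, which is what distinguishes a combined transaction from two separate ones.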
Modified transactions
=====================
The following modifications to the I2C protocol can be generated by setting
these flags for I2C messages. With the exception of I2C_M_NOSTART, they are
usually only needed to work around device issues:
I2C_M_IGNORE_NAK:
Normally the message is interrupted immediately if the device
answers with [NA]. Setting this flag treats any [NA] as
[A], and the whole message is sent.
These messages may still fail because of an SCL low->high timeout.
I2C_M_NO_RD_ACK:
In a read message, the master A/NA bit is skipped.
I2C_M_NOSTART:
In a combined transaction, no "S Addr Wr/Rd [A]" is generated
at some point.
For example, setting I2C_M_NOSTART on the second partial message
generates something like::
S Addr Rd [A] [Data] NA Data [A] P
If you set the I2C_M_NOSTART flag on the first partial message,
Addr is not generated, but the start condition S is.
This will probably confuse all other devices on the bus, so
it is better not to use it.
This is often used to gather transmissions from multiple data buffers
in system memory into something that appears as a single transfer
to the I2C device. Some rare devices also use it between
direction changes.
I2C_M_REV_DIR_ADDR:
This toggles the Rd/Wr flag. That is, if you want to write but need
to emit an Rd instead of a Wr, or vice versa, you can set this
flag.
For example::
S Addr Rd [A] Data [A] Data [A] ... [A] Data [A] P
I2C_M_STOP:
Forces a stop condition (P) after the message. Some I2C-like
protocols such as SCCB require this. Normally, you do not want to be
interrupted between the messages of one transfer.
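As a small worked example of the Rd/Wr bit and of what I2C_M_REV_DIR_ADDR changes on the wire, the Python sketch below builds the first byte of a 7-bit transfer. It assumes the standard I2C layout (seven address bits followed by the Rd/Wr bit as the least significant bit); the function ``address_byte`` and its ``rev_dir_addr`` argument are hypothetical names and model only the toggling described in the text, not the kernel implementation.

```python
def address_byte(addr7: int, read: bool, rev_dir_addr: bool = False) -> int:
    """Return the first byte of a transfer: 7 address bits, then Rd/Wr."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("a 7-bit address must be in the range 0..0x7f")
    rw_bit = 1 if read else 0  # Rd equals 1, Wr equals 0
    if rev_dir_addr:
        rw_bit ^= 1  # emit the opposite direction bit
    return (addr7 << 1) | rw_bit


print(hex(address_byte(0x50, read=False)))
print(hex(address_byte(0x50, read=True)))
print(hex(address_byte(0x50, read=False, rev_dir_addr=True)))
```

With the toggle set, a write to address 0x50 goes out with the same first byte as a normal read.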
.. SPDX-License-Identifier: GPL-2.0
=======================
The I2C/SMBus Subsystem
=======================
Introduction
============
.. toctree::
:maxdepth: 1
summary
i2c-protocol
Writing device drivers
======================
.. toctree::
:maxdepth: 1
Debugging
=========
.. toctree::
:maxdepth: 1
Slave I2C
=========
.. toctree::
:maxdepth: 1
Advanced topics
===============
.. toctree::
:maxdepth: 1
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
=============================
Introduction to I2C and SMBus
=============================
I²C (pronounced "I squared C" and written I2C in the kernel documentation) is
a protocol developed by Philips. It is a slow two-wire protocol (variable
speed, up to 400 kHz), with a high-speed extension (3.4 MHz). It provides an
inexpensive bus for connecting many kinds of devices that are accessed
infrequently and need little bandwidth. Some systems use variants that do not
meet the original requirements, and so are not advertised as I2C but come
under different names, for example TWI (Two Wire Interface) or IIC.
The latest official I2C specification is the `"I2C-bus specification and user
manual" (UM10204) <https://www.nxp.com/webapp/Download?colCode=UM10204>`_
published by NXP Semiconductors. However, you need to log in to the site to
access the PDF. An older version of the specification
(revision 6) is archived
`here <https://web.archive.org/web/20210813122132/
https://www.nxp.com/docs/en/user-guide/UM10204.pdf>`_.
SMBus (System Management Bus) is based on the I2C protocol and is mostly a
subset of I2C protocols and signaling. Many I2C devices will work on an
SMBus, but some SMBus protocols add semantics beyond what I2C requires.
Modern PC mainboards rely on SMBus. The most common devices connected through
SMBus are RAM modules configured using I2C EEPROMs, and hardware monitoring
chips.
Because SMBus is mostly a subset of the I2C bus,
we can use it on many I2C systems. However, there are systems that do not
meet the electrical constraints of both SMBus and I2C, and others that cannot
implement all the common SMBus protocol semantics or messages.
Terminology
===========
Using the terminology of the official documentation, the I2C bus connects
one or more *master* chips and one or more *slave* chips.
.. kernel-figure:: ../../../i2c/i2c_bus.svg
:alt: A simple I2C bus with one master and 3 slaves
A simple I2C bus
A **master** chip is a node that starts communications with slaves. In the
Linux kernel implementation it is called an **adapter** or bus. Adapter
drivers are in the ``drivers/i2c/busses/`` subdirectory.
An **algorithm** contains general code that can be used to implement a whole
class of I2C adapters. Each specific adapter driver either depends on an
algorithm driver in the ``drivers/i2c/algos/`` subdirectory, or includes its
own implementation.
A **slave** chip is a node that responds to communications when addressed by
the master. In Linux it is called a **client** (device). Client drivers are
kept in a directory specific to the feature they provide, for example
``drivers/gpio/`` for GPIO expanders and ``drivers/media/i2c/`` for
video-related chips.
For the example configuration in the figure, you will need a driver for your
I2C adapter and drivers for your I2C devices (usually one driver for each
device).
@@ -91,6 +91,8 @@ interfacciarsi con il resto del kernel.
 :maxdepth: 1
 core-api/index
+Kernel synchronization <locking/index>
+subsystem-apis
 Development tools and processes
 ===============================
......
.. SPDX-License-Identifier: GPL-2.0
=======
Locking
=======
.. toctree::
:maxdepth: 1
locktypes
lockdep-design
lockstat
locktorture
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
Runtime locking correctness validator
=====================================
Lock classes
------------
The basic object the validator operates on is a "class" of locks.
A class of locks is a group of locks that follow the same locking rules,
even if the locks may have multiple (possibly tens of thousands of)
instances. For example, a lock in the inode structure is one class, while
each inode has its own instance of that lock class.
The validator tracks the "usage state" of a lock class and its dependencies
on other classes. Lock usage indicates how a lock is used with regard to its
interrupt context, while lock dependencies can be understood as lock order;
for example, L1 -> L2 suggests that a task is trying to acquire L2 while
already holding L1. From lockdep's perspective, the two locks (L1 and L2) are
not necessarily related: the dependency only records the order in which
things happened. The validator continuously checks that lock usage and
dependencies are correct, and reports an error otherwise.
The behavior of a lock class is built up from all of its instances. A lock
class is registered when its first instance is created, and all subsequent
instances are mapped to it; their usage and dependencies therefore contribute
to those of the class. A lock class does not go away when one of its
instances does, but it can be removed when its memory is reclaimed. This
happens, for example, when a module is unloaded or a *workqueue* is
destroyed.
State
-----
The validator tracks the usage history of lock classes and divides that usage
into (4 usages * n STATEs + 1) categories.
The four usages can be:
- 'ever held in <STATE> context'
- 'ever held as read lock in <STATE> context'
- 'ever held with <STATE> enabled'
- 'ever held as read lock with <STATE> enabled'
The `n` STATEs are coded in kernel/locking/lockdep_states.h, and currently
include:
- hardirq
- softirq
Finally, the last category is:
- 'ever used' [ == !unused ]
When locking rules are violated, these usage bits are presented in the
locking error messages, inside curly braces, for a total of `2 * n` bits
(`n`: number of STATEs). A contrived example::
modprobe/2287 is trying to acquire lock:
(&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
but task is already holding lock:
(&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
For a given lock, the bit positions from left to right indicate the usage of
the lock and of its readlock (if any), for each of the `n` STATES listed above.
The character shown at each bit position indicates:

===  ===================================================
'.'  acquired while irqs disabled and not in irq context
'-'  acquired in irq context
'+'  acquired with irqs enabled
'?'  acquired in irq context with irqs enabled
===  ===================================================

The following example illustrates the bits::

   (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
                        ||||
                        ||| \-> softirq disabled and not in softirq context
                        || \--> acquired in softirq context
                        | \---> hardirq disabled and not in hardirq context
                         \----> acquired in hardirq context

For a given STATE, whether the lock was ever acquired in that STATE context and
whether that STATE was enabled yield the four possible cases shown in the table
below. The bit character indicates exactly which case applies to the lock at
the time of the report.

+--------------+-------------+--------------+
|              | irq enabled | irq disabled |
+--------------+-------------+--------------+
| ever in irq  |     '?'     |      '-'     |
+--------------+-------------+--------------+
| never in irq |     '+'     |      '.'     |
+--------------+-------------+--------------+

The character '-' implies that irqs are disabled, because otherwise '?' would
have been shown instead. A similar deduction applies to '+'.

Unused locks (e.g., mutexes) cannot be part of the cause of an error.
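
The mapping in the table can be written down as a small lookup. The following
Python sketch is only an illustration and not lockdep's code: the function
names ``usage_char`` and ``decode_usage`` are made up for this example, and the
STATE order (hardirq, then softirq) is taken from the example above.

```python
def usage_char(ever_in_irq: bool, ever_with_irq_enabled: bool) -> str:
    """Return the character shown for one usage bit (see the table above)."""
    if ever_in_irq and ever_with_irq_enabled:
        return "?"
    if ever_in_irq:
        return "-"
    if ever_with_irq_enabled:
        return "+"
    return "."


def decode_usage(bits: str, states=("hardirq", "softirq")) -> dict:
    """Split a string such as '-.-.' into per-STATE lock/readlock characters."""
    if len(bits) != 2 * len(states):
        raise ValueError("expected two characters per STATE")
    return {
        state: {"lock": bits[2 * i], "readlock": bits[2 * i + 1]}
        for i, state in enumerate(states)
    }
```

For instance, ``decode_usage("-.-.")`` splits the contrived report above into
one lock/readlock character pair per STATE.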
Single-lock state rules
-----------------------
A lock is irq-safe if it was ever used in an irq context, while a lock is
irq-unsafe if it was ever acquired with irqs enabled.

A softirq-unsafe lock class is automatically hardirq-unsafe as well. The
following states must be mutually exclusive: only one of each pair may be set
for any lock class, based on its usage::

   <hardirq-safe> or <hardirq-unsafe>
   <softirq-safe> or <softirq-unsafe>

This is because if a lock can be used in irq context (irq-safe), it must never
be acquired with irqs enabled (irq-unsafe); otherwise a deadlock may happen.
For example, if the lock has been acquired and the context is interrupted
before it is released, the interrupt handler may attempt to acquire the same
lock again. This creates a deadlock, referred to as a lock recursion deadlock.

The validator detects and reports lock usage that violates these single-lock
state rules.
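
As a rough model of these single-lock rules, the sketch below records, per
lock class, in which STATE contexts the lock was ever taken and with which
STATEs enabled, and complains as soon as a class is both safe and unsafe for
the same STATE. The class and method names are hypothetical; the real validator
keeps this information in usage bits rather than Python sets.

```python
class UsageViolation(Exception):
    """Raised when a lock class becomes both irq-safe and irq-unsafe."""


class LockClassUsage:
    def __init__(self, name):
        self.name = name
        self.safe = set()    # STATEs in whose context the lock was ever taken
        self.unsafe = set()  # STATEs that were ever enabled while taking it

    def mark(self, state, in_context, enabled):
        """Record one acquisition and re-check the mutual-exclusion rule."""
        if in_context:
            self.safe.add(state)
        if enabled:
            self.unsafe.add(state)
        # A softirq-unsafe class is automatically hardirq-unsafe as well.
        if "softirq" in self.unsafe:
            self.unsafe.add("hardirq")
        both = self.safe & self.unsafe
        if both:
            raise UsageViolation(
                f"{self.name}: safe and unsafe for {sorted(both)}"
            )
```

Marking the same class first as taken in hardirq context and later as taken
with hardirqs enabled raises ``UsageViolation``, which mirrors the lock
recursion scenario described above.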
Multi-lock dependency rules
---------------------------
The same lock class must not be acquired twice, because this could lead to a
lock recursion deadlock.

Furthermore, two locks must not be taken in inverse order::
<L1> -> <L2>
<L2> -> <L1>
because this could lead to a deadlock, referred to as a lock inversion
deadlock: the attempts to acquire the two locks form a cycle in which the two
contexts could wait for each other forever. The validator finds such dependency
cycles of arbitrary complexity, i.e. there can be any other locking sequences
between the acquire operations; the validator still determines whether these
locks can be acquired in a circular fashion.
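
The circular-dependency check can be pictured as a reachability query on a
directed graph of lock classes. The sketch below is a simplified model with
hypothetical names (``DependencyGraph``, ``add_dependency``); it ignores
usage states and read locks, and it is not how lockdep stores its graph.

```python
from collections import defaultdict


class DependencyGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # held class -> classes taken under it

    def _reachable(self, start, goal):
        """Iterative depth-first search: is `goal` reachable from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def add_dependency(self, held, acquired):
        """Record held -> acquired; return False if it would close a cycle."""
        if self._reachable(acquired, held):
            return False
        self.edges[held].add(acquired)
        return True
```

With L1 -> L2 and L2 -> L3 recorded, an attempt to add L3 -> L1 is rejected
even though no task ever held all three locks at once.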
Furthermore, the following usage-based lock dependencies are not allowed
between any two lock classes::
<hardirq-safe> -> <hardirq-unsafe>
<softirq-safe> -> <softirq-unsafe>
The first rule comes from the fact that a hardirq-safe lock could be taken in
a hardirq context which, by definition, can interrupt the holder of a
hardirq-unsafe lock; this could result in a lock inversion deadlock. Likewise,
a softirq-safe lock could be taken in a softirq context and thus interrupt the
holder of a softirq-unsafe lock.

The above rules are enforced for any locking sequence: when a new lock is
acquired, the validator checks whether there is any rule violation between the
new lock and any of the locks already held.

When a lock class changes its state, the following aspects of the above
dependency rules are enforced:

- if a new hardirq-safe lock is discovered, we check whether it took any
  hardirq-unsafe lock in the past.
- if a new softirq-safe lock is discovered, we check whether it took any
  softirq-unsafe lock in the past.
- if a new hardirq-unsafe lock is discovered, we check whether any
  hardirq-safe lock took it in the past.
- if a new softirq-unsafe lock is discovered, we check whether any
  softirq-safe lock took it in the past.

(Again, we do these checks on the basis that an interrupt context could
interrupt any of the irq-unsafe locks, which could lead to a lock inversion
deadlock, even if that scenario has not been triggered in practice yet.)
Exception: Nested data dependencies leading to nested locking
-------------------------------------------------------------
There are a few cases where the Linux kernel acquires more than one instance
of the same lock class. Such cases typically happen when there is some sort of
hierarchy among objects of the same type. In these cases there is an inherent
"natural" ordering between the two objects (defined by the properties of the
hierarchy), and the kernel takes the locks in this fixed order on each of the
objects.

An example of such an object hierarchy that results in "nested locking" is
that of a "whole disk" *block-dev* object and a "partition" *block-dev*
object: the partition is part of the whole device, and as long as one always
takes the whole-disk lock before the partition lock, the lock ordering is fully
correct. The validator does not detect this natural ordering automatically,
because the locking rule behind the ordering is not static.

To teach the validator about this correct usage model, new versions of the
various locking primitives were added that let you specify a "nesting level".
An example call, for the block device mutex, looks like this::

  enum bdev_bd_mutex_lock_class
  {
       BD_MUTEX_NORMAL,
       BD_MUTEX_WHOLE,
       BD_MUTEX_PARTITION
  };

  mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);

In this case the locking is done on a *block-dev* object that is known to be a
partition.

For the purposes of validation, the validator treats a lock taken in such a
nested fashion as a separate (sub)class.

Note: when changing code to use the _nested() primitives, be careful and check
thoroughly that the hierarchy is correctly mapped; otherwise you can get false
positives or false negatives.
Annotations
-----------
Two constructs can be used to annotate and check where and whether certain
locks must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock).

As the name suggests, the lockdep_assert_held* family of macros asserts that a
particular lock is held at a certain time (and generates a WARN() otherwise).
This annotation is widely used across the kernel, e.g. in
kernel/sched/core.c::

  void update_rq_clock(struct rq *rq)
  {
        s64 delta;

        lockdep_assert_held(&rq->lock);
        [...]
  }

where holding rq->lock is required to safely update the rq's clock.

The other family of macros is lockdep_*pin_lock(), which is admittedly only
used for rq->lock at the moment. These annotations generate a WARN() if the
lock of interest is "accidentally" unlocked. This turns out to be especially
helpful when debugging code with *callbacks*, where an upper layer assumes that
a lock remains held, but a lower layer thinks it can drop and reacquire the
lock (unwittingly introducing races). lockdep_pin_lock() returns a
'struct pin_cookie' that is then used by lockdep_unpin_lock() to check that
nobody tampered with the lock, e.g. in kernel/sched/sched.h::

  static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
  {
        rf->cookie = lockdep_pin_lock(&rq->lock);
        [...]
  }

  static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
  {
        [...]
        lockdep_unpin_lock(&rq->lock, rf->cookie);
  }

While comments about locking requirements may provide useful information, the
runtime checks performed by these annotations are invaluable when debugging
locking problems, and they carry the same level of detail when inspecting
code. When in doubt, always prefer annotations!
Proof of 100% correctness
-------------------------
The validator achieves mathematical 'closure' (a proof of locking correctness)
in the following sense: for every single-task locking sequence that occurred at
least once during the lifetime of the kernel, the validator proves with 100%
certainty that no combination and timing of these sequences can cause any class
of lock-related deadlock. [1]_

In other words, complex multi-CPU and multi-task locking scenarios do not have
to occur in practice to prove a deadlock: it is enough that each simple
locking sequence occurs at least once (at any time, in any task or context) for
the validator to be able to prove correctness. For example, a complex deadlock
that would need 3 CPUs and an unlucky combination of tasks, interrupts and
timing can be detected on a single-CPU system.

This radically reduces the complexity of locking-related QA of the kernel: what
has to be done is to trigger as many "simple" locking sequences in the kernel
as possible, at least once, to prove locking correctness, instead of having to
trigger every possible combination of locking interaction between CPUs,
combined with every possible hardirq and softirq nesting scenario (which is
impossible to do in practice).
.. [1]
   assuming that the validator itself is 100% correct, and that no other part
   of the system can corrupt its state. We also assume that all NMI/SMM paths
   [which could interrupt even code paths that run with interrupts disabled]
   are correct and do not interfere with the validator. We further assume that
   the 64-bit hash value is unique for every locking sequence in the system.
   Finally, lock recursion must not be deeper than 20.
Performance
-----------
The above rules require a **massive** amount of runtime checking. If we did
that for every lock taken and for every irqs-enable event, it would make the
system practically unusable because of its slowness. The complexity of checking
is O(N^2), so even with just a few hundred lock classes we would have to do
tens of thousands of checks for every event.

This problem is solved by checking any given 'locking scenario' (a unique
sequence of locks taken one after another) only once. A simple stack of held
locks is maintained, and a lightweight 64-bit hash value is calculated that is
unique for every lock chain. When the chain is validated for the first time,
the hash value is put into a hash table, which can be checked in a lock-free
manner. If the same locking chain occurs again later, the hash table tells us
that we do not have to validate it again.
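
The caching idea can be sketched in a few lines of Python. This is a model
under stated assumptions, not the kernel's data structure: ``ChainCache`` and
its methods are hypothetical names, the 64-bit value is produced with BLAKE2b
as a stand-in for lockdep's own chain hash, and the "expensive" validation is
whatever callable the caller passes in.

```python
import hashlib


class ChainCache:
    def __init__(self, validate):
        self.validate = validate   # expensive check, run once per chain
        self.validated = set()     # hashes of chains already checked
        self.held = []             # stack of currently held lock classes

    def _chain_hash(self):
        # Stand-in for the 64-bit chain hash: 8 bytes of BLAKE2b.
        data = "\0".join(self.held).encode()
        return hashlib.blake2b(data, digest_size=8).digest()

    def acquire(self, lock_class):
        self.held.append(lock_class)
        key = self._chain_hash()
        if key not in self.validated:
            self.validate(tuple(self.held))
            self.validated.add(key)

    def release(self, lock_class):
        self.held.remove(lock_class)
```

Repeating the same acquire/release sequence any number of times calls
``validate`` only once for each distinct stack of held locks.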
Troubleshooting
---------------
The validator tracks at most MAX_LOCKDEP_KEYS lock classes. Exceeding this
number triggers the following lockdep warning::
(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
By default, MAX_LOCKDEP_KEYS is set to 8191, and a typical desktop system has
fewer than 1000 lock classes, so this warning normally results from lock-class
leakage or from a failure to initialize locks properly. The two problems are
described below:

1. Repeated module loading and unloading while the validator is running results
   in lock-class leakage. The issue is that each load of the module creates a
   new set of lock classes for that module's locks, but unloading the module
   does not remove the old classes (see the discussion of lock-class reuse
   below for why). Therefore, if the module is loaded and unloaded repeatedly,
   the number of lock classes eventually reaches the maximum.

2. Using structures such as arrays that have large numbers of locks that are
   not explicitly initialized. For example, a hash table with 8192 *buckets*,
   each with its own spinlock_t, consumes 8192 lock classes unless each
   spinlock is explicitly initialized at run time with spin_lock_init() rather
   than with a compile-time initializer such as __SPIN_LOCK_UNLOCKED().
   Failing to initialize the per-bucket spinlocks this way guarantees
   lock-class overflow. In contrast, a loop that calls spin_lock_init() on
   each lock places all 8192 locks into a single lock class.

The moral of the story is that you should always initialize your locks
explicitly.

One might argue that the validator should be modified to allow lock classes to
be reused. However, if you are tempted to make this argument, first review the
code and think through the changes that would be required, keeping in mind that
the lock classes to be removed are likely to be linked into the lock-dependency
graph. This turns out to be easier said than done.

Of course, if you do run out of lock classes, the next thing to do is to find
the offending lock classes. First, the following command gives you the number
of lock classes currently in use along with the maximum::
grep "lock-classes" /proc/lockdep_stats
This command produces the following output::
lock-classes: 748 [max: 8191]
If the number allocated (748 above) increases continually over time, then there
is likely a leak. The following command can be used to identify the leaking
lock classes::
grep "BD" /proc/lockdep
Run the command and save the output, then compare it against the output of a
later run to identify the leakers. The same output can also help you find
situations where runtime lock initialization has been omitted.
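
The comparison described above is easy to script. The following sketch assumes
only the ``lock-classes: 748 [max: 8191]`` line format shown earlier; the
helper names are hypothetical, and the line-based comparison is deliberately
naive (a line whose counters changed between two dumps is reported as well).

```python
import re

LOCK_CLASSES_RE = re.compile(r"lock-classes:\s*(\d+)\s*\[max:\s*(\d+)\]")


def parse_lock_classes(text):
    """Return (in_use, maximum) from the contents of /proc/lockdep_stats."""
    match = LOCK_CLASSES_RE.search(text)
    if match is None:
        raise ValueError("no lock-classes line found")
    return int(match.group(1)), int(match.group(2))


def changed_lines(before, after):
    """Lines of a later /proc/lockdep dump that were not in an earlier one."""
    seen = set(before.splitlines())
    return [line for line in after.splitlines() if line not in seen]
```

A caller would read ``/proc/lockdep_stats`` twice, some time apart, and compare
the first element of the returned tuples to see whether the count keeps
growing.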
Recursive read locks
--------------------
The rest of this document tries to prove that a certain type of cycle is
equivalent to the possibility of a deadlock.

There are three types of lockers: writers (exclusive lockers, like spin_lock()
or write_lock()), non-recursive readers (shared lockers, like down_read()) and
recursive readers (recursive shared lockers, like rcu_read_lock()). From here
on, we use the following notation for these lockers:

- W or E: stands for writers (exclusive lockers).
- r: stands for non-recursive readers.
- R: stands for recursive readers.
- S: stands for all readers (non-recursive + recursive), as both are shared
  lockers.
- N: stands for writers and non-recursive readers, as both are not recursive.

Obviously, N is "r or W" and S is "r or R".

Recursive readers, as the name indicates, are lockers that are allowed to
acquire the lock even inside the critical section of another reader of the same
lock instance. In other words, they allow nested read-side critical sections of
one lock instance.

Non-recursive readers, on the other hand, would deadlock themselves if they
tried to do the same.

The difference between the two kinds of readers is that recursive readers are
blocked only by a write lock *holder*, while non-recursive readers can also be
blocked by a write lock *waiter*. Consider the following example::

   TASK A:                 TASK B:

   read_lock(X);
                           write_lock(X);
   read_lock_2(X);

Task A first takes the read lock on X (whether recursive or not does not
matter) with read_lock(). When task B then tries to acquire X as a writer, it
blocks and becomes a waiter for the write lock on X. Now, if read_lock_2() is a
recursive reader, task A makes progress, because writer waiters do not block
recursive readers, and there is no deadlock. However, if read_lock_2() is a
non-recursive reader, it gets blocked by the writer waiter B, and this causes a
deadlock.
Block conditions on readers/writers of the same lock
----------------------------------------------------
There are essentially four block conditions:

1. Writers block other writers.
2. Readers block writers.
3. Writers block both recursive readers and non-recursive readers.
4. Readers (recursive or not) do not block other recursive readers, but may
   block non-recursive readers (because of potentially co-existing writer
   waiters).

Block condition matrix: Y means the row blocks the column, N means it does not.

+---+---+---+---+
|   | W | r | R |
+---+---+---+---+
| W | Y | Y | Y |
+---+---+---+---+
| r | Y | Y | N |
+---+---+---+---+
| R | Y | Y | N |
+---+---+---+---+

(W: writers, r: non-recursive readers, R: recursive readers)
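
The matrix reduces to a single rule: a locker blocks another one unless the
first is a reader and the second is a recursive reader. A minimal Python
sketch of that rule, using the W/r/R letters from the matrix (the function name
``blocks`` is chosen for this example only):

```python
def blocks(row: str, column: str) -> bool:
    """True if a locker of kind `row` blocks a locker of kind `column`.

    Kinds are 'W' (writer), 'r' (non-recursive reader) and
    'R' (recursive reader), as in the matrix above.
    """
    for kind in (row, column):
        if kind not in ("W", "r", "R"):
            raise ValueError(f"unknown locker kind: {kind!r}")
    row_is_reader = row in ("r", "R")
    return not (row_is_reader and column == "R")
```

Evaluating ``blocks`` for all nine row/column pairs reproduces the Y/N entries
of the matrix.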
Unlike non-recursive read locks, recursive read locks are blocked only by the
current *holder* of the write lock, not by *waiters* for it. For example::

   TASK A:                 TASK B:

   read_lock(X);
                           write_lock(X);
   read_lock(X);

does not deadlock for recursive read locks: while task B is waiting for lock X,
the second read_lock() does not need to wait because it is a recursive read
lock. However, if read_lock() were a non-recursive read lock, this code would
deadlock, because the waiting write_lock() in task B blocks the second
read_lock() in task A.

Note that a lock can be a write lock (exclusive lock), a non-recursive read
lock (non-recursive shared lock) or a recursive read lock (recursive shared
lock), depending on the lock operation used to acquire it (more specifically,
the value of the 'read' parameter of lock_acquire()). In other words, a single
lock instance has three types of acquisition depending on the acquisition
function: exclusive, non-recursive read, and recursive read.

To be concise, we call write locks and non-recursive read locks
"non-recursive" locks, and recursive read locks "recursive" locks.

Recursive locks do not block each other, while non-recursive locks do (this is
true even for two non-recursive read locks). A non-recursive lock can block the
corresponding recursive lock, and vice versa.

A deadlock case that involves recursive locks is as follows::

   TASK A:                 TASK B:

   read_lock(X);
                           read_lock(Y);
   write_lock(Y);
                           write_lock(X);

Task A is waiting for task B to read_unlock() Y, and task B is waiting for task
A to read_unlock() X.
Dependency types and strong dependency paths
--------------------------------------------
Lock dependencies record the order in which a pair of locks is acquired. As
there are 3 types of lockers, there are, in theory, 9 types of lock
dependencies, but we can show that 4 types are sufficient for deadlock
detection.

For each lock dependency::
L1 -> L2
This means that lockdep has seen L1 acquired before L2 in the same execution
context. For deadlock detection, we care whether we could get blocked on L2
while holding L1. In other words, we ask whether there is a locker L3 such that
L1 blocks L3 and L2 gets blocked by L3. So we are only interested in (1) what
L1 blocks and (2) what blocks L2. As a result, we can combine recursive and
non-recursive readers for L1 (as they block the same types) and we can combine
writers and non-recursive readers for L2 (as they get blocked by the same
types).

With this simplification, there are 4 types of dependency edges in the lockdep
graph:

1) -(ER)->:
    exclusive writer to recursive reader dependency. "X -(ER)-> Y" means
    X -> Y, where X is a writer and Y is a recursive reader.

2) -(EN)->:
    exclusive writer to non-recursive locker dependency. "X -(EN)-> Y" means
    X -> Y, where X is a writer and Y is either a writer or a non-recursive
    reader.

3) -(SR)->:
    shared reader to recursive reader dependency. "X -(SR)-> Y" means
    X -> Y, where X is a reader (recursive or not) and Y is a recursive
    reader.

4) -(SN)->:
    shared reader to non-recursive locker dependency. "X -(SN)-> Y" means
    X -> Y, where X is a reader (recursive or not) and Y is either a writer or
    a non-recursive reader.

Note that, given two locks, there may be multiple dependencies between them.
For example::
TASK A:
read_lock(X);
write_lock(Y);
...
TASK B:
write_lock(X);
write_lock(Y);
Here, both X -(SN)-> Y and X -(EN)-> Y are present in the dependency graph.

We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->, and
similarly for -(Ex)->, -(xR)-> and -(Sx)->.

A "path" in the graph is a series of consecutive dependency edges. We define a
"strong" path as a path that does not contain two consecutive edges
(dependencies) of the form -(xR)-> followed by -(Sx)->. In other words, a
"strong" path leads from one lock to another through the lock dependencies such
that, if X -> Y -> Z is on the path (where X, Y and Z are locks) and the step
from X to Y is a -(SR)-> or -(ER)-> dependency, then the step from Y to Z must
not be a -(SN)-> or -(SR)-> dependency.

The next section shows why we call such a path "strong".
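
The edge types and the "strong" condition translate directly into code. In the
sketch below, locker kinds use the W/r/R letters introduced earlier and a path
is given as a list of edge-type strings such as ``["EN", "SR"]``; the function
names are hypothetical and the code only models the definition, not lockdep's
graph search.

```python
def edge_type(first: str, second: str) -> str:
    """Classify the dependency first -> second by locker kinds W, r or R."""
    head = "E" if first == "W" else "S"    # what the first lock blocks
    tail = "R" if second == "R" else "N"   # what blocks the second lock
    return head + tail


def is_strong(path) -> bool:
    """True if no -(xR)-> edge is immediately followed by a -(Sx)-> edge."""
    for prev, nxt in zip(path, path[1:]):
        if prev[1] == "R" and nxt[0] == "S":
            return False
    return True


def is_strong_cycle(cycle) -> bool:
    """Like is_strong(), but also checks the pair that closes the cycle."""
    return bool(cycle) and is_strong(list(cycle) + list(cycle[:1]))
```

In the deadlock example with tasks A and B shown earlier, the two dependencies
are X -(SN)-> Y and Y -(SN)-> X, and ``is_strong_cycle(["SN", "SN"])`` reports
a strong cycle.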
Recursive-read deadlock detection
---------------------------------
We now prove two things:

Lemma 1:

If there is a closed strong path (i.e. a strong cycle), then there is a
combination of locking sequences that causes a deadlock. In other words, a
strong cycle is sufficient for deadlock detection.

Lemma 2:

If there is no closed strong path (i.e. no strong cycle), then there is no
combination of locking sequences that could cause a deadlock. In other words, a
strong cycle is necessary for deadlock detection.

With these two lemmas, we can say that a closed strong path is both sufficient
and necessary for a deadlock, so a closed strong path is equivalent to the
possibility of a deadlock. Because a closed strong path stands for a dependency
chain that could cause deadlocks, we call it "strong", bearing in mind that
there are dependency cycles that cannot cause deadlocks.

Proof of sufficiency (Lemma 1):

Suppose we have a strong cycle::
L1 -> L2 ... -> Ln -> L1
This means we have the following dependencies::
L1 -> L2
L2 -> L3
...
Ln-1 -> Ln
Ln -> L1
We can now construct a combination of locking sequences that causes a deadlock.

First, let one CPU/task take the L1 in L1 -> L2, then another take the L2 in
L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are held by
different CPUs/tasks.

Then, because we have L1 -> L2, the holder of L1 is going to acquire the L2 in
L1 -> L2. However, L2 is already held by another CPU/task, and L1 -> L2 and
L2 -> L3 are not a -(xR)-> and -(Sx)-> pair (the definition of strong). This
means that either the L2 in L1 -> L2 is a non-recursive locker (blocked by
anyone), or the L2 in L2 -> L3 is a writer (blocking anyone). Therefore the
holder of L1 cannot get L2 and has to wait for the holder of L2 to release it.

Moreover, we can draw a similar conclusion for the holder of L2: it has to wait
for the holder of L3 to release, and so on. We can now prove that the holder of
Lx has to wait for the holder of Lx+1 to release. Note that Ln+1 is L1, so we
have a circular waiting scenario that nobody can get out of, hence a deadlock.

Proof of necessity (Lemma 2):

Lemma 2 is equivalent to: if there is a deadlock scenario, then there must be a
strong cycle in the dependency graph.

According to Wikipedia[1], if there is a deadlock, then there must be a
circular waiting scenario: there are N CPUs/tasks, where CPU/task P1 is waiting
for a lock held by P2, P2 is waiting for a lock held by P3, ... and Pn is
waiting for a lock held by P1. Let Lx be the lock that Px is waiting for. Since
P1 is waiting for L1 and holding Ln, we have Ln -> L1 in the dependency graph.
Similarly, we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph,
which means we have a cycle::
Ln -> L1 -> L2 -> ... -> Ln
and now let us prove that the cycle is strong.

For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
the dependency Lx -> Lx+1. Since Px is waiting for Px+1 to release Lx, it is
impossible that Lx on Px+1 is a reader and Lx on Px is a recursive reader,
because readers (recursive or not) do not block recursive readers. Therefore
Lx-1 -> Lx and Lx -> Lx+1 cannot be a -(xR)-> -(Sx)-> pair. This is true for
any lock in the cycle, so the cycle is strong.
References
----------
[1]: https://it.wikipedia.org/wiki/Stallo_(informatica)
[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
===============
Lock Statistics
===============

What
====

As the name suggests, it provides statistics on locks.

Why
===

Because things like lock contention can severely impact performance.

How
===

*Lockdep* already has hooks in the lock functions and maps lock instances to
lock classes. We build on that
(see Documentation/translations/it_IT/locking/lockdep-design.rst).
The graph below shows the relation between the lock functions and the various
hooks therein::

        __acquire
            |
           lock _____
            |        \
            |    __contended
            |         |
            |       <wait>
            | _______/
            |/
            |
       __acquired
            |
            .
          <hold>
            .
            |
       __release
            |
         unlock

  lock, unlock  - the regular lock functions
  __*           - the hooks
  <>            - states

With these hooks we provide the following statistics:

con-bounces
    - number of lock contentions that involved cross-CPU data
contentions
    - number of lock acquisitions that had to wait
wait time
    min
        - shortest (non-zero) time we ever had to wait for a lock
    max
        - longest time we ever had to wait for a lock
    total
        - total time we spent waiting on this lock
    avg
        - average time spent waiting on this lock
acq-bounces
    - number of lock acquisitions that involved cross-CPU data
acquisitions
    - number of times we took the lock
hold time
    min
        - shortest (non-zero) time we ever held the lock
    max
        - longest time we ever held the lock
    total
        - total time this lock was held
    avg
        - average time this lock was held

These numbers are gathered per lock class, and per read/write state (when
applicable).

It also tracks 4 contention points per class. A contention point is a call site
that had to wait on lock acquisition.
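
To make the min/max/total/avg columns concrete, here is a small accumulator
that follows the definitions above (in particular, the minimum ignores zero
durations). It is a hypothetical illustration: the class name ``TimeStat`` and
the choice to average over all recorded samples are assumptions of this
sketch, not a description of the kernel's bookkeeping.

```python
class TimeStat:
    """min/max/total/avg in the style of the wait time and hold time columns."""

    def __init__(self):
        self.min = None    # shortest non-zero duration seen so far
        self.max = 0.0
        self.total = 0.0
        self.count = 0

    def record(self, duration):
        if duration > 0 and (self.min is None or duration < self.min):
            self.min = duration
        self.max = max(self.max, duration)
        self.total += duration
        self.count += 1

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0
```

One such accumulator would be kept for wait times and one for hold times, per
lock class and per read/write state.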
Configuration
-------------
Lock statistics are enabled via the CONFIG_LOCK_STAT configuration option.

Usage
-----
Enable collection of statistics::
# echo 1 >/proc/sys/kernel/lock_stat
Disable collection of statistics::
# echo 0 >/proc/sys/kernel/lock_stat
Look at the current lock statistics::

  ( line numbers are not part of the actual output; they were added for the
    explanation below )

# less /proc/lock_stat
01 lock_stat version 0.4
02-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
04-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
05
06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99
07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77
08 ---------------
09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80
13 ---------------
14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
18
19.............................................................................................................................................................................................................................
20
21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48
22 ---------------
23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
27 ---------------
28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
This excerpt shows the statistics of the first two lock classes. Line 01 shows
the output version; it is updated each time the format changes. Lines 02-04
show the header with the column descriptions. Lines 05-18 and 20-31 show the
actual statistics. These statistics come in two parts: the actual stats,
separated by a short separator (lines 08, 13) from the contention points.

Lines 09-12 show the first 4 recorded contention points (the code that tries to
get the lock) and lines 14-17 show the first 4 recorded contended points (the
code that holds the lock). It is possible that the *max con-bounces* point is
missing from the statistics.

The first lock (lines 05-18) is a read/write lock and therefore shows two lines
above the short separator. The contention points do not match the column
descriptions in the header; they have two columns: *contentions* and
*[<IP>] symbol*. The second set of contention points are the points we are
contending with.

The integer part of the time values is in us (microseconds).
Quando si ha a che fare con blocchi annidati si potrebbero vedere le
sottoclassi di blocco::
32...........................................................................................................................................................................................................................
33
34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82
35 ---------
36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
40 ---------
41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8
45
46...........................................................................................................................................................................................................................
47
48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84
49 -----------
50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
51 -----------
52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8
54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
Line 48 shows the statistics for the second subclass (/1) of the
*&rq->lock* class (subclass numbering starts at 0). In this case, as line
50 suggests, ``double_rq_lock`` acquires a nested lock of two spinlocks.

To view the most contended locks::
# grep : /proc/lock_stat | head
clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71
tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59
&mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59
&rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69
&(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93
tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72
tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89
rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89
&(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27
rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76
To clear the statistics::
# echo 0 > /proc/lock_stat
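The statistics rows shown above are plain text, so a small userspace helper can turn them into numbers. The following is a minimal sketch, not part of the kernel: the struct and function names are hypothetical, and the field order is assumed to follow the lock_stat column header (con-bounces, contentions, four wait-time columns, acq-bounces, acquisitions, four hold-time columns). It expects rows as they appear in the real file, without the display line numbers used in the excerpts.

```c
#include <stdio.h>
#include <string.h>

/* One statistics row of /proc/lock_stat; names are illustrative only. */
struct lock_stat_row {
    char name[64];
    unsigned long con_bounces;
    unsigned long contentions;
    double waittime_min, waittime_max, waittime_total, waittime_avg;
    unsigned long acq_bounces;
    unsigned long acquisitions;
    double holdtime_min, holdtime_max, holdtime_total, holdtime_avg;
};

/*
 * Parse a row such as "&rq->lock: 13128 13128 0.43 ...".
 * Returns 0 on success, -1 if the line is not a statistics row
 * (separators, headers and contention points carry no ':').
 */
int parse_lock_stat_row(const char *line, struct lock_stat_row *row)
{
    /* The class name ends at the last ':'; the numbers never contain one. */
    const char *colon = strrchr(line, ':');
    size_t len;

    if (!colon)
        return -1;

    /* Skip leading blanks before the class name. */
    while (*line == ' ' || *line == '\t')
        line++;

    len = (size_t)(colon - line);
    if (len == 0 || len >= sizeof(row->name))
        return -1;
    memcpy(row->name, line, len);
    row->name[len] = '\0';

    /* Twelve numeric columns follow the class name. */
    if (sscanf(colon + 1, "%lu %lu %lf %lf %lf %lf %lu %lu %lf %lf %lf %lf",
               &row->con_bounces, &row->contentions,
               &row->waittime_min, &row->waittime_max,
               &row->waittime_total, &row->waittime_avg,
               &row->acq_bounces, &row->acquisitions,
               &row->holdtime_min, &row->holdtime_max,
               &row->holdtime_total, &row->holdtime_avg) != 12)
        return -1;

    return 0;
}
```

A caller would read the file line by line (for example with fgets()) and keep only the rows for which the function returns 0.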
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
==================================
Kernel Lock Torture Test Operation
==================================

CONFIG_LOCK_TORTURE_TEST
========================

The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module that
runs torture tests on the core kernel locking primitives. The kernel
module, 'locktorture', may be built after the fact on the kernel to be
tested, if desired. The tests periodically output status messages via
``printk()``, which can be examined via ``dmesg`` (perhaps filtering the
output with ``grep "torture"``). The test starts when the module is loaded
and stops when it is unloaded. This program is based on how RCU is
tortured, via rcutorture.

The torture test creates a number of kernel threads which acquire the lock
and hold it for a specific amount of time, thus simulating different
critical region behaviors. The amount of contention on the lock can be
simulated by enlarging the critical region and/or by creating more threads.
Module Parameters
=================

This module has the following parameters:

Locktorture-specific
--------------------

nwriters_stress
    Number of kernel threads that will stress exclusive lock
    ownership (writers). The default value is twice the number
    of online CPUs.

nreaders_stress
    Number of kernel threads that will stress shared lock
    ownership (readers). The default is the same as
    nwriters_stress. If the user did not specify
    nwriters_stress, then both values will be the number of
    online CPUs.

torture_type
    Type of lock to torture. By default, only spinlocks will
    be tortured. This module can also torture the following
    lock types:

    - "lock_busted":
      Simulates a buggy lock implementation.

    - "spin_lock":
      spin_lock() and spin_unlock() pairs.

    - "spin_lock_irq":
      spin_lock_irq() and spin_unlock_irq() pairs.

    - "rw_lock":
      read/write lock() and unlock() rwlock pairs.

    - "rw_lock_irq":
      read/write lock_irq() and unlock_irq() rwlock pairs.

    - "mutex_lock":
      mutex_lock() and mutex_unlock() pairs.

    - "rtmutex_lock":
      rtmutex_lock() and rtmutex_unlock() pairs.
      The kernel must have CONFIG_RT_MUTEXES=y.

    - "rwsem_lock":
      read/write down() and up() semaphore pairs.
Generic to the 'torture' framework (RCU + locking)
--------------------------------------------------

shutdown_secs
    The number of seconds to run the test before terminating
    it and powering off the system. The default is zero, which
    disables test termination and system shutdown. This
    capability is useful for automated testing.

onoff_interval
    The number of seconds between each attempt to execute a
    randomly selected CPU-hotplug operation. Defaults to zero,
    which disables CPU hotplugging. In CONFIG_HOTPLUG_CPU=n
    kernels, locktorture will silently refuse to do any
    CPU-hotplug operations regardless of the value specified
    for onoff_interval.

onoff_holdoff
    The number of seconds to wait before starting CPU-hotplug
    operations. This would normally only be used when
    locktorture is built into the kernel and started
    automatically at boot time, in which case it avoids
    confusing boot-time code with CPUs coming and going. This
    parameter is only useful if CONFIG_HOTPLUG_CPU is enabled.

stat_interval
    Number of seconds between statistics-related printk()s.
    By default, locktorture reports statistics every 60
    seconds. Setting the interval to zero causes the
    statistics to be printed -only- when the module is
    unloaded.

stutter
    The length of time to run the test before pausing for the
    same period of time. Defaults to "stutter=5", so as to run
    and pause for (roughly) five-second intervals. Specifying
    "stutter=0" causes the test to run continuously without
    pausing.

shuffle_interval
    The number of seconds to keep the test threads affinitized
    to a particular subset of the CPUs; defaults to 3 seconds.
    Used in conjunction with test_no_idle_hz.

verbose
    Enable verbose debugging output, via printk(). Enabled by
    default. This extra information is mostly related to
    high-level errors and reports from the 'torture'
    framework.
Statistics
==========

Statistics are printed in the following format::
spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0
(A) (B) (C) (D) (E)
(A): Lock type that is being tortured -- torture_type parameter.

(B): Number of writer lock acquisitions. When dealing with a read/write
primitive, a second "Reads" statistics line is printed as well.

(C): Number of times the lock was acquired.

(D): Min and max number of times a thread failed to acquire the lock.

(E): true/false values indicating whether there were errors acquiring the
lock. This should -only- be positive if there is a bug in the locking
primitive's implementation. Otherwise a lock should never fail (for
example, spin_lock()). Of course, the same applies to (C). A simple
example of this is the "lock_busted" type.
Usage
=====

The following script may be used to torture locks::
#!/bin/sh
modprobe locktorture
sleep 3600
rmmod locktorture
dmesg | grep torture:
The output can be manually inspected for the error marker "!!!". One could
of course create a more elaborate script that automatically checks for such
errors. The "rmmod" command forces a "SUCCESS", "FAILURE", or "RCU_HOTPLUG"
indication to be printed via printk(). The first two are self-explanatory,
while the last indicates that no locking failures were found, but
CPU-hotplug problems were detected.

Also see: Documentation/translations/it_IT/RCU/torture.rst
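A script that checks results automatically needs to read the fields (A) to (E) of the statistics line described in the Statistics section. The following is a minimal userspace sketch of such a parser; the struct and function names are hypothetical, and it assumes the line has exactly the documented "Writes" shape. Any prefix that dmesg adds before the lock type ends up in the type field.

```c
#include <stdio.h>

/* Fields of one "Writes" statistics line; names are illustrative only. */
struct torture_stats {
    char type[64];       /* (A) lock type, including the "-torture" suffix */
    long long total;     /* (C) number of acquisitions */
    long long max, min;  /* (D) printed in "Max/Min" order */
    int fail;            /* (E) failure indication */
};

/*
 * Parse a line such as
 *   "spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0"
 * Returns 0 on success, -1 if the line does not have that shape.
 */
int parse_torture_writes(const char *line, struct torture_stats *st)
{
    /* Blanks in the format match any amount of white space in the input. */
    int n = sscanf(line,
                   "%63[^:]: Writes: Total: %lld Max/Min: %lld/%lld Fail: %d",
                   st->type, &st->total, &st->max, &st->min, &st->fail);

    return n == 5 ? 0 : -1;
}
```

A checker could then flag any line whose fail field is nonzero, in addition to searching for the "!!!" marker.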
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-ita.rst
.. _it_kernel_hacking_locktypes:
==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into three categories:

 - Sleeping locks
 - CPU local locks
 - Spinning locks

This document describes these three lock types and provides rules for
nesting them and for using them on PREEMPT_RT kernels.
Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although several implementations allow try_lock() from other contexts, it
is necessary to carefully evaluate the safety of the corresponding
unlock() as well. The debugging versions of these primitives must also be
taken into account. In short, do not acquire sleeping locks from other
contexts unless there is no other option.

Sleeping lock types:
- mutex
- rt_mutex
- semaphore
- rw_semaphore
- ww_mutex
- percpu_rw_semaphore
On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
- local_lock
- spinlock_t
- rwlock_t
CPU local locks
---------------

 - local_lock

On non-PREEMPT_RT kernels, local_lock functions are wrappers around the
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts are pure CPU local
concurrency control mechanisms and are not suited for inter-CPU
concurrency control.
Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption, and the lock / unlock
functions can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================
Owner semantics
===============

The aforementioned lock types, except semaphores, have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.
rtmutex
=======
RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled regions
of code, even on PREEMPT_RT kernels. Instead, PREEMPT_RT kernels execute
most such regions of code in preemptible task context, especially interrupt
handlers and soft interrupts. This conversion allows spinlock_t and
rwlock_t to be implemented via RT-mutexes.
semaphore
=========
semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate mechanisms, such as mutexes and
completions.

semaphore and PREEMPT_RT
------------------------

PREEMPT_RT kernels do not change the semaphore implementation because
counting semaphores have no concept of owners, which prevents PREEMPT_RT
from providing priority inheritance for semaphores. After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.
rw_semaphore
============
rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore has strict owner semantics, but it also offers special-purpose
interfaces that allow non-owner release for readers. These interfaces work
independently of the kernel configuration.

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers. In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.
local_lock
==========
local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.

On non-PREEMPT_RT kernels, local_lock operations map to the preemption and
interrupt disabling and enabling primitives:

 ===============================  ======================
 local_lock(&llock)               preempt_disable()
 local_unlock(&llock)             preempt_enable()
 local_lock_irq(&llock)           local_irq_disable()
 local_unlock_irq(&llock)         local_irq_enable()
 local_lock_irqsave(&llock)       local_irq_save()
 local_unlock_irqrestore(&llock)  local_irq_restore()
 ===============================  ======================

The named scope of local_lock has two advantages over the regular
primitives:

  - The lock name allows static analysis and also clearly documents the
    protection scope, while the regular primitives are scopeless and
    opaque.

  - If lockdep is enabled, the local_lock gains a lockmap which allows
    the correctness of the protection to be validated. This can detect
    cases where, for example, a function using preempt_disable() as its
    protection mechanism is invoked from interrupt or soft-interrupt
    context. Aside from that, lockdep_assert_held(&llock) works as with
    any other locking primitive.

local_lock and PREEMPT_RT
-------------------------

PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
semantics:

  - All spinlock_t changes also apply to local_lock.

local_lock usage
----------------

local_lock should be used on non-PREEMPT_RT kernels in situations where
disabling preemption or interrupts is the appropriate form of concurrency
control to protect per-CPU data structures.

local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel, because there it is converted to a spinlock_t.
raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical core
code, low-level interrupt handling, and places where disabling preemption
or interrupts is required, for example, to safely access hardware state.
raw_spinlock_t can sometimes also be used when the critical section is
tiny, thus avoiding RT-mutex overhead.

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel, spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel, spinlock_t is mapped to a separate implementation
based on rt_mutex, which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization, which keeps
   preemption enabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

Apart from the above, PREEMPT_RT kernels preserve all other spinlock_t
semantics:

 - Tasks holding a spinlock_t do not migrate. Non-PREEMPT_RT kernels
   avoid migration by disabling preemption. PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations. Non-PREEMPT_RT
   kernels leave task state untouched. However, PREEMPT_RT must change
   task state if the task blocks during acquisition. Therefore, it saves
   the current task state before blocking, and the corresponding lock
   wakeup restores it, as in the following example::
    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        lock wakeup
                                          task->state = task->saved_state
   Other types of wakeups would normally set the task state directly to
   RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available. Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING. Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case RUNNING::
    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        non lock wakeup
                                          task->saved_state = TASK_RUNNING

                                        lock wakeup
                                          task->state = task->saved_state
   This ensures that the real wakeup cannot be lost.
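The two sequences above can be replayed with a small userspace model. This is only a sketch of the bookkeeping, not kernel code: the struct, the blocked_on_lock flag, and the three function names are hypothetical and exist purely to make the state transitions explicit.

```c
/* Task states used in the sequences above. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

/* Minimal stand-in for a task; blocked_on_lock is an illustrative flag. */
struct task {
    enum task_state state;
    enum task_state saved_state;
    int blocked_on_lock;  /* nonzero while waiting for the spinlock */
};

/* The task blocks on a contended spinlock_t: remember the caller's state. */
void lock_block(struct task *t)
{
    t->saved_state = t->state;
    t->state = TASK_UNINTERRUPTIBLE;
    t->blocked_on_lock = 1;
}

/* A wakeup unrelated to the lock, e.g. the event the task was waiting for. */
void non_lock_wakeup(struct task *t)
{
    if (t->blocked_on_lock)
        t->saved_state = TASK_RUNNING;  /* recorded; the task stays blocked */
    else
        t->state = TASK_RUNNING;
}

/* The lock became available: restore whatever state was saved. */
void lock_wakeup(struct task *t)
{
    t->state = t->saved_state;
    t->blocked_on_lock = 0;
}
```

In this model, calling lock_block() and then lock_wakeup() leaves the task in its original state, while a non_lock_wakeup() in between makes the task end up in TASK_RUNNING once the lock wakeup arrives.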
rwlock_t
========
rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock, and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its
   lock, thus starving even high-priority writers. In contrast, because
   readers can grant their priority to a writer, a preempted low-priority
   writer will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.
PREEMPT_RT caveats
==================

local_lock on RT
----------------

The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::
local_lock_irq(&local_lock);
raw_spin_lock(&lock);
and is fully equivalent to::
raw_spin_lock_irq(&lock);
But on a PREEMPT_RT kernel this code sequence breaks because
local_lock_irq() is mapped to a per-CPU spinlock_t, which disables neither
interrupts nor preemption. The following code sequence works correctly on
both PREEMPT_RT and non-PREEMPT_RT kernels::
local_lock_irq(&local_lock);
spin_lock(&lock);
Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::
  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
    lockdep_assert_irqs_disabled();
    access_protected_data();
  }
On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). Also, the lockdep assert will trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts, due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::
  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
    lockdep_assert_held(&local_lock);
    access_protected_data();
  }
spinlock_t and rwlock_t
-----------------------

The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels have
a few implications. For example, on a non-PREEMPT_RT kernel the following
code sequence works as expected::
local_irq_disable();
spin_lock(&lock);
and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because RT-mutexes require
a fully preemptible context. Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts. For cases where the
interrupt disabling and the locking must remain separate, PREEMPT_RT offers
a local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.

A typical scenario is protection of per-CPU variables in thread context::
struct foo *p = get_cpu_ptr(&var1);
spin_lock(&p->lock);
p->count += this_cpu_read(var2);
This is correct code on a non-PREEMPT_RT kernel, but it breaks on a
PREEMPT_RT kernel. The PREEMPT_RT-specific change of spinlock_t semantics
does not allow p->lock to be acquired because get_cpu_ptr() implicitly
disables preemption. The following substitution works on both kernels::
struct foo *p;
migrate_disable();
p = this_cpu_ptr(&var1);
spin_lock(&p->lock);
p->count += this_cpu_read(var2);
migrate_disable() ensures that the task is pinned to the current CPU, which
in turn guarantees that the per-CPU accesses to var1 and var2 stay on the
same CPU while the task remains preemptible.

The migrate_disable() substitution is not valid for the following
scenario::
func()
{
struct foo *p;
migrate_disable();
p = this_cpu_ptr(&var1);
p->val = func2();
This breaks because migrate_disable() does not protect against reentrancy
from a preempting task. A correct substitution for this case is::
func()
{
struct foo *p;
local_lock(&foo_lock);
p = this_cpu_ptr(&var1);
p->val = func2();
On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel the same result is achieved by acquiring
the underlying per-CPU spinlock.
raw_spinlock_t on RT
--------------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t. For example, the critical section must avoid
allocating memory. On a non-PREEMPT_RT kernel the following code works
perfectly::
raw_spin_lock(&lock);
p = kmalloc(sizeof(*p), GFP_ATOMIC);
But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts. However, it is fine to invoke the memory allocator while holding
normal *non-raw* spinlocks because they do not disable preemption on
PREEMPT_RT kernels::
spin_lock(&lock);
p = kmalloc(sizeof(*p), GFP_ATOMIC);
bit spinlocks
-------------
PREEMPT_RT kernels cannot substitute bit spinlocks because a single bit is
too small to accommodate an RT-mutex. Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using preprocessor conditionals at the usage site. In contrast, usage-site
changes are not needed for the spinlock_t substitution. Instead,
conditionals in header files and in the core locking implementation enable
the compiler to do the substitution transparently.
Lock type nesting rules
=======================

The most basic rules are:

  - Lock types of the same lock category can nest arbitrarily as long as
    they respect the lock ordering rules that prevent deadlocks.

  - Sleeping lock types cannot nest inside CPU local and spinning lock
    types.

  - CPU local and spinning lock types can nest inside sleeping lock types.

  - Spinning lock types can nest inside all lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping, and substitutes local_lock with a
per-CPU spinlock_t, means that they cannot be acquired while holding a raw
spinlock. This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t, rwlock_t, local_lock
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in PREEMPT_RT
and otherwise.
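The three-level nesting ordering above can be expressed as a simple rank comparison. The following is a userspace sketch under stated assumptions: the enum, its constants, and both function names are hypothetical, and the check covers only the ordering between levels. It does not model the lock ordering rules within a level, which lockdep validates in a real kernel.

```c
/* Lock types named in this document; the enum itself is illustrative. */
enum lock_type {
    LT_MUTEX, LT_RT_MUTEX, LT_SEMAPHORE, LT_RW_SEMAPHORE,
    LT_SPINLOCK, LT_RWLOCK, LT_LOCAL_LOCK,
    LT_RAW_SPINLOCK, LT_BIT_SPINLOCK
};

/* Level in the nesting ordering: 1 is outermost, 3 is innermost. */
static int nesting_level(enum lock_type t)
{
    switch (t) {
    case LT_MUTEX:
    case LT_RT_MUTEX:
    case LT_SEMAPHORE:
    case LT_RW_SEMAPHORE:
        return 1;  /* sleeping locks */
    case LT_SPINLOCK:
    case LT_RWLOCK:
    case LT_LOCAL_LOCK:
        return 2;  /* spinlock_t, rwlock_t, local_lock */
    case LT_RAW_SPINLOCK:
    case LT_BIT_SPINLOCK:
        return 3;  /* raw_spinlock_t and bit spinlocks */
    }
    return 0;      /* not reached for valid enum values */
}

/* Nonzero if 'inner' may be acquired while 'outer' is held. */
int may_nest(enum lock_type outer, enum lock_type inner)
{
    return nesting_level(inner) >= nesting_level(outer);
}
```

For example, may_nest(LT_MUTEX, LT_SPINLOCK) is nonzero, while may_nest(LT_RAW_SPINLOCK, LT_SPINLOCK) is zero, which matches the kmalloc() caveat in the raw_spinlock_t section.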
.. include:: ../disclaimer-ita.rst
:Original: :ref:`Documentation/process/maintainer-netdev.rst <netdev-FAQ>`
.. _it_netdev-FAQ:
==========
netdev FAQ
==========
.. warning::
    TODO: still to be translated
.. SPDX-License-Identifier: GPL-2.0
==============================
Kernel subsystem documentation
==============================

This part of the documentation goes into the details of how specific kernel
subsystems work from the point of view of a kernel developer. Much of the
information here is taken directly from the kernel source, with
supplemental material added where needed (although sometimes *not* all that
was needed has been added).

Core subsystems
---------------

.. toctree::
   :maxdepth: 1

   core-api/index

Human interfaces
----------------

.. toctree::
   :maxdepth: 1

Networking interfaces
---------------------

.. toctree::
   :maxdepth: 1

Storage interfaces
------------------

.. toctree::
   :maxdepth: 1

Miscellaneous interfaces
------------------------

.. toctree::
   :maxdepth: 1

   i2c/index
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
howto process/howto
.. raw:: latex .. raw:: latex
......
...@@ -273,7 +273,7 @@ revelada involucrada. La lista de embajadores actuales: ...@@ -273,7 +273,7 @@ revelada involucrada. La lista de embajadores actuales:
IBM Power Anton Blanchard <anton@linux.ibm.com> IBM Power Anton Blanchard <anton@linux.ibm.com>
IBM Z Christian Borntraeger <borntraeger@de.ibm.com> IBM Z Christian Borntraeger <borntraeger@de.ibm.com>
Intel Tony Luck <tony.luck@intel.com> Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org> Qualcomm Trilok Soni <quic_tsoni@quicinc.com>
Samsung Javier González <javier.gonz@samsung.com> Samsung Javier González <javier.gonz@samsung.com>
Microsoft James Morris <jamorris@linux.microsoft.com> Microsoft James Morris <jamorris@linux.microsoft.com>
......
...@@ -147,4 +147,4 @@ Si no se puede encontrar a nadie para revisar internamente los parches y necesit ...@@ -147,4 +147,4 @@ Si no se puede encontrar a nadie para revisar internamente los parches y necesit
ayuda para encontrar a esa persona, o si tiene alguna otra pregunta relacionada ayuda para encontrar a esa persona, o si tiene alguna otra pregunta relacionada
con este documento y las expectativas de la comunidad de desarrolladores, por con este documento y las expectativas de la comunidad de desarrolladores, por
favor contacte con la lista de correo privada Technical Advisory Board: favor contacte con la lista de correo privada Technical Advisory Board:
<tech-board@lists.linux-foundation.org>. <tech-board@groups.linuxfoundation.org>.
...@@ -177,7 +177,7 @@ CVE分配 ...@@ -177,7 +177,7 @@ CVE分配
AMD Tom Lendacky <thomas.lendacky@amd.com> AMD Tom Lendacky <thomas.lendacky@amd.com>
IBM IBM
Intel Tony Luck <tony.luck@intel.com> Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org> Qualcomm Trilok Soni <quic_tsoni@quicinc.com>
Microsoft Sasha Levin <sashal@kernel.org> Microsoft Sasha Levin <sashal@kernel.org>
VMware VMware
......
...@@ -53,7 +53,7 @@ OpenCAPI定义了一个在物理链路层上实现的数据链路层(TL)和 ...@@ -53,7 +53,7 @@ OpenCAPI定义了一个在物理链路层上实现的数据链路层(TL)和
Processor:处理器 Processor:处理器
Memory:内存 Memory:内存
Accelerated Function Unit:加速函数单元 Accelerated Function Unit:加速功能单元
...@@ -97,7 +97,7 @@ OpenCAPI拥有AFU向主机进程发送中断的可能性。它通过定义在传 ...@@ -97,7 +97,7 @@ OpenCAPI拥有AFU向主机进程发送中断的可能性。它通过定义在传
======== ========
驱动为每个在物理设备上发现的AFU创建一个字符设备。一个物理设备可能拥有多个 驱动为每个在物理设备上发现的AFU创建一个字符设备。一个物理设备可能拥有多个
函数,一个函数可以拥有多个AFU。不过编写这篇文档之时,只对导出一个AFU的设备 功能,一个功能可以拥有多个AFU。不过编写这篇文档之时,只对导出一个AFU的设备
测试过。 测试过。
字符设备可以在 /dev/ocxl/ 中被找到,其命名为: 字符设备可以在 /dev/ocxl/ 中被找到,其命名为:
......
...@@ -180,7 +180,7 @@ CVE分配 ...@@ -180,7 +180,7 @@ CVE分配
AMD Tom Lendacky <thomas.lendacky@amd.com> AMD Tom Lendacky <thomas.lendacky@amd.com>
IBM IBM
Intel Tony Luck <tony.luck@intel.com> Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org> Qualcomm Trilok Soni <quic_tsoni@quicinc.com>
Microsoft Sasha Levin <sashal@kernel.org> Microsoft Sasha Levin <sashal@kernel.org>
VMware VMware
......
...@@ -9,31 +9,58 @@ While much of the kernel's user-space API is documented elsewhere ...@@ -9,31 +9,58 @@ While much of the kernel's user-space API is documented elsewhere
also be found in the kernel tree itself. This manual is intended to be the also be found in the kernel tree itself. This manual is intended to be the
place where this information is gathered. place where this information is gathered.
System calls
============
.. toctree::
:maxdepth: 1
unshare
futex2
ebpf/index
ioctl/index
Security-related interfaces
===========================
.. toctree:: .. toctree::
:caption: Table of contents :maxdepth: 1
:maxdepth: 2
no_new_privs no_new_privs
seccomp_filter seccomp_filter
landlock landlock
unshare lsm
spec_ctrl spec_ctrl
tee
Devices and I/O
===============
.. toctree::
:maxdepth: 1
accelerators/ocxl accelerators/ocxl
dma-buf-alloc-exchange dma-buf-alloc-exchange
ebpf/index
ELF
ioctl/index
iommu iommu
iommufd iommufd
media/index media/index
dcdbas
vduse
isapnp
Everything else
===============
.. toctree::
:maxdepth: 1
ELF
netlink/index netlink/index
sysfs-platform_profile sysfs-platform_profile
vduse vduse
futex2 futex2
lsm perf_ring_buffer
tee
isapnp
dcdbas
.. only:: subproject and html .. only:: subproject and html
......
.. SPDX-License-Identifier: GPL-2.0
================
Perf ring buffer
================
.. CONTENTS
1. Introduction
2. Ring buffer implementation
2.1 Basic algorithm
2.2 Ring buffer for different tracing modes
2.2.1 Default mode
2.2.2 Per-thread mode
2.2.3 Per-CPU mode
2.2.4 System wide mode
2.3 Accessing buffer
2.3.1 Producer-consumer model
2.3.2 Properties of the ring buffers
2.3.3 Writing samples into buffer
2.3.4 Reading samples from buffer
2.3.5 Memory synchronization
3. The mechanism of AUX ring buffer
3.1 The relationship between AUX and regular ring buffers
3.2 AUX events
3.3 Snapshot mode
1. Introduction
===============
The ring buffer is a fundamental mechanism for data transfer. perf uses
ring buffers to transfer event data from the kernel to user space. A
second kind of ring buffer, the so-called auxiliary (AUX) ring buffer,
plays an important role in hardware tracing with Intel PT, Arm
CoreSight, etc.
The ring buffer implementation is critical, and it is also challenging
to get right. On the one hand, the kernel and the perf tool in user
space use the ring buffer to exchange data, which the tool then stores
into the data file, so the ring buffer needs to transfer data with high
throughput; on the other hand, ring buffer management must avoid
significant overhead that would distort the profiling results.
This documentation dives into the details for perf ring buffer with two
parts: firstly it explains the perf ring buffer implementation, then the
second part discusses the AUX ring buffer mechanism.
2. Ring buffer implementation
=============================
2.1 Basic algorithm
-------------------
A typical ring buffer is managed by a head pointer and a tail pointer;
the head pointer is advanced by the writer and the tail pointer is
advanced by the reader.
::
        +---------------------------+
        |   |   |***|***|***|   |   |
        +---------------------------+
                `-> Tail    `-> Head

        * : the data is filled by the writer.

                Figure 1. Ring buffer
Perf manages its ring buffer in the same way. The implementation holds
two key data structures in a set of consecutive pages: the control
structure, followed by the ring buffer itself. The page containing the
control structure is known as the "user page". Because both live in
contiguous virtual addresses, locating the ring buffer is simple: it
starts in the page immediately after the user page.
The control structure is named ``perf_event_mmap_page``; it contains a
head pointer ``data_head`` and a tail pointer ``data_tail``. When the
kernel starts to fill records into the ring buffer, it updates the head
pointer to reserve memory so that it can later safely store events into
the buffer. On the other side, when the user page is mapped writable,
the perf tool is permitted to update the tail pointer after consuming
data from the ring buffer. The case where the user page is mapped
read-only is addressed in the section
:ref:`writing_samples_into_buffer`.
::
      user page                               ring buffer
    +---------+---------+   +---------------------------------------+
    |data_head|data_tail|...|   |   |***|***|***|***|***|   |   |   |
    +---------+---------+   +---------------------------------------+
        `          `----------------^                   ^
         `----------------------------------------------|

              * : the data is filled by the writer.

                        Figure 2. Perf ring buffer
When using the ``perf record`` tool, the ring buffer size can be
specified with the option ``-m`` or ``--mmap-pages=``; the given size is
rounded up to a power of two that is a multiple of the page size.
Although the kernel allocates all of the memory pages at once, it defers
mapping them into the VMA until the perf tool accesses the buffer from
user space. In other words, the first time the perf tool accesses one of
the buffer's pages from user space, a page fault (a data abort
exception) is taken and the kernel uses this occasion to map the page
into the process's VMA (see ``perf_mmap_fault()``), so the perf tool can
continue to access the page after returning from the exception.
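The sizing rule above can be sketched in C. This is only an
illustration, not code from the perf tool: the helper names
``round_up_pow2()``, ``mmap_length()`` and ``data_area()`` are
hypothetical, and the layout assumed is the one described in this
section (one user page followed by a power-of-two number of data pages).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a page count up to the next power of two. */
static size_t round_up_pow2(size_t pages)
{
	size_t n = 1;

	assert(pages > 0);
	while (n < pages)
		n <<= 1;
	return n;
}

/* Total mapping length: one user page plus the rounded-up data pages. */
static size_t mmap_length(size_t requested_pages, size_t page_size)
{
	return (1 + round_up_pow2(requested_pages)) * page_size;
}

/* The data area starts right after the user page in the same mapping. */
static uint8_t *data_area(void *mmap_base, size_t page_size)
{
	return (uint8_t *)mmap_base + page_size;
}
```

For instance, a request for 5 data pages would be rounded up to 8, so
the whole mapping would cover 9 pages under these assumptions.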
2.2 Ring buffer for different tracing modes
-------------------------------------------
Perf profiles programs in different modes: default mode, per-thread
mode, per-CPU mode, and system wide mode. This section describes these
modes and how the ring buffer meets their requirements. The race
conditions caused by these modes are reviewed afterwards.
2.2.1 Default mode
^^^^^^^^^^^^^^^^^^
Usually ``perf record`` is executed followed by the name of the program
to profile, as in the command below::
perf record test_program
This command doesn't specify any options for CPU or thread modes, so the
perf tool applies the default mode to the perf event. It maps all the
CPUs in the system and the profiled program's PID onto the perf event,
and it enables inheritance on the event so that child tasks inherit
the events. As a result, the perf event's attributes are::
evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
evsel::threads::map[] = { pid }
evsel::attr::inherit = 1
These attributes are ultimately reflected in how the ring buffers are
deployed. As shown below, the perf tool allocates an individual ring
buffer for each CPU, but it only enables events for the profiled program
rather than for all threads in the system. The *T1* thread represents
the thread context of 'test_program', whereas *T2* and *T3* are
unrelated threads in the system. The perf samples are collected
exclusively for the *T1* thread and stored in the ring buffer associated
with the CPU on which the *T1* thread is running.
::
T1 T2 T1
+----+ +-----------+ +----+
CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+----+--------------+-----------+----------+----+-------->
| |
v v
+-----------------------------------------------------+
| Ring buffer 0 |
+-----------------------------------------------------+
T1
+-----+
CPU1 |xxxxx|
-----+-----+--------------------------------------------->
|
v
+-----------------------------------------------------+
| Ring buffer 1 |
+-----------------------------------------------------+
T1 T3
+----+ +-------+
CPU2 |xxxx| |xxxxxxx|
--------------------------+----+--------+-------+-------->
|
v
+-----------------------------------------------------+
| Ring buffer 2 |
+-----------------------------------------------------+
T1
+--------------+
CPU3 |xxxxxxxxxxxxxx|
-----------+--------------+------------------------------>
|
v
+-----------------------------------------------------+
| Ring buffer 3 |
+-----------------------------------------------------+
T1: Thread 1; T2: Thread 2; T3: Thread 3
x: Thread is in running state
Figure 3. Ring buffer for default mode
2.2.2 Per-thread mode
^^^^^^^^^^^^^^^^^^^^^
Per-thread mode is selected by specifying the option ``--per-thread`` in
the perf command, e.g.
::
perf record --per-thread test_program
In this mode the perf event isn't mapped to any CPU and is bound only to
the profiled process, so the perf event's attributes are::
evsel::cpus::map[0] = { -1 }
evsel::threads::map[] = { pid }
evsel::attr::inherit = 0
In this mode, a single ring buffer is allocated for the profiled thread;
if the thread is scheduled on a CPU, the events on that CPU will be
enabled; and if the thread is scheduled out from the CPU, the events on
the CPU will be disabled. When the thread is migrated from one CPU to
another, the events are to be disabled on the previous CPU and enabled
on the next CPU correspondingly.
::
T1 T2 T1
+----+ +-----------+ +----+
CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+----+--------------+-----------+----------+----+-------->
| |
| T1 |
| +-----+ |
CPU1 | |xxxxx| |
--|--+-----+----------------------------------|---------->
| | |
| | T1 T3 |
| | +----+ +---+ |
CPU2 | | |xxxx| |xxx| |
--|-----|-----------------+----+--------+---+-|---------->
| | | |
| | T1 | |
| | +--------------+ | |
CPU3 | | |xxxxxxxxxxxxxx| | |
--|-----|--+--------------+-|-----------------|---------->
| | | | |
v v v v v
+-----------------------------------------------------+
| Ring buffer |
+-----------------------------------------------------+
T1: Thread 1
x: Thread is in running state
Figure 4. Ring buffer for per-thread mode
When perf runs in per-thread mode, a ring buffer is allocated for the
profiled thread *T1*. The ring buffer is dedicated to thread *T1*: while
the thread is running, the perf events are recorded into the ring
buffer; while the thread is sleeping, all associated events are
disabled, so no trace data is recorded into the ring buffer.
2.2.3 Per-CPU mode
^^^^^^^^^^^^^^^^^^
The option ``-C`` collects samples on a given list of CPUs; for example,
the perf command below is given the option ``-C 0,2``::
perf record -C 0,2 test_program
It maps the perf event to CPUs 0 and 2, and the event is not associated
with any PID. Thus the perf event's attributes are set as::
evsel::cpus::map[] = { 0, 2 }
evsel::threads::map[] = { -1 }
evsel::attr::inherit = 0
As a result, the ``perf record`` session samples all threads on CPU0 and
CPU2, and it terminates when test_program exits. Even if there are tasks
running on CPU1 and CPU3, no ring buffer exists for those two CPUs, so
any activity on them is ignored. The options for per-thread mode and
per-CPU mode can also be combined: when ``-C 0,2`` and ``--per-thread``
are specified together, samples are recorded only when the profiled
thread is scheduled on one of the listed CPUs.
::
T1 T2 T1
+----+ +-----------+ +----+
CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+----+--------------+-----------+----------+----+-------->
| | |
v v v
+-----------------------------------------------------+
| Ring buffer 0 |
+-----------------------------------------------------+
T1
+-----+
CPU1 |xxxxx|
-----+-----+--------------------------------------------->
T1 T3
+----+ +-------+
CPU2 |xxxx| |xxxxxxx|
--------------------------+----+--------+-------+-------->
| |
v v
+-----------------------------------------------------+
| Ring buffer 1 |
+-----------------------------------------------------+
T1
+--------------+
CPU3 |xxxxxxxxxxxxxx|
-----------+--------------+------------------------------>
T1: Thread 1; T2: Thread 2; T3: Thread 3
x: Thread is in running state
Figure 5. Ring buffer for per-CPU mode
2.2.4 System wide mode
^^^^^^^^^^^^^^^^^^^^^^
With the option ``-a`` or ``--all-cpus``, perf collects samples on all
CPUs for all tasks; this is called the system wide mode. The command is::
perf record -a test_program
Similar to the per-CPU mode, the perf event doesn't bind to any PID, and
it maps to all CPUs in the system::
evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
evsel::threads::map[] = { -1 }
evsel::attr::inherit = 0
In the system wide mode, every CPU has its own ring buffer, all threads
are monitored during the running state and the samples are recorded into
the ring buffer belonging to the CPU which the events occurred on.
::
T1 T2 T1
+----+ +-----------+ +----+
CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+----+--------------+-----------+----------+----+-------->
| | |
v v v
+-----------------------------------------------------+
| Ring buffer 0 |
+-----------------------------------------------------+
T1
+-----+
CPU1 |xxxxx|
-----+-----+--------------------------------------------->
|
v
+-----------------------------------------------------+
| Ring buffer 1 |
+-----------------------------------------------------+
T1 T3
+----+ +-------+
CPU2 |xxxx| |xxxxxxx|
--------------------------+----+--------+-------+-------->
| |
v v
+-----------------------------------------------------+
| Ring buffer 2 |
+-----------------------------------------------------+
T1
+--------------+
CPU3 |xxxxxxxxxxxxxx|
-----------+--------------+------------------------------>
|
v
+-----------------------------------------------------+
| Ring buffer 3 |
+-----------------------------------------------------+
T1: Thread 1; T2: Thread 2; T3: Thread 3
x: Thread is in running state
Figure 6. Ring buffer for system wide mode
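The four modes of this section differ mainly in which (PID, CPU) pairs
the events are opened on. The sketch below models that mapping in C. It
is not the perf tool's implementation: ``struct target`` and
``build_targets()`` are hypothetical names, and the -1 values follow the
``pid``/``cpu`` convention of ``perf_event_open()``, where ``pid == -1``
means all tasks on the given CPU and ``cpu == -1`` means any CPU for the
given task.

```c
#include <assert.h>
#include <stddef.h>

enum mode { MODE_DEFAULT, MODE_PER_THREAD, MODE_PER_CPU, MODE_SYSTEM_WIDE };

/* One entry per ring buffer: which task and CPU the events are bound to. */
struct target {
	int pid;      /* -1: not bound to any task */
	int cpu;      /* -1: not bound to any CPU */
	int inherit;  /* 1: child tasks inherit the events */
};

/*
 * Fill 'out' and return the number of targets. 'cpus' holds all online
 * CPUs for the default and system wide modes, or the -C list for the
 * per-CPU mode; it is ignored in per-thread mode.
 */
static size_t build_targets(enum mode mode, int pid, const int *cpus,
			    size_t nr_cpus, struct target *out, size_t max)
{
	size_t i;

	if (mode == MODE_PER_THREAD) {
		assert(max >= 1);
		out[0] = (struct target){ .pid = pid, .cpu = -1, .inherit = 0 };
		return 1;
	}

	assert(max >= nr_cpus);
	for (i = 0; i < nr_cpus; i++) {
		if (mode == MODE_DEFAULT)
			out[i] = (struct target){ pid, cpus[i], 1 };
		else	/* per-CPU and system wide: every task on the CPU */
			out[i] = (struct target){ -1, cpus[i], 0 };
	}
	return nr_cpus;
}
```

In this model the per-CPU and system wide modes differ only in the CPU
list that is passed in, which mirrors the attribute listings shown for
the two modes.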
2.3 Accessing buffer
--------------------
Building on how the ring buffer is allocated in the various modes, this
section explains how the ring buffer is accessed.
2.3.1 Producer-consumer model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the Linux kernel, the PMU events can produce samples which are stored
into the ring buffer; the perf command in user space consumes the
samples by reading out data from the ring buffer and finally saves the
data into the file for post analysis. It’s a typical producer-consumer
model for using the ring buffer.
The perf process polls on the PMU events and sleeps when no events are
incoming. To prevent frequent exchanges between the kernel and user
space, the kernel event core layer introduces a watermark, which is
stored in the ``perf_buffer::watermark``. When a sample is recorded into
the ring buffer, and if the used buffer exceeds the watermark, the
kernel wakes up the perf process to read samples from the ring buffer.
::
Perf
/ | Read samples
Polling / `--------------| Ring buffer
v v ;---------------------v
+----------------+ +---------+---------+ +-------------------+
|Event wait queue| |data_head|data_tail| |***|***| | |***|
+----------------+ +---------+---------+ +-------------------+
^ ^ `------------------------^
| Wake up tasks | Store samples
+-----------------------------+
| Kernel event core layer |
+-----------------------------+
* : the data is filled by the writer.
Figure 7. Writing and reading the ring buffer
When the kernel event core layer notifies the user space, because
multiple events might share the same ring buffer for recording samples,
the core layer iterates every event associated with the ring buffer and
wakes up tasks waiting on the event. This is fulfilled by the kernel
function ``ring_buffer_wakeup()``.
After the perf process is woken up, it checks the ring buffers one by
one; if it finds a ring buffer containing samples, it reads them out for
statistics or for saving into the data file. Because the perf process
can run on any CPU, the ring buffer may be accessed from multiple CPUs
simultaneously, which causes race conditions. The race condition
handling is described in the section :ref:`memory_synchronization`.
2.3.2 Properties of the ring buffers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Linux kernel supports two write directions for the ring buffer:
forward and backward. Forward writing saves samples starting from the
beginning of the ring buffer, while backward writing stores data
starting from the end of the ring buffer, in the reverse direction. The
perf tool determines the writing direction. Additionally, the tool can
map buffers into user space in either read-write mode or read-only mode.
The ring buffer in the read-write mode is mapped with the property
``PROT_READ | PROT_WRITE``. With the write permission, the perf tool
updates ``data_tail`` to indicate the start position of the data.
Combined with the head pointer ``data_head``, which marks the end
position of the current data, this tells the perf tool exactly where to
read the data from.
Alternatively, in the read-only mode, only the kernel keeps updating
``data_head``, while user space cannot update ``data_tail`` because the
mapping only has the property ``PROT_READ``.
As a result, the matrix below illustrates the various combinations of
direction and mapping characteristics. The perf tool employs two of these
combinations to support buffer types: the non-overwrite buffer and the
overwritable buffer.
.. list-table::
:widths: 1 1 1
:header-rows: 1
* - Mapping mode
- Forward
- Backward
* - read-write
- Non-overwrite ring buffer
- Not used
* - read-only
- Not used
- Overwritable ring buffer
The non-overwrite ring buffer uses the read-write mapping with forward
writing. It saves data starting from the beginning of the ring buffer
and wraps around when it reaches the end; this is the normal ring
buffer. When the consumer doesn't keep up with the producer, some data
is lost; the kernel keeps count of how many records it lost and
generates a ``PERF_RECORD_LOST`` record the next time it finds space in
the ring buffer.
The overwritable ring buffer uses backward writing with the read-only
mode. It saves data starting from the end of the ring buffer, and
``data_head`` keeps the position of the current data; perf always knows
that it starts reading there and continues until the end of the ring
buffer, so it doesn't need ``data_tail``. In this mode, no
``PERF_RECORD_LOST`` records are generated.
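The non-overwrite behaviour described above can be modelled with a small
C sketch. It is a simplified illustration and not the kernel's code:
``struct fwd_ring``, ``fwd_reserve()`` and ``fwd_take_lost()`` are
hypothetical names, the head and tail are assumed to be free-running
byte counters, and the buffer size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

struct fwd_ring {
	uint64_t head;  /* free-running write position (producer) */
	uint64_t tail;  /* free-running read position (consumer) */
	uint64_t size;  /* data area size in bytes, a power of two */
	uint64_t lost;  /* records dropped since the last report */
};

/*
 * Try to reserve 'len' bytes for a record. On success, store the byte
 * offset inside the data area in '*offset' and return 1. When the
 * consumer has not freed enough space, count a lost record and return 0.
 */
static int fwd_reserve(struct fwd_ring *rb, uint64_t len, uint64_t *offset)
{
	uint64_t used = rb->head - rb->tail;

	assert(len <= rb->size);
	if (rb->size - used < len) {
		rb->lost++;
		return 0;
	}
	*offset = rb->head & (rb->size - 1);
	rb->head += len;
	return 1;
}

/* Return the number of lost records to report and reset the counter. */
static uint64_t fwd_take_lost(struct fwd_ring *rb)
{
	uint64_t n = rb->lost;

	rb->lost = 0;
	return n;
}
```

In this model, the value returned by ``fwd_take_lost()`` plays the role
of the count carried by a ``PERF_RECORD_LOST`` record once space is
available again.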
.. _writing_samples_into_buffer:
2.3.3 Writing samples into buffer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When a sample is taken and saved into the ring buffer, the kernel
prepares the sample fields based on the sample type; then it prepares
the information for writing to the ring buffer, which is stored in the
structure ``perf_output_handle``. In the end, the kernel outputs the
sample into the ring buffer and updates the head pointer in the user
page so the perf tool can see the latest value.
The structure ``perf_output_handle`` serves as a temporary context for
tracking the information related to the buffer. Its advantage is that
it enables concurrent writing to the buffer by different events. For
example, when a software event and a hardware PMU event are both enabled
for profiling, two instances of ``perf_output_handle`` serve as separate
contexts for the software event and the hardware event respectively.
This allows each event to reserve its own memory space for populating
the record data.
2.3.4 Reading samples from buffer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In user space, the perf tool uses the ``perf_event_mmap_page`` structure
to handle the head and tail of the buffer. It also uses the
``perf_mmap`` structure to keep track of a context for the ring buffer;
this context includes the buffer's starting and ending addresses.
Additionally, the mask value can be used to compute the position within
the circular buffer even after the pointers have wrapped around.
Similar to the kernel, the perf tool in the user space first reads out
the recorded data from the ring buffer, and then updates the buffer's
tail pointer ``perf_event_mmap_page::data_tail``.
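The use of the mask mentioned above can be illustrated with a short C
sketch. It is not taken from the perf tool: ``ring_copy_out()`` is a
hypothetical helper, and it assumes the ring size is a power of two so
that ``mask`` equals the size minus one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'len' bytes starting at the free-running position 'pos' out of a
 * ring of (mask + 1) bytes. A record that crosses the end of the ring
 * is copied in two pieces.
 */
static void ring_copy_out(const uint8_t *ring, uint64_t mask, uint64_t pos,
			  void *dst, uint64_t len)
{
	uint64_t off = pos & mask;          /* offset inside the ring */
	uint64_t first = mask + 1 - off;    /* bytes up to the ring's end */

	assert(len <= mask + 1);
	if (first > len)
		first = len;
	memcpy(dst, ring + off, first);
	memcpy((uint8_t *)dst + first, ring, len - first);
}
```

After copying the records out this way, the consumer would publish the
new tail position, as described in the paragraph above.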
.. _memory_synchronization:
2.3.5 Memory synchronization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Modern CPUs with relaxed memory models do not guarantee memory ordering,
which means accesses to the ring buffer and to the
``perf_event_mmap_page`` structure may happen out of order. To enforce
the required order of memory accesses to the perf ring buffer, memory
barriers are used. The rationale for the memory synchronization is as
below::
  Kernel                          User space

  if (LOAD ->data_tail) {         LOAD ->data_head
                   (A)            smp_rmb()        (C)
    STORE $data                   LOAD $data
    smp_wmb()      (B)            smp_mb()         (D)
    STORE ->data_head             STORE ->data_tail
  }
The comments in tools/include/linux/ring_buffer.h give a nice
description of why and how to use memory barriers; here we just provide
an alternative explanation:
(A) is a control dependency, so the CPU preserves the order between
checking the pointer ``perf_event_mmap_page::data_tail`` and filling a
sample into the ring buffer;
(D) pairs with (A). (D) separates reading the ring buffer data from
writing the pointer ``data_tail``: the perf tool first consumes samples
and then tells the kernel that the data chunk has been released. Since a
read operation is followed by a write operation, (D) is a full memory
barrier.
(B) is a writing barrier in the middle of two writing operations, which
makes sure that recording a sample must be prior to updating the head
pointer.
(C) pairs with (B). (C) is a read memory barrier to ensure the head
pointer is fetched before reading samples.
To implement the above algorithm, the kernel provides the
``perf_output_put_handle()`` function and user space provides the two
helpers ``ring_buffer_read_head()`` and ``ring_buffer_write_tail()``;
they rely on the memory barriers described above to ensure the data
dependency.
Some architectures support one-way permeable barriers with load-acquire
and store-release operations. These barriers are more relaxed and carry
a smaller performance penalty, so (C) and (D) can be optimized to use
``smp_load_acquire()`` and ``smp_store_release()`` respectively.
If an architecture doesn't support load-acquire and store-release in its
memory model, it falls back to the older style of memory barrier
operations. In this case, ``smp_load_acquire()`` encapsulates
``READ_ONCE()`` + ``smp_mb()``; since ``smp_mb()`` is costly,
``ring_buffer_read_head()`` doesn't invoke ``smp_load_acquire()`` and
uses ``READ_ONCE()`` + ``smp_rmb()`` instead.
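The pairing of barriers can be sketched in portable C11. This is a
single-producer, single-consumer model and not the kernel or perf tool
code: ``struct spsc_ring``, ``ring_put()`` and ``ring_get()`` are
hypothetical names, and C11 acquire/release operations stand in for the
kernel primitives. In particular, the producer below uses an acquire
load of the tail where the kernel relies on the control dependency (A).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u  /* number of slots; must be a power of two */

struct spsc_ring {
	_Atomic uint64_t head;  /* written by the producer only */
	_Atomic uint64_t tail;  /* written by the consumer only */
	uint32_t data[RING_SIZE];
};

/* Producer: return 1 if 'v' was stored, 0 if the ring is full. */
static int ring_put(struct spsc_ring *rb, uint32_t v)
{
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	/* Pairs with the consumer's release store of tail, like (A)/(D). */
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return 0;
	rb->data[head & (RING_SIZE - 1)] = v;
	/* The data store is ordered before the head update, like (B). */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return 1;
}

/* Consumer: return 1 and fill '*v' if a value was read, 0 if empty. */
static int ring_get(struct spsc_ring *rb, uint32_t *v)
{
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	/* The head load is ordered before the data load, like (C). */
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_acquire);

	if (head == tail)
		return 0;
	*v = rb->data[tail & (RING_SIZE - 1)];
	/* The data load is ordered before the tail update, like (D). */
	atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
	return 1;
}
```

The release store in ``ring_get()`` corresponds to the
``smp_store_release()`` optimization of (D) mentioned above, and the
acquire load of the head corresponds to the ``smp_load_acquire()``
variant of (C).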
3. The mechanism of AUX ring buffer
===================================
In this chapter, we will explain the implementation of the AUX ring
buffer. In the first part it will discuss the connection between the
AUX ring buffer and the regular ring buffer, then the second part will
examine how the AUX ring buffer co-works with the regular ring buffer,
as well as the additional features introduced by the AUX ring buffer for
the sampling mechanism.
3.1 The relationship between AUX and regular ring buffers
---------------------------------------------------------
Generally, the AUX ring buffer is an auxiliary for the regular ring
buffer. The regular ring buffer is primarily used to store the event
samples and every event format complies with the definition in the
union ``perf_event``; the AUX ring buffer is for recording the hardware
trace data and the trace data format is hardware IP dependent.
The general use and advantage of the AUX ring buffer is that it is
written directly by hardware rather than by the kernel. For example,
regular profile samples that write to the regular ring buffer cause an
interrupt. Tracing execution requires a high number of samples and
using interrupts would be overwhelming for the regular ring buffer
mechanism. Having an AUX buffer allows for a region of memory more
decoupled from the kernel and written to directly by hardware tracing.
The AUX ring buffer reuses the same buffer management algorithm as the
regular ring buffer. The control structure ``perf_event_mmap_page`` is
extended with the new fields ``aux_head`` and ``aux_tail`` for the head
and tail pointers of the AUX ring buffer.
During the initialisation phase, besides the mmap()-ed regular ring
buffer, the perf tool invokes a second syscall in the
``auxtrace_mmap__mmap()`` function to mmap the AUX buffer with a
non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates the
pages correspondingly, and mapping these pages into the VMA is deferred
until the page fault is handled, which is the same lazy mechanism as for
the regular ring buffer.
AUX events and AUX trace data are two different things. Let's see an
example::
perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
The above command enables two events: one is the event *cycles* from PMU
and another is the AUX event *cs_etm* from Arm CoreSight, both are saved
into the regular ring buffer while the CoreSight's AUX trace data is
stored in the AUX ring buffer.
As a result, the regular ring buffer and the AUX ring buffer are
allocated in pairs. In default mode, perf allocates a regular ring
buffer and an AUX ring buffer per CPU, the same as in system wide mode;
however, the default mode records samples only for the profiled program,
whereas system wide mode profiles all programs in the system. In
per-thread mode, the perf tool allocates only one regular ring buffer
and one AUX ring buffer for the whole session. In per-CPU mode, perf
allocates both kinds of ring buffers for the CPUs selected with the
option ``-C``.
The below figure demonstrates the buffers' layout in the system wide
mode; if there are any activities on one CPU, the AUX event samples and
the hardware trace data will be recorded into the dedicated buffers for
the CPU.
::
T1 T2 T1
+----+ +-----------+ +----+
CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+----+--------------+-----------+----------+----+-------->
| | |
v v v
+-----------------------------------------------------+
| Ring buffer 0 |
+-----------------------------------------------------+
| | |
v v v
+-----------------------------------------------------+
| AUX Ring buffer 0 |
+-----------------------------------------------------+
T1
+-----+
CPU1 |xxxxx|
-----+-----+--------------------------------------------->
|
v
+-----------------------------------------------------+
| Ring buffer 1 |
+-----------------------------------------------------+
|
v
+-----------------------------------------------------+
| AUX Ring buffer 1 |
+-----------------------------------------------------+
T1 T3
+----+ +-------+
CPU2 |xxxx| |xxxxxxx|
--------------------------+----+--------+-------+-------->
| |
v v
+-----------------------------------------------------+
| Ring buffer 2 |
+-----------------------------------------------------+
| |
v v
+-----------------------------------------------------+
| AUX Ring buffer 2 |
+-----------------------------------------------------+
T1
+--------------+
CPU3 |xxxxxxxxxxxxxx|
-----------+--------------+------------------------------>
|
v
+-----------------------------------------------------+
| Ring buffer 3 |
+-----------------------------------------------------+
|
v
+-----------------------------------------------------+
| AUX Ring buffer 3 |
+-----------------------------------------------------+
T1: Thread 1; T2: Thread 2; T3: Thread 3
x: Thread is in running state
Figure 8. AUX ring buffer for system wide mode
3.2 AUX events
--------------
Just as ``perf_output_begin()`` and ``perf_output_end()`` work on the
regular ring buffer, ``perf_aux_output_begin()`` and
``perf_aux_output_end()`` serve the AUX ring buffer for processing the
hardware trace data. Once the hardware trace data is stored into the AUX
ring buffer, the PMU driver stops hardware tracing by calling the
``pmu::stop()`` callback.
Similar to the regular ring buffer, the AUX ring buffer needs to apply
the memory synchronization mechanism discussed in the section
:ref:`memory_synchronization`. Since the AUX ring buffer is managed by
the PMU driver, the barrier (B), a write barrier that ensures the trace
data is externally visible before the head pointer is updated, must be
implemented in the PMU driver.
Then ``pmu::stop()`` can safely call the ``perf_aux_output_end()``
function, which does two things:
- It writes a ``PERF_RECORD_AUX`` event into the regular ring buffer;
this event carries the start address and data size of the chunk of
hardware trace data that has been stored into the AUX ring buffer;
- Since the hardware trace driver has stored new trace data into the AUX
ring buffer, the argument *size* indicates how many bytes have been
consumed by the hardware tracing, so ``perf_aux_output_end()`` updates
the head pointer ``perf_buffer::aux_head`` to reflect the latest buffer
usage.
Finally, the PMU driver restarts hardware tracing. Hardware trace data
is lost while tracing is temporarily suspended, which introduces a
discontinuity in the decoding phase.
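The two effects of ending an AUX output can be modelled in a few lines
of C. This is a deliberately simplified sketch, not the kernel's
``perf_aux_output_end()``: ``struct aux_record``, ``struct aux_state``
and ``aux_output_end()`` are hypothetical stand-ins, and the record
keeps only the offset and size information that this section says a
``PERF_RECORD_AUX`` event carries.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the payload of a PERF_RECORD_AUX event. */
struct aux_record {
	uint64_t aux_offset;  /* where the new chunk starts in the AUX buffer */
	uint64_t aux_size;    /* how many bytes the hardware wrote */
};

/* Simplified stand-in for the AUX part of the buffer state. */
struct aux_state {
	uint64_t aux_head;    /* free-running head of the AUX ring buffer */
};

/*
 * Model of the two steps: describe the chunk that was just written,
 * then advance the head by 'size' bytes.
 */
static struct aux_record aux_output_end(struct aux_state *st, uint64_t size)
{
	struct aux_record rec = {
		.aux_offset = st->aux_head,
		.aux_size = size,
	};

	st->aux_head += size;
	return rec;
}
```

Each returned record tells the consumer which region of the AUX buffer
became valid, while the advanced head reflects the total buffer usage.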
The event ``PERF_RECORD_AUX`` represents an AUX event which is handled
in the kernel, but it lacks the information needed for saving the AUX
trace data in the perf file. When the perf tool copies the trace data
from the AUX ring buffer to the perf data file, it synthesizes a
``PERF_RECORD_AUXTRACE`` event; this event is not part of the kernel ABI
but is defined by the perf tool to describe which portion of data in the
AUX ring buffer is saved. Afterwards, the perf tool reads out the AUX
trace data from the perf file based on the ``PERF_RECORD_AUXTRACE``
events, and the ``PERF_RECORD_AUX`` event is used to decode a chunk of
data by correlating with time order.
3.3 Snapshot mode
-----------------
Perf supports a snapshot mode for the AUX ring buffer. In this mode, AUX
trace data is recorded only at specific points in time that the user is
interested in. For example, the commands below take snapshots at 1
second intervals with Arm CoreSight::
  perf record -e cs_etm/@tmc_etr0/u -S -a program &
  PERFPID=$!
  while true; do
      kill -USR2 $PERFPID
      sleep 1
  done
The main flow for snapshot mode is:
- Before a snapshot is taken, the AUX ring buffer acts in free run mode.
During free run mode perf doesn't record any of the AUX events or trace
data;
- Once the perf tool receives the *USR2* signal, it triggers the callback
function ``auxtrace_record::snapshot_start()`` to deactivate hardware
tracing. The kernel driver then populates the AUX ring buffer with the
hardware trace data, and the event ``PERF_RECORD_AUX`` is stored in the
regular ring buffer;
- Then the perf tool takes a snapshot: ``record__read_auxtrace_snapshot()``
reads out the hardware trace data from the AUX ring buffer and saves it
into the perf data file;
- After the snapshot is finished, ``auxtrace_record::snapshot_finish()``
restarts the PMU event for AUX tracing.
Perf only accesses the head pointer ``perf_event_mmap_page::aux_head``
in snapshot mode and doesn't touch the tail pointer ``aux_tail``. This
is because the AUX ring buffer can overflow in free run mode, which
makes the tail pointer useless in this case. Instead, the callback
``auxtrace_record::find_snapshot()`` decides whether the AUX ring buffer
has wrapped around, and then fixes up the AUX buffer's head, which is
used to calculate the trace data size.
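The wrap-around decision can be illustrated with a simplified C model.
It is not the code of any ``find_snapshot()`` callback:
``struct snapshot`` and ``snapshot_region()`` are hypothetical, and the
sketch assumes the caller supplies a ``wrapped`` flag (in practice the
detection is specific to the hardware tracer) alongside the head value.

```c
#include <assert.h>
#include <stdint.h>

struct snapshot {
	uint64_t start;  /* offset of the oldest valid byte in the buffer */
	uint64_t size;   /* number of valid bytes to copy */
};

/*
 * If the buffer has never wrapped, the valid data is [0, head). Once it
 * has wrapped, the whole buffer is valid and the oldest byte is the one
 * right after the newest, i.e. at head modulo the buffer size.
 */
static struct snapshot snapshot_region(uint64_t head, uint64_t buf_size,
				       int wrapped)
{
	struct snapshot s;

	assert(buf_size > 0);
	if (wrapped || head >= buf_size) {
		s.start = head % buf_size;
		s.size = buf_size;
	} else {
		s.start = 0;
		s.size = head;
	}
	return s;
}
```

With a 4096-byte buffer and a head of 1000, the model yields 1000 bytes
starting at offset 0 if the buffer has not wrapped, and the full 4096
bytes starting at offset 1000 if it has.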
As we know, the buffers' deployment can be per-thread mode, per-CPU
mode, or system wide mode, and the snapshot can be applied to any of
these modes. Below is an example of taking snapshot with system wide
mode.
::
Snapshot is taken
|
v
+------------------------+
| AUX Ring buffer 0 | <- aux_head
+------------------------+
v
+--------------------------------+
| AUX Ring buffer 1 | <- aux_head
+--------------------------------+
v
+--------------------------------------------+
| AUX Ring buffer 2 | <- aux_head
+--------------------------------------------+
v
+---------------------------------------+
| AUX Ring buffer 3 | <- aux_head
+---------------------------------------+
Figure 9. Snapshot with system wide mode
...@@ -24,7 +24,7 @@ Descriptions of section entries and preferred order ...@@ -24,7 +24,7 @@ Descriptions of section entries and preferred order
filing info, a direct bug tracker link, or a mailto: URI. filing info, a direct bug tracker link, or a mailto: URI.
C: URI for *chat* protocol, server and channel where developers C: URI for *chat* protocol, server and channel where developers
usually hang out, for example irc://server/channel. usually hang out, for example irc://server/channel.
P: Subsystem Profile document for more details submitting P: *Subsystem Profile* document for more details submitting
patches to the given subsystem. This is either an in-tree file, patches to the given subsystem. This is either an in-tree file,
or a URI. See Documentation/maintainer/maintainer-entry-profile.rst or a URI. See Documentation/maintainer/maintainer-entry-profile.rst
for details. for details.
...@@ -6385,6 +6385,7 @@ L: linux-doc@vger.kernel.org ...@@ -6385,6 +6385,7 @@ L: linux-doc@vger.kernel.org
S: Maintained S: Maintained
F: Documentation/admin-guide/quickly-build-trimmed-linux.rst F: Documentation/admin-guide/quickly-build-trimmed-linux.rst
F: Documentation/admin-guide/reporting-issues.rst F: Documentation/admin-guide/reporting-issues.rst
F: Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
DOCUMENTATION SCRIPTS DOCUMENTATION SCRIPTS
M: Mauro Carvalho Chehab <mchehab@kernel.org> M: Mauro Carvalho Chehab <mchehab@kernel.org>
...@@ -14035,7 +14036,7 @@ F: include/uapi/rdma/mlx5-abi.h ...@@ -14035,7 +14036,7 @@ F: include/uapi/rdma/mlx5-abi.h
MELLANOX MLX5 VDPA DRIVER MELLANOX MLX5 VDPA DRIVER
M: Dragos Tatulea <dtatulea@nvidia.com> M: Dragos Tatulea <dtatulea@nvidia.com>
L: virtualization@lists.linux-foundation.org L: virtualization@lists.linux.dev
S: Supported S: Supported
F: drivers/vdpa/mlx5/ F: drivers/vdpa/mlx5/
...@@ -21540,7 +21541,7 @@ F: tools/testing/selftests/drivers/net/team/ ...@@ -21540,7 +21541,7 @@ F: tools/testing/selftests/drivers/net/team/
TECHNICAL ADVISORY BOARD PROCESS DOCS TECHNICAL ADVISORY BOARD PROCESS DOCS
M: "Theodore Ts'o" <tytso@mit.edu> M: "Theodore Ts'o" <tytso@mit.edu>
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org> M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
L: tech-board-discuss@lists.linux-foundation.org L: tech-board-discuss@lists.linux.dev
S: Maintained S: Maintained
F: Documentation/process/contribution-maturity-model.rst F: Documentation/process/contribution-maturity-model.rst
F: Documentation/process/researcher-guidelines.rst F: Documentation/process/researcher-guidelines.rst
...@@ -23123,7 +23124,7 @@ F: drivers/vfio/pci/mlx5/ ...@@ -23123,7 +23124,7 @@ F: drivers/vfio/pci/mlx5/
VFIO VIRTIO PCI DRIVER VFIO VIRTIO PCI DRIVER
M: Yishai Hadas <yishaih@nvidia.com> M: Yishai Hadas <yishaih@nvidia.com>
L: kvm@vger.kernel.org L: kvm@vger.kernel.org
L: virtualization@lists.linux-foundation.org L: virtualization@lists.linux.dev
S: Maintained S: Maintained
F: drivers/vfio/pci/virtio F: drivers/vfio/pci/virtio
......
...@@ -11,7 +11,7 @@ In order to build the documentation, use ``make htmldocs`` or ...@@ -11,7 +11,7 @@ In order to build the documentation, use ``make htmldocs`` or
https://www.kernel.org/doc/html/latest/ https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory, There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation. several of them using the ReStructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about requirements for building and running the kernel, and information about
......
...@@ -260,8 +260,7 @@ static u64 drm_gem_vram_pg_offset(struct drm_gem_vram_object *gbo) ...@@ -260,8 +260,7 @@ static u64 drm_gem_vram_pg_offset(struct drm_gem_vram_object *gbo)
} }
/** /**
* drm_gem_vram_offset() - \ * drm_gem_vram_offset() - Returns a GEM VRAM object's offset in video memory
Returns a GEM VRAM object's offset in video memory
* @gbo: the GEM VRAM object * @gbo: the GEM VRAM object
* *
* This function returns the buffer object's offset in the device's video * This function returns the buffer object's offset in the device's video
...@@ -470,14 +469,15 @@ void drm_gem_vram_vunmap(struct drm_gem_vram_object *gbo, ...@@ -470,14 +469,15 @@ void drm_gem_vram_vunmap(struct drm_gem_vram_object *gbo,
EXPORT_SYMBOL(drm_gem_vram_vunmap); EXPORT_SYMBOL(drm_gem_vram_vunmap);
/** /**
* drm_gem_vram_fill_create_dumb() - \ * drm_gem_vram_fill_create_dumb() - Helper for implementing
Helper for implementing &struct drm_driver.dumb_create * &struct drm_driver.dumb_create
*
* @file: the DRM file * @file: the DRM file
* @dev: the DRM device * @dev: the DRM device
* @pg_align: the buffer's alignment in multiples of the page size * @pg_align: the buffer's alignment in multiples of the page size
* @pitch_align: the scanline's alignment in powers of 2 * @pitch_align: the scanline's alignment in powers of 2
* @args: the arguments as provided to \ * @args: the arguments as provided to
&struct drm_driver.dumb_create * &struct drm_driver.dumb_create
* *
* This helper function fills &struct drm_mode_create_dumb, which is used * This helper function fills &struct drm_mode_create_dumb, which is used
* by &struct drm_driver.dumb_create. Implementations of this interface * by &struct drm_driver.dumb_create. Implementations of this interface
...@@ -575,8 +575,7 @@ static int drm_gem_vram_bo_driver_move(struct drm_gem_vram_object *gbo, ...@@ -575,8 +575,7 @@ static int drm_gem_vram_bo_driver_move(struct drm_gem_vram_object *gbo,
*/ */
/** /**
* drm_gem_vram_object_free() - \ * drm_gem_vram_object_free() - Implements &struct drm_gem_object_funcs.free
Implements &struct drm_gem_object_funcs.free
* @gem: GEM object. Refers to &struct drm_gem_vram_object.gem * @gem: GEM object. Refers to &struct drm_gem_vram_object.gem
*/ */
static void drm_gem_vram_object_free(struct drm_gem_object *gem) static void drm_gem_vram_object_free(struct drm_gem_object *gem)
...@@ -591,12 +590,11 @@ static void drm_gem_vram_object_free(struct drm_gem_object *gem) ...@@ -591,12 +590,11 @@ static void drm_gem_vram_object_free(struct drm_gem_object *gem)
*/ */
/** /**
* drm_gem_vram_driver_dumb_create() - \ * drm_gem_vram_driver_dumb_create() - Implements &struct drm_driver.dumb_create
Implements &struct drm_driver.dumb_create
* @file: the DRM file * @file: the DRM file
* @dev: the DRM device * @dev: the DRM device
* @args: the arguments as provided to \ * @args: the arguments as provided to
&struct drm_driver.dumb_create * &struct drm_driver.dumb_create
* *
* This function requires the driver to use @drm_device.vram_mm for its * This function requires the driver to use @drm_device.vram_mm for its
* instance of VRAM MM. * instance of VRAM MM.
...@@ -639,8 +637,8 @@ static void __drm_gem_vram_plane_helper_cleanup_fb(struct drm_plane *plane, ...@@ -639,8 +637,8 @@ static void __drm_gem_vram_plane_helper_cleanup_fb(struct drm_plane *plane,
} }
/** /**
* drm_gem_vram_plane_helper_prepare_fb() - \ * drm_gem_vram_plane_helper_prepare_fb() - Implements &struct
* Implements &struct drm_plane_helper_funcs.prepare_fb * drm_plane_helper_funcs.prepare_fb
* @plane: a DRM plane * @plane: a DRM plane
* @new_state: the plane's new state * @new_state: the plane's new state
* *
...@@ -690,8 +688,8 @@ drm_gem_vram_plane_helper_prepare_fb(struct drm_plane *plane, ...@@ -690,8 +688,8 @@ drm_gem_vram_plane_helper_prepare_fb(struct drm_plane *plane,
EXPORT_SYMBOL(drm_gem_vram_plane_helper_prepare_fb); EXPORT_SYMBOL(drm_gem_vram_plane_helper_prepare_fb);
/** /**
* drm_gem_vram_plane_helper_cleanup_fb() - \ * drm_gem_vram_plane_helper_cleanup_fb() - Implements &struct
* Implements &struct drm_plane_helper_funcs.cleanup_fb * drm_plane_helper_funcs.cleanup_fb
* @plane: a DRM plane * @plane: a DRM plane
* @old_state: the plane's old state * @old_state: the plane's old state
* *
...@@ -717,8 +715,8 @@ EXPORT_SYMBOL(drm_gem_vram_plane_helper_cleanup_fb); ...@@ -717,8 +715,8 @@ EXPORT_SYMBOL(drm_gem_vram_plane_helper_cleanup_fb);
*/ */
/** /**
* drm_gem_vram_simple_display_pipe_prepare_fb() - \ * drm_gem_vram_simple_display_pipe_prepare_fb() - Implements &struct
* Implements &struct drm_simple_display_pipe_funcs.prepare_fb * drm_simple_display_pipe_funcs.prepare_fb
* @pipe: a simple display pipe * @pipe: a simple display pipe
* @new_state: the plane's new state * @new_state: the plane's new state
* *
...@@ -739,8 +737,8 @@ int drm_gem_vram_simple_display_pipe_prepare_fb( ...@@ -739,8 +737,8 @@ int drm_gem_vram_simple_display_pipe_prepare_fb(
EXPORT_SYMBOL(drm_gem_vram_simple_display_pipe_prepare_fb); EXPORT_SYMBOL(drm_gem_vram_simple_display_pipe_prepare_fb);
/** /**
* drm_gem_vram_simple_display_pipe_cleanup_fb() - \ * drm_gem_vram_simple_display_pipe_cleanup_fb() - Implements &struct
* Implements &struct drm_simple_display_pipe_funcs.cleanup_fb * drm_simple_display_pipe_funcs.cleanup_fb
* @pipe: a simple display pipe * @pipe: a simple display pipe
* @old_state: the plane's old state * @old_state: the plane's old state
* *
...@@ -761,8 +759,7 @@ EXPORT_SYMBOL(drm_gem_vram_simple_display_pipe_cleanup_fb); ...@@ -761,8 +759,7 @@ EXPORT_SYMBOL(drm_gem_vram_simple_display_pipe_cleanup_fb);
*/ */
/** /**
* drm_gem_vram_object_pin() - \ * drm_gem_vram_object_pin() - Implements &struct drm_gem_object_funcs.pin
Implements &struct drm_gem_object_funcs.pin
* @gem: The GEM object to pin * @gem: The GEM object to pin
* *
* Returns: * Returns:
...@@ -785,8 +782,7 @@ static int drm_gem_vram_object_pin(struct drm_gem_object *gem) ...@@ -785,8 +782,7 @@ static int drm_gem_vram_object_pin(struct drm_gem_object *gem)
} }
/** /**
* drm_gem_vram_object_unpin() - \ * drm_gem_vram_object_unpin() - Implements &struct drm_gem_object_funcs.unpin
Implements &struct drm_gem_object_funcs.unpin
* @gem: The GEM object to unpin * @gem: The GEM object to unpin
*/ */
static void drm_gem_vram_object_unpin(struct drm_gem_object *gem) static void drm_gem_vram_object_unpin(struct drm_gem_object *gem)
......
...@@ -33,8 +33,8 @@ struct vm_area_struct; ...@@ -33,8 +33,8 @@ struct vm_area_struct;
* struct drm_gem_vram_object - GEM object backed by VRAM * struct drm_gem_vram_object - GEM object backed by VRAM
* @bo: TTM buffer object * @bo: TTM buffer object
* @map: Mapping information for @bo * @map: Mapping information for @bo
* @placement: TTM placement information. Supported placements are \ * @placement: TTM placement information. Supported placements are %TTM_PL_VRAM
%TTM_PL_VRAM and %TTM_PL_SYSTEM * and %TTM_PL_SYSTEM
* @placements: TTM placement information. * @placements: TTM placement information.
* *
* The type struct drm_gem_vram_object represents a GEM object that is * The type struct drm_gem_vram_object represents a GEM object that is
...@@ -126,8 +126,8 @@ drm_gem_vram_plane_helper_cleanup_fb(struct drm_plane *plane, ...@@ -126,8 +126,8 @@ drm_gem_vram_plane_helper_cleanup_fb(struct drm_plane *plane,
struct drm_plane_state *old_state); struct drm_plane_state *old_state);
/** /**
* DRM_GEM_VRAM_PLANE_HELPER_FUNCS - * DRM_GEM_VRAM_PLANE_HELPER_FUNCS - Initializes struct drm_plane_helper_funcs
* Initializes struct drm_plane_helper_funcs for VRAM handling * for VRAM handling
* *
* Drivers may use GEM BOs as VRAM helpers for the framebuffer memory. This * Drivers may use GEM BOs as VRAM helpers for the framebuffer memory. This
* macro initializes struct drm_plane_helper_funcs to use the respective helper * macro initializes struct drm_plane_helper_funcs to use the respective helper
...@@ -150,8 +150,8 @@ void drm_gem_vram_simple_display_pipe_cleanup_fb( ...@@ -150,8 +150,8 @@ void drm_gem_vram_simple_display_pipe_cleanup_fb(
struct drm_plane_state *old_state); struct drm_plane_state *old_state);
/** /**
* define DRM_GEM_VRAM_DRIVER - default callback functions for \ * define DRM_GEM_VRAM_DRIVER - default callback functions for
&struct drm_driver * &struct drm_driver
* *
* Drivers that use VRAM MM and GEM VRAM can use this macro to initialize * Drivers that use VRAM MM and GEM VRAM can use this macro to initialize
* &struct drm_driver with default functions. * &struct drm_driver with default functions.
...@@ -185,8 +185,8 @@ struct drm_vram_mm { ...@@ -185,8 +185,8 @@ struct drm_vram_mm {
}; };
/** /**
* drm_vram_mm_of_bdev() - \ * drm_vram_mm_of_bdev() - Returns the container of type &struct ttm_device for
Returns the container of type &struct ttm_device for field bdev. * field bdev.
* @bdev: the TTM BO device * @bdev: the TTM BO device
* *
* Returns: * Returns:
......
#!/usr/bin/env perl #!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
# vim: softtabstop=4
use warnings; use warnings;
use strict; use strict;
...@@ -99,6 +100,7 @@ my $blankline_man = ""; ...@@ -99,6 +100,7 @@ my $blankline_man = "";
my @highlights_rst = ( my @highlights_rst = (
[$type_constant, "``\$1``"], [$type_constant, "``\$1``"],
[$type_constant2, "``\$1``"], [$type_constant2, "``\$1``"],
# Note: need to escape () to avoid func matching later # Note: need to escape () to avoid func matching later
[$type_member_func, "\\:c\\:type\\:`\$1\$2\$3\\\\(\\\\) <\$1>`"], [$type_member_func, "\\:c\\:type\\:`\$1\$2\$3\\\\(\\\\) <\$1>`"],
[$type_member, "\\:c\\:type\\:`\$1\$2\$3 <\$1>`"], [$type_member, "\\:c\\:type\\:`\$1\$2\$3 <\$1>`"],
...@@ -109,6 +111,7 @@ my @highlights_rst = ( ...@@ -109,6 +111,7 @@ my @highlights_rst = (
[$type_struct, "\\:c\\:type\\:`\$1 <\$2>`"], [$type_struct, "\\:c\\:type\\:`\$1 <\$2>`"],
[$type_typedef, "\\:c\\:type\\:`\$1 <\$2>`"], [$type_typedef, "\\:c\\:type\\:`\$1 <\$2>`"],
[$type_union, "\\:c\\:type\\:`\$1 <\$2>`"], [$type_union, "\\:c\\:type\\:`\$1 <\$2>`"],
# in rst this can refer to any type # in rst this can refer to any type
[$type_fallback, "\\:c\\:type\\:`\$1`"], [$type_fallback, "\\:c\\:type\\:`\$1`"],
[$type_param_ref, "**\$1\$2**"] [$type_param_ref, "**\$1\$2**"]
...@@ -631,8 +634,7 @@ sub output_enum_man(%) { ...@@ -631,8 +634,7 @@ sub output_enum_man(%) {
if ($count == $#{$args{'parameterlist'}}) { if ($count == $#{$args{'parameterlist'}}) {
print "\n};\n"; print "\n};\n";
last; last;
} } else {
else {
print ", \n.br\n"; print ", \n.br\n";
} }
$count++; $count++;
...@@ -818,8 +820,31 @@ sub output_function_rst(%) { ...@@ -818,8 +820,31 @@ sub output_function_rst(%) {
my %args = %{$_[0]}; my %args = %{$_[0]};
my ($parameter, $section); my ($parameter, $section);
my $oldprefix = $lineprefix; my $oldprefix = $lineprefix;
my $start = "";
my $is_macro = 0; my $signature = "";
if ($args{'functiontype'} ne "") {
$signature = $args{'functiontype'} . " " . $args{'function'} . " (";
} else {
$signature = $args{'function'} . " (";
}
my $count = 0;
foreach my $parameter (@{$args{'parameterlist'}}) {
if ($count ne 0) {
$signature .= ", ";
}
$count++;
$type = $args{'parametertypes'}{$parameter};
if ($type =~ m/$function_pointer/) {
# pointer-to-function
$signature .= $1 . $parameter . ") (" . $2 . ")";
} else {
$signature .= $type;
}
}
$signature .= ")";
if ($sphinx_major < 3) { if ($sphinx_major < 3) {
if ($args{'typedef'}) { if ($args{'typedef'}) {
...@@ -828,56 +853,30 @@ sub output_function_rst(%) { ...@@ -828,56 +853,30 @@ sub output_function_rst(%) {
print " **Typedef**: "; print " **Typedef**: ";
$lineprefix = ""; $lineprefix = "";
output_highlight_rst($args{'purpose'}); output_highlight_rst($args{'purpose'});
$start = "\n\n**Syntax**\n\n ``"; print "\n\n**Syntax**\n\n";
$is_macro = 1; print " ``$signature``\n\n";
} else { } else {
print ".. c:function:: "; print ".. c:function:: $signature\n\n";
} }
} else { } else {
if ($args{'typedef'} || $args{'functiontype'} eq "") { if ($args{'typedef'} || $args{'functiontype'} eq "") {
$is_macro = 1;
print ".. c:macro:: ". $args{'function'} . "\n\n"; print ".. c:macro:: ". $args{'function'} . "\n\n";
} else {
print ".. c:function:: ";
}
if ($args{'typedef'}) { if ($args{'typedef'}) {
print_lineno($declaration_start_line); print_lineno($declaration_start_line);
print " **Typedef**: "; print " **Typedef**: ";
$lineprefix = ""; $lineprefix = "";
output_highlight_rst($args{'purpose'}); output_highlight_rst($args{'purpose'});
$start = "\n\n**Syntax**\n\n ``"; print "\n\n**Syntax**\n\n";
print " ``$signature``\n\n";
} else { } else {
print "``" if ($is_macro); print "``$signature``\n\n";
} }
}
if ($args{'functiontype'} ne "") {
$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
} else { } else {
$start .= $args{'function'} . " ("; print ".. c:function:: $signature\n\n";
} }
print $start;
my $count = 0;
foreach my $parameter (@{$args{'parameterlist'}}) {
if ($count ne 0) {
print ", ";
} }
$count++;
$type = $args{'parametertypes'}{$parameter};
if ($type =~ m/$function_pointer/) {
# pointer-to-function
print $1 . $parameter . ") (" . $2 . ")";
} else {
print $type;
}
}
if ($is_macro) {
print ")``\n\n";
} else {
print ")\n\n";
}
if (!$args{'typedef'}) { if (!$args{'typedef'}) {
print_lineno($declaration_start_line); print_lineno($declaration_start_line);
$lineprefix = " "; $lineprefix = " ";
...@@ -1279,8 +1278,7 @@ sub dump_struct($$) { ...@@ -1279,8 +1278,7 @@ sub dump_struct($$) {
'purpose' => $declaration_purpose, 'purpose' => $declaration_purpose,
'type' => $decl_type 'type' => $decl_type
}); });
} } else {
else {
print STDERR "${file}:$.: error: Cannot parse struct or union!\n"; print STDERR "${file}:$.: error: Cannot parse struct or union!\n";
++$errors; ++$errors;
} }
...@@ -1330,7 +1328,7 @@ sub dump_enum($$) { ...@@ -1330,7 +1328,7 @@ sub dump_enum($$) {
$x =~ s@/\*.*?\*/@@gos; # strip comments. $x =~ s@/\*.*?\*/@@gos; # strip comments.
# strip #define macros inside enums # strip #define macros inside enums
$x =~ s@#\s*((define|ifdef)\s+|endif)[^;]*;@@gos; $x =~ s@#\s*((define|ifdef|if)\s+|endif)[^;]*;@@gos;
if ($x =~ /typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;/) { if ($x =~ /typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;/) {
$declaration_name = $2; $declaration_name = $2;
...@@ -1456,8 +1454,7 @@ sub dump_typedef($$) { ...@@ -1456,8 +1454,7 @@ sub dump_typedef($$) {
'sections' => \%sections, 'sections' => \%sections,
'purpose' => $declaration_purpose 'purpose' => $declaration_purpose
}); });
} } else {
else {
print STDERR "${file}:$.: error: Cannot parse typedef!\n"; print STDERR "${file}:$.: error: Cannot parse typedef!\n";
++$errors; ++$errors;
} }
...@@ -1509,6 +1506,15 @@ sub create_parameterlist($$$$) { ...@@ -1509,6 +1506,15 @@ sub create_parameterlist($$$$) {
$type =~ s/([^\(]+\(\*?)\s*$param/$1/; $type =~ s/([^\(]+\(\*?)\s*$param/$1/;
save_struct_actual($param); save_struct_actual($param);
push_parameter($param, $type, $arg, $file, $declaration_name); push_parameter($param, $type, $arg, $file, $declaration_name);
} elsif ($arg =~ m/\(.+\)\s*\[/) {
# array-of-pointers
$arg =~ tr/#/,/;
$arg =~ m/[^\(]+\(\s*\*\s*([\w\[\]\.]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)/;
$param = $1;
$type = $arg;
$type =~ s/([^\(]+\(\*?)\s*$param/$1/;
save_struct_actual($param);
push_parameter($param, $type, $arg, $file, $declaration_name);
} elsif ($arg) { } elsif ($arg) {
$arg =~ s/\s*:\s*/:/g; $arg =~ s/\s*:\s*/:/g;
$arg =~ s/\s*\[/\[/g; $arg =~ s/\s*\[/\[/g;
...@@ -1535,14 +1541,12 @@ sub create_parameterlist($$$$) { ...@@ -1535,14 +1541,12 @@ sub create_parameterlist($$$$) {
save_struct_actual($2); save_struct_actual($2);
push_parameter($2, "$type $1", $arg, $file, $declaration_name); push_parameter($2, "$type $1", $arg, $file, $declaration_name);
} } elsif ($param =~ m/(.*?):(\d+)/) {
elsif ($param =~ m/(.*?):(\d+)/) {
if ($type ne "") { # skip unnamed bit-fields if ($type ne "") { # skip unnamed bit-fields
save_struct_actual($1); save_struct_actual($1);
push_parameter($1, "$type:$2", $arg, $file, $declaration_name) push_parameter($1, "$type:$2", $arg, $file, $declaration_name)
} }
} } else {
else {
save_struct_actual($param); save_struct_actual($param);
push_parameter($param, $type, $arg, $file, $declaration_name); push_parameter($param, $type, $arg, $file, $declaration_name);
} }
...@@ -1571,8 +1575,7 @@ sub push_parameter($$$$$) { ...@@ -1571,8 +1575,7 @@ sub push_parameter($$$$$) {
if (!$param =~ /\w\.\.\.$/) { if (!$param =~ /\w\.\.\.$/) {
# handles unnamed variable parameters # handles unnamed variable parameters
$param = "..."; $param = "...";
} } elsif ($param =~ /\w\.\.\.$/) {
elsif ($param =~ /\w\.\.\.$/) {
# for named variable parameters of the form `x...`, remove the dots # for named variable parameters of the form `x...`, remove the dots
$param =~ s/\.\.\.$//; $param =~ s/\.\.\.$//;
} }
...@@ -1659,8 +1662,7 @@ sub check_sections($$$$$) { ...@@ -1659,8 +1662,7 @@ sub check_sections($$$$$) {
"Excess function parameter " . "Excess function parameter " .
"'$sects[$sx]' " . "'$sects[$sx]' " .
"description in '$decl_name'\n"); "description in '$decl_name'\n");
} } elsif (($decl_type eq "struct") or
elsif (($decl_type eq "struct") or
($decl_type eq "union")) { ($decl_type eq "union")) {
emit_warning("${file}:$.", emit_warning("${file}:$.",
"Excess $decl_type member " . "Excess $decl_type member " .
...@@ -1685,7 +1687,8 @@ sub check_return_section { ...@@ -1685,7 +1687,8 @@ sub check_return_section {
} }
if (!defined($sections{$section_return}) || if (!defined($sections{$section_return}) ||
$sections{$section_return} eq "") { $sections{$section_return} eq "")
{
emit_warning("${file}:$.", emit_warning("${file}:$.",
"No description found for return value of " . "No description found for return value of " .
"'$declaration_name'\n"); "'$declaration_name'\n");
...@@ -1907,10 +1910,9 @@ sub process_proto_function($$) { ...@@ -1907,10 +1910,9 @@ sub process_proto_function($$) {
$x =~ s@\/\/.*$@@gos; # strip C99-style comments to end of line $x =~ s@\/\/.*$@@gos; # strip C99-style comments to end of line
if ($x =~ m#\s*/\*\s+MACDOC\s*#io || ($x =~ /^#/ && $x !~ /^#\s*define/)) { if ($x =~ /^#/ && $x !~ /^#\s*define/) {
# do nothing # do nothing
} } elsif ($x =~ /([^\{]*)/) {
elsif ($x =~ /([^\{]*)/) {
$prototype .= $1; $prototype .= $1;
} }
...@@ -2331,7 +2333,7 @@ sub process_file($) { ...@@ -2331,7 +2333,7 @@ sub process_file($) {
$section_counter = 0; $section_counter = 0;
while (<IN_FILE>) { while (<IN_FILE>) {
while (s/\\\s*$//) { while (!/^ \*/ && s/\\\s*$//) {
$_ .= <IN_FILE>; $_ .= <IN_FILE>;
} }
# Replace tabs by spaces # Replace tabs by spaces
...@@ -2359,8 +2361,7 @@ sub process_file($) { ...@@ -2359,8 +2361,7 @@ sub process_file($) {
if ($output_selection == OUTPUT_INCLUDE) { if ($output_selection == OUTPUT_INCLUDE) {
emit_warning("${file}:1", "'$_' not found\n") emit_warning("${file}:1", "'$_' not found\n")
for keys %function_table; for keys %function_table;
} } else {
else {
emit_warning("${file}:1", "no structured comments found\n"); emit_warning("${file}:1", "no structured comments found\n");
} }
} }
......
...@@ -280,8 +280,6 @@ sub get_sphinx_version($) ...@@ -280,8 +280,6 @@ sub get_sphinx_version($)
sub check_sphinx() sub check_sphinx()
{ {
my $default_version;
open IN, $conf or die "Can't open $conf"; open IN, $conf or die "Can't open $conf";
while (<IN>) { while (<IN>) {
if (m/^\s*needs_sphinx\s*=\s*[\'\"]([\d\.]+)[\'\"]/) { if (m/^\s*needs_sphinx\s*=\s*[\'\"]([\d\.]+)[\'\"]/) {
...@@ -293,18 +291,7 @@ sub check_sphinx() ...@@ -293,18 +291,7 @@ sub check_sphinx()
die "Can't get needs_sphinx version from $conf" if (!$min_version); die "Can't get needs_sphinx version from $conf" if (!$min_version);
open IN, $requirement_file or die "Can't open $requirement_file"; $virtenv_dir = $virtenv_prefix . "latest";
while (<IN>) {
if (m/^\s*Sphinx\s*==\s*([\d\.]+)$/) {
$default_version=$1;
last;
}
}
close IN;
die "Can't get default sphinx version from $requirement_file" if (!$default_version);
$virtenv_dir = $virtenv_prefix . $default_version;
my $sphinx = get_sphinx_fname(); my $sphinx = get_sphinx_fname();
if ($sphinx eq "") { if ($sphinx eq "") {
...@@ -318,8 +305,8 @@ sub check_sphinx() ...@@ -318,8 +305,8 @@ sub check_sphinx()
die "$sphinx didn't return its version" if (!$cur_version); die "$sphinx didn't return its version" if (!$cur_version);
if ($cur_version lt $min_version) { if ($cur_version lt $min_version) {
printf "ERROR: Sphinx version is %s. It should be >= %s (recommended >= %s)\n", printf "ERROR: Sphinx version is %s. It should be >= %s\n",
$cur_version, $min_version, $default_version; $cur_version, $min_version;
$need_sphinx = 1; $need_sphinx = 1;
return; return;
} }
...@@ -361,6 +348,7 @@ sub give_debian_hints() ...@@ -361,6 +348,7 @@ sub give_debian_hints()
{ {
my %map = ( my %map = (
"python-sphinx" => "python3-sphinx", "python-sphinx" => "python3-sphinx",
"yaml" => "python3-yaml",
"ensurepip" => "python3-venv", "ensurepip" => "python3-venv",
"virtualenv" => "virtualenv", "virtualenv" => "virtualenv",
"dot" => "graphviz", "dot" => "graphviz",
...@@ -395,6 +383,7 @@ sub give_redhat_hints() ...@@ -395,6 +383,7 @@ sub give_redhat_hints()
{ {
my %map = ( my %map = (
"python-sphinx" => "python3-sphinx", "python-sphinx" => "python3-sphinx",
"yaml" => "python3-pyyaml",
"virtualenv" => "python3-virtualenv", "virtualenv" => "python3-virtualenv",
"dot" => "graphviz", "dot" => "graphviz",
"convert" => "ImageMagick", "convert" => "ImageMagick",
...@@ -421,6 +410,7 @@ sub give_redhat_hints() ...@@ -421,6 +410,7 @@ sub give_redhat_hints()
# #
my $old = 0; my $old = 0;
my $rel; my $rel;
my $noto_sans_redhat = "google-noto-sans-cjk-ttc-fonts";
$rel = $1 if ($system_release =~ /release\s+(\d+)/); $rel = $1 if ($system_release =~ /release\s+(\d+)/);
if (!($system_release =~ /Fedora/)) { if (!($system_release =~ /Fedora/)) {
...@@ -438,6 +428,9 @@ sub give_redhat_hints() ...@@ -438,6 +428,9 @@ sub give_redhat_hints()
if ($rel && $rel < 26) { if ($rel && $rel < 26) {
$old = 1; $old = 1;
} }
if ($rel && $rel >= 38) {
$noto_sans_redhat = "google-noto-sans-cjk-fonts";
}
} }
if (!$rel) { if (!$rel) {
printf("Couldn't identify release number\n"); printf("Couldn't identify release number\n");
...@@ -446,8 +439,9 @@ sub give_redhat_hints() ...@@ -446,8 +439,9 @@ sub give_redhat_hints()
} }
if ($pdf) { if ($pdf) {
check_missing_file(["/usr/share/fonts/google-noto-cjk/NotoSansCJK-Regular.ttc"], check_missing_file(["/usr/share/fonts/google-noto-cjk/NotoSansCJK-Regular.ttc",
"google-noto-sans-cjk-ttc-fonts", 2); "/usr/share/fonts/google-noto-sans-cjk-fonts/NotoSansCJK-Regular.ttc"],
$noto_sans_redhat, 2);
} }
check_rpm_missing(\@fedora26_opt_pkgs, 2) if ($pdf && !$old); check_rpm_missing(\@fedora26_opt_pkgs, 2) if ($pdf && !$old);
...@@ -472,6 +466,7 @@ sub give_opensuse_hints() ...@@ -472,6 +466,7 @@ sub give_opensuse_hints()
{ {
my %map = ( my %map = (
"python-sphinx" => "python3-sphinx", "python-sphinx" => "python3-sphinx",
"yaml" => "python3-pyyaml",
"virtualenv" => "python3-virtualenv", "virtualenv" => "python3-virtualenv",
"dot" => "graphviz", "dot" => "graphviz",
"convert" => "ImageMagick", "convert" => "ImageMagick",
...@@ -951,6 +946,7 @@ sub check_needs() ...@@ -951,6 +946,7 @@ sub check_needs()
# Check for needed programs/tools # Check for needed programs/tools
check_perl_module("Pod::Usage", 0); check_perl_module("Pod::Usage", 0);
check_python_module("yaml", 0);
check_program("make", 0); check_program("make", 0);
check_program("gcc", 0); check_program("gcc", 0);
check_program("dot", 1); check_program("dot", 1);
......