Commit 05ef8b97 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'docs-5.6' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a relatively quiet cycle for documentation, but there's
  still a couple of things of note:

   - Conversion of the NFS documentation to RST

   - A new document on how to help with documentation (and a maintainer
     profile entry too)

  Plus the usual collection of typo fixes, etc"

* tag 'docs-5.6' of git://git.lwn.net/linux: (40 commits)
  docs: filesystems: add overlayfs to index.rst
  docs: usb: remove some broken references
  scripts/find-unused-docs: Fix massive false positives
  docs: nvdimm: use ReST notation for subsection
  zram: correct documentation about sysfs node of huge page writeback
  Documentation: zram: various fixes in zram.rst
  Add a maintainer entry profile for documentation
  Add a document on how to contribute to the documentation
  docs: Keep up with the location of NoUri
  Documentation: Call out example SYM_FUNC_* usage as x86-specific
  Documentation: nfs: fault_injection: convert to ReST
  Documentation: nfs: pnfs-scsi-server: convert to ReST
  Documentation: nfs: convert pnfs-block-server to ReST
  Documentation: nfs: idmapper: convert to ReST
  Documentation: convert nfsd-admin-interfaces to ReST
  Documentation: nfs-rdma: convert to ReST
  Documentation: nfsroot.rst: COSMETIC: refill a paragraph
  Documentation: nfsroot.txt: convert to ReST
  Documentation: convert nfs.txt to ReST
  Documentation: filesystems: convert vfat.txt to RST
  ...
parents 08a3ef8f 77ce1a47
======================================== ========================================
zram: Compressed RAM based block devices zram: Compressed RAM-based block devices
======================================== ========================================
Introduction Introduction
============ ============
The zram module creates RAM based block devices named /dev/zram<id> The zram module creates RAM-based block devices named /dev/zram<id>
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored (<id> = 0, 1, ...). Pages written to these disks are compressed and stored
in memory itself. These disks allow very fast I/O and compression provides in memory itself. These disks allow very fast I/O and compression provides
good amounts of memory savings. Some of the usecases include /tmp storage, good amounts of memory savings. Some of the use cases include /tmp storage,
use as swap disks, various caches under /var and maybe many more :) use as swap disks, various caches under /var and maybe many more. :)
Statistics for individual zram devices are exported through sysfs nodes at Statistics for individual zram devices are exported through sysfs nodes at
/sys/block/zram<id>/ /sys/block/zram<id>/
...@@ -43,17 +43,17 @@ The list of possible return codes: ...@@ -43,17 +43,17 @@ The list of possible return codes:
======== ============================================================= ======== =============================================================
-EBUSY an attempt to modify an attribute that cannot be changed once -EBUSY an attempt to modify an attribute that cannot be changed once
the device has been initialised. Please reset device first; the device has been initialised. Please reset device first.
-ENOMEM zram was not able to allocate enough memory to fulfil your -ENOMEM zram was not able to allocate enough memory to fulfil your
needs; needs.
-EINVAL invalid input has been provided. -EINVAL invalid input has been provided.
======== ============================================================= ======== =============================================================
If you use 'echo', the returned value that is changed by 'echo' utility, If you use 'echo', the returned value is set by the 'echo' utility,
and, in general case, something like:: and, in general case, something like::
echo 3 > /sys/block/zram0/max_comp_streams echo 3 > /sys/block/zram0/max_comp_streams
if [ $? -ne 0 ]; if [ $? -ne 0 ]; then
handle_error handle_error
fi fi
...@@ -65,7 +65,8 @@ should suffice. ...@@ -65,7 +65,8 @@ should suffice.
:: ::
modprobe zram num_devices=4 modprobe zram num_devices=4
This creates 4 devices: /dev/zram{0,1,2,3}
This creates 4 devices: /dev/zram{0,1,2,3}
num_devices parameter is optional and tells zram how many devices should be num_devices parameter is optional and tells zram how many devices should be
pre-created. Default: 1. pre-created. Default: 1.
...@@ -73,12 +74,12 @@ pre-created. Default: 1. ...@@ -73,12 +74,12 @@ pre-created. Default: 1.
2) Set max number of compression streams 2) Set max number of compression streams
======================================== ========================================
Regardless the value passed to this attribute, ZRAM will always Regardless of the value passed to this attribute, ZRAM will always
allocate multiple compression streams - one per online CPUs - thus allocate multiple compression streams - one per online CPU - thus
allowing several concurrent compression operations. The number of allowing several concurrent compression operations. The number of
allocated compression streams goes down when some of the CPUs allocated compression streams goes down when some of the CPUs
become offline. There is no single-compression-stream mode anymore, become offline. There is no single-compression-stream mode anymore,
unless you are running a UP system or has only 1 CPU online. unless you are running a UP system or have only 1 CPU online.
To find out how many streams are currently available:: To find out how many streams are currently available::
...@@ -89,7 +90,7 @@ To find out how many streams are currently available:: ...@@ -89,7 +90,7 @@ To find out how many streams are currently available::
Using comp_algorithm device attribute one can see available and Using comp_algorithm device attribute one can see available and
currently selected (shown in square brackets) compression algorithms, currently selected (shown in square brackets) compression algorithms,
change selected compression algorithm (once the device is initialised or change the selected compression algorithm (once the device is initialised
there is no way to change compression algorithm). there is no way to change compression algorithm).
Examples:: Examples::
...@@ -167,9 +168,9 @@ Examples:: ...@@ -167,9 +168,9 @@ Examples::
zram provides a control interface, which enables dynamic (on-demand) device zram provides a control interface, which enables dynamic (on-demand) device
addition and removal. addition and removal.
In order to add a new /dev/zramX device, perform read operation on hot_add In order to add a new /dev/zramX device, perform a read operation on the hot_add
attribute. This will return either new device's device id (meaning that you attribute. This will return either the new device's device id (meaning that you
can use /dev/zram<id>) or error code. can use /dev/zram<id>) or an error code.
Example:: Example::
...@@ -186,8 +187,8 @@ execute:: ...@@ -186,8 +187,8 @@ execute::
Per-device statistics are exported as various nodes under /sys/block/zram<id>/ Per-device statistics are exported as various nodes under /sys/block/zram<id>/
A brief description of exported device attributes. For more details please A brief description of exported device attributes follows. For more details
read Documentation/ABI/testing/sysfs-block-zram. please read Documentation/ABI/testing/sysfs-block-zram.
====================== ====== =============================================== ====================== ====== ===============================================
Name access description Name access description
...@@ -245,7 +246,7 @@ whitespace: ...@@ -245,7 +246,7 @@ whitespace:
File /sys/block/zram<id>/mm_stat File /sys/block/zram<id>/mm_stat
The stat file represents device's mm statistics. It consists of a single The mm_stat file represents the device's mm statistics. It consists of a single
line of text and contains the following stats separated by whitespace: line of text and contains the following stats separated by whitespace:
================ ============================================================= ================ =============================================================
...@@ -261,7 +262,7 @@ line of text and contains the following stats separated by whitespace: ...@@ -261,7 +262,7 @@ line of text and contains the following stats separated by whitespace:
Unit: bytes Unit: bytes
mem_limit the maximum amount of memory ZRAM can use to store mem_limit the maximum amount of memory ZRAM can use to store
the compressed data the compressed data
mem_used_max the maximum amount of memory zram have consumed to mem_used_max the maximum amount of memory zram has consumed to
store the data store the data
same_pages the number of same element filled pages written to this disk. same_pages the number of same element filled pages written to this disk.
No memory is allocated for such pages. No memory is allocated for such pages.
...@@ -271,7 +272,7 @@ line of text and contains the following stats separated by whitespace: ...@@ -271,7 +272,7 @@ line of text and contains the following stats separated by whitespace:
File /sys/block/zram<id>/bd_stat File /sys/block/zram<id>/bd_stat
The stat file represents device's backing device statistics. It consists of The bd_stat file represents a device's backing device statistics. It consists of
a single line of text and contains the following stats separated by whitespace: a single line of text and contains the following stats separated by whitespace:
============== ============================================================= ============== =============================================================
...@@ -316,9 +317,9 @@ To use the feature, admin should set up backing device via:: ...@@ -316,9 +317,9 @@ To use the feature, admin should set up backing device via::
echo /dev/sda5 > /sys/block/zramX/backing_dev echo /dev/sda5 > /sys/block/zramX/backing_dev
before disksize setting. It supports only partition at this moment. before disksize setting. It supports only partition at this moment.
If admin want to use incompressible page writeback, they could do via:: If admin wants to use incompressible page writeback, they could do via::
echo huge > /sys/block/zramX/write echo huge > /sys/block/zramX/writeback
To use idle page writeback, first, user need to declare zram pages To use idle page writeback, first, user need to declare zram pages
as idle:: as idle::
...@@ -326,7 +327,7 @@ as idle:: ...@@ -326,7 +327,7 @@ as idle::
echo all > /sys/block/zramX/idle echo all > /sys/block/zramX/idle
From now on, any pages on zram are idle pages. The idle mark From now on, any pages on zram are idle pages. The idle mark
will be removed until someone request access of the block. will be removed until someone requests access of the block.
IOW, unless there is access request, those pages are still idle pages. IOW, unless there is access request, those pages are still idle pages.
Admin can request writeback of those idle pages at right timing via:: Admin can request writeback of those idle pages at right timing via::
...@@ -341,16 +342,16 @@ to guarantee storage health for entire product life. ...@@ -341,16 +342,16 @@ to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature. To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
any writeback. IOW, if admin want to apply writeback budget, he should any writeback. IOW, if admin wants to apply writeback budget, he should
enable writeback_limit_enable via:: enable writeback_limit_enable via::
$ echo 1 > /sys/block/zramX/writeback_limit_enable $ echo 1 > /sys/block/zramX/writeback_limit_enable
Once writeback_limit_enable is set, zram doesn't allow any writeback Once writeback_limit_enable is set, zram doesn't allow any writeback
until admin set the budget via /sys/block/zramX/writeback_limit. until admin sets the budget via /sys/block/zramX/writeback_limit.
(If admin doesn't enable writeback_limit_enable, writeback_limit's value (If admin doesn't enable writeback_limit_enable, writeback_limit's value
assigned via /sys/block/zramX/writeback_limit is meaninless.) assigned via /sys/block/zramX/writeback_limit is meaningless.)
If admin want to limit writeback as per-day 400M, he could do it If admin want to limit writeback as per-day 400M, he could do it
like below:: like below::
...@@ -361,13 +362,13 @@ like below:: ...@@ -361,13 +362,13 @@ like below::
/sys/block/zram0/writeback_limit. /sys/block/zram0/writeback_limit.
$ echo 1 > /sys/block/zram0/writeback_limit_enable $ echo 1 > /sys/block/zram0/writeback_limit_enable
If admin want to allow further write again once the bugdet is exausted, If admins want to allow further write again once the bugdet is exhausted,
he could do it like below:: he could do it like below::
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \ $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit /sys/block/zram0/writeback_limit
If admin want to see remaining writeback budget since he set:: If admin wants to see remaining writeback budget since last set::
$ cat /sys/block/zramX/writeback_limit $ cat /sys/block/zramX/writeback_limit
...@@ -375,12 +376,12 @@ If admin want to disable writeback limit, he could do:: ...@@ -375,12 +376,12 @@ If admin want to disable writeback limit, he could do::
$ echo 0 > /sys/block/zramX/writeback_limit_enable $ echo 0 > /sys/block/zramX/writeback_limit_enable
The writeback_limit count will reset whenever you reset zram(e.g., The writeback_limit count will reset whenever you reset zram (e.g.,
system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
writeback happened until you reset the zram to allocate extra writeback writeback happened until you reset the zram to allocate extra writeback
budget in next setting is user's job. budget in next setting is user's job.
If admin want to measure writeback count in a certain period, he could If admin wants to measure writeback count in a certain period, he could
know it via /sys/block/zram0/bd_stat's 3rd column. know it via /sys/block/zram0/bd_stat's 3rd column.
memory tracking memory tracking
......
...@@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking. ...@@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking.
device-mapper/index device-mapper/index
efi-stub efi-stub
ext4 ext4
nfs/index
gpio/index gpio/index
highuid highuid
hw_random hw_random
......
===================
NFS Fault Injection
===================
Fault Injection
===============
Fault injection is a method for forcing errors that may not normally occur, or Fault injection is a method for forcing errors that may not normally occur, or
may be difficult to reproduce. Forcing these errors in a controlled environment may be difficult to reproduce. Forcing these errors in a controlled environment
can help the developer find and fix bugs before their code is shipped in a can help the developer find and fix bugs before their code is shipped in a
......
=============
NFS
=============
.. toctree::
:maxdepth: 1
nfs-client
nfsroot
nfs-rdma
nfsd-admin-interfaces
nfs-idmapper
pnfs-block-server
pnfs-scsi-server
fault_injection
==========
NFS Client
==========
The NFS client The NFS client
============== ==============
...@@ -59,10 +62,11 @@ The DNS resolver ...@@ -59,10 +62,11 @@ The DNS resolver
NFSv4 allows for one server to refer the NFS client to data that has been NFSv4 allows for one server to refer the NFS client to data that has been
migrated onto another server by means of the special "fs_locations" migrated onto another server by means of the special "fs_locations"
attribute. See attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and
http://tools.ietf.org/html/rfc3530#section-6 `Implementation Guide for Referrals in NFSv4`_.
and
http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 .. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6
.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
The fs_locations information can take the form of either an ip address and The fs_locations information can take the form of either an ip address and
a path, or a DNS hostname and a path. The latter requires the NFS client to a path, or a DNS hostname and a path. The latter requires the NFS client to
...@@ -78,8 +82,8 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual ...@@ -78,8 +82,8 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
(2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
(may be changed using the 'nfs.cache_getent' kernel boot parameter) (may be changed using the 'nfs.cache_getent' kernel boot parameter)
is run, with two arguments: is run, with two arguments:
- the cache name, "dns_resolve" - the cache name, "dns_resolve"
- the hostname to resolve - the hostname to resolve
(3) After looking up the corresponding ip address, the helper script (3) After looking up the corresponding ip address, the helper script
writes the result into the rpc_pipefs pseudo-file writes the result into the rpc_pipefs pseudo-file
...@@ -94,43 +98,44 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual ...@@ -94,43 +98,44 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
script, and <ttl> is the 'time to live' of this cache entry (in script, and <ttl> is the 'time to live' of this cache entry (in
units of seconds). units of seconds).
Note: If <ip address> is invalid, say the string "0", then a negative .. note::
entry is created, which will cause the kernel to treat the hostname If <ip address> is invalid, say the string "0", then a negative
as having no valid DNS translation. entry is created, which will cause the kernel to treat the hostname
as having no valid DNS translation.
A basic sample /sbin/nfs_cache_getent A basic sample /sbin/nfs_cache_getent
===================================== =====================================
.. code-block:: sh
#!/bin/bash
# #!/bin/bash
ttl=600 #
# ttl=600
cut=/usr/bin/cut #
getent=/usr/bin/getent cut=/usr/bin/cut
rpc_pipefs=/var/lib/nfs/rpc_pipefs getent=/usr/bin/getent
# rpc_pipefs=/var/lib/nfs/rpc_pipefs
die() #
{ die()
echo "Usage: $0 cache_name entry_name" {
exit 1 echo "Usage: $0 cache_name entry_name"
} exit 1
}
[ $# -lt 2 ] && die
cachename="$1" [ $# -lt 2 ] && die
cache_path=${rpc_pipefs}/cache/${cachename}/channel cachename="$1"
cache_path=${rpc_pipefs}/cache/${cachename}/channel
case "${cachename}" in
dns_resolve) case "${cachename}" in
name="$2" dns_resolve)
result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" name="$2"
[ -z "${result}" ] && result="0" result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
;; [ -z "${result}" ] && result="0"
*) ;;
die *)
;; die
esac ;;
echo "${result} ${name} ${ttl}" >${cache_path} esac
echo "${result} ${name} ${ttl}" >${cache_path}
=============
NFS ID Mapper
=============
=========
ID Mapper
=========
Id mapper is used by NFS to translate user and group ids into names, and to Id mapper is used by NFS to translate user and group ids into names, and to
translate user and group names into ids. Part of this translation involves translate user and group names into ids. Part of this translation involves
performing an upcall to userspace to request the information. There are two performing an upcall to userspace to request the information. There are two
...@@ -20,22 +20,24 @@ legacy rpc.idmap daemon for the id mapping. This result will be stored ...@@ -20,22 +20,24 @@ legacy rpc.idmap daemon for the id mapping. This result will be stored
in a custom NFS idmap cache. in a custom NFS idmap cache.
===========
Configuring Configuring
=========== ===========
The file /etc/request-key.conf will need to be modified so /sbin/request-key can The file /etc/request-key.conf will need to be modified so /sbin/request-key can
direct the upcall. The following line should be added: direct the upcall. The following line should be added:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... ``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
#====== ======= =============== =============== =============================== ``#====== ======= =============== =============== ===============================``
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 ``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap.
The last parameter, 600, defines how many seconds into the future the key will The last parameter, 600, defines how many seconds into the future the key will
expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout
is not specified, nfs.idmap will default to 600 seconds. is not specified, nfs.idmap will default to 600 seconds.
id mapper uses for key descriptions: id mapper uses for key descriptions::
uid: Find the UID for the given user uid: Find the UID for the given user
gid: Find the GID for the given group gid: Find the GID for the given group
user: Find the user name for the given UID user: Find the user name for the given UID
...@@ -45,23 +47,24 @@ You can handle any of these individually, rather than using the generic upcall ...@@ -45,23 +47,24 @@ You can handle any of these individually, rather than using the generic upcall
program. If you would like to use your own program for a uid lookup then you program. If you would like to use your own program for a uid lookup then you
would edit your request-key.conf so it look similar to this: would edit your request-key.conf so it look similar to this:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... ``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
#====== ======= =============== =============== =============================== ``#====== ======= =============== =============== ===============================``
create id_resolver uid:* * /some/other/program %k %d 600 ``create id_resolver uid:* * /some/other/program %k %d 600``
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 ``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
Notice that the new line was added above the line for the generic program. Notice that the new line was added above the line for the generic program.
request-key will find the first matching line and corresponding program. In request-key will find the first matching line and corresponding program. In
this case, /some/other/program will handle all uid lookups and this case, /some/other/program will handle all uid lookups and
/usr/sbin/nfs.idmap will handle gid, user, and group lookups. /usr/sbin/nfs.idmap will handle gid, user, and group lookups.
See <file:Documentation/security/keys/request-key.rst> for more information See Documentation/security/keys/request-key.rst for more information
about the request-key function. about the request-key function.
=========
nfs.idmap nfs.idmap
========= =========
nfs.idmap is designed to be called by request-key, and should not be run "by nfs.idmap is designed to be called by request-key, and should not be run "by
hand". This program takes two arguments, a serialized key and a key hand". This program takes two arguments, a serialized key and a key
description. The serialized key is first converted into a key_serial_t, and description. The serialized key is first converted into a key_serial_t, and
......
===================
Setting up NFS/RDMA
===================
:Author:
NetApp and Open Grid Computing (May 29, 2008)
.. warning::
This document is probably obsolete.
Overview
========
This document describes how to install and setup the Linux NFS/RDMA client
and server software.
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.
In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both Infiniband and iWARP
RDMA adapters.
Getting Help
============
If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.
Installation
============
These instructions are a step by step guide to building a machine for
use with NFS/RDMA.
- Install an RDMA device
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
Testing has been performed using several Mellanox-based IB cards, the
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
- Install a Linux distribution and tools
The first kernel release to contain both the NFS/RDMA client and server was
Linux 2.6.25 Therefore, a distribution compatible with this and subsequent
Linux kernel release should be installed.
The procedures described in this document have been tested with
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
- Install nfs-utils-1.1.2 or greater on the client
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
version with support for NFS/RDMA mounts, but for various reasons we
recommend using nfs-utils-1.1.2 or greater). To see which version of
mount.nfs you are using, type:
.. code-block:: sh
$ /sbin/mount.nfs -V
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.
If you will not need the idmapper and gssd executables (you do not need
these to create an NFS/RDMA enabled mount command), the installation
process can be simplified by disabling these features when running
configure:
.. code-block:: sh
$ ./configure --disable-gss --disable-nfsv4
To build nfs-utils you will need the tcp_wrappers package installed. For
more information on this see the package's README and INSTALL files.
After building the nfs-utils package, there will be a mount.nfs binary in
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
or v4 mounts. To initiate a v4 mount, the binary must be called
mount.nfs4. The standard technique is to create a symlink called
mount.nfs4 to mount.nfs.
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
.. code-block:: sh
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
In this location, mount.nfs will be invoked automatically for NFS mounts
by the system mount command.
.. note::
mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
on the NFS client machine. You do not need this specific version of
nfs-utils on the server. Furthermore, only the mount.nfs command from
nfs-utils-1.1.2 is needed on the client.
- Install a Linux kernel with NFS/RDMA
The NFS/RDMA client and server are both included in the mainline Linux
kernel version 2.6.25 and later. This and other versions of the Linux
kernel can be found at: https://www.kernel.org/pub/linux/kernel/
Download the sources and place them in an appropriate location.
- Configure the RDMA stack
Make sure your kernel configuration has RDMA support enabled. Under
Device Drivers -> InfiniBand support, update the kernel configuration
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
iWARP adapter support (amso, cxgb3, etc.).
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
- Configure the NFS client and server
Your kernel configuration must also have NFS file system support and/or
NFS server support enabled. These and other NFS related configuration
options can be found under File Systems -> Network File Systems.
- Build, install, reboot
The NFS/RDMA code will be enabled automatically if NFS and RDMA
are turned on. The NFS/RDMA client and server are configured via the hidden
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
value of SUNRPC_XPRT_RDMA will be:
#. N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
and server will not be built
#. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
in this case the NFS/RDMA client and server will be built as modules
#. Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
and server will be built into the kernel
Therefore, if you have followed the steps above and turned no NFS and RDMA,
the NFS/RDMA client and server will be built.
Build a new kernel, install it, boot it.
Check RDMA and NFS Setup
========================
Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.
- Check RDMA Setup
If you built the RDMA components as modules, load them at
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
card:
.. code-block:: sh
$ modprobe ib_mthca
$ modprobe ib_ipoib
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
running on the network. If your IB switch has an embedded SM, you can
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
of your end nodes.
If an SM is running on your network, you should see the following:
.. code-block:: sh
$ cat /sys/class/infiniband/driverX/ports/1/state
4: ACTIVE
where driverX is mthca0, ipath5, ehca3, etc.
To further test the InfiniBand software stack, use IPoIB (this
assumes you have two IB hosts named host1 and host2):
.. code-block:: sh
host1$ ip link set dev ib0 up
host1$ ip address add dev ib0 a.b.c.x
host2$ ip link set dev ib0 up
host2$ ip address add dev ib0 a.b.c.y
host1$ ping a.b.c.y
host2$ ping a.b.c.x
For other device types, follow the appropriate procedures.
- Check NFS Setup
For the NFS components enabled above (client and/or server),
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
NFS/RDMA Setup
==============
We recommend that you use two machines, one to act as the client and
one to act as the server.
One time configuration:
-----------------------
- On the server system, configure the /etc/exports file and start the NFS/RDMA server.
Exports entries with the following formats have been tested::
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
HCA or the client's iWARP address(es) for an RNIC.
.. note::
The "insecure" option must be used because the NFS/RDMA client does
not use a reserved port.
Each time a machine boots:
--------------------------
- Load and configure the RDMA drivers
For InfiniBand using a Mellanox adapter:
.. code-block:: sh
$ modprobe ib_mthca
$ modprobe ib_ipoib
$ ip li set dev ib0 up
$ ip addr add dev ib0 a.b.c.d
.. note::
Please use unique addresses for the client and server!
- Start the NFS server
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA transport module:
.. code-block:: sh
$ modprobe svcrdma
Regardless of how the server was built (module or built-in), start the
server:
.. code-block:: sh
$ /etc/init.d/nfs start
or
.. code-block:: sh
$ service nfs start
Instruct the server to listen on the RDMA transport:
.. code-block:: sh
$ echo rdma 20049 > /proc/fs/nfsd/portlist
- On the client system
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA client module:
.. code-block:: sh
$ modprobe xprtrdma.ko
Regardless of how the client was built (module or built-in), use this
command to mount the NFS/RDMA server:
.. code-block:: sh
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
the "proto" field for the given mount.
Congratulations! You're using NFS/RDMA!
==================================
Administrative interfaces for nfsd Administrative interfaces for nfsd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ==================================
Note that normally these interfaces are used only by the utilities in Note that normally these interfaces are used only by the utilities in
nfs-utils. nfs-utils.
...@@ -13,18 +14,16 @@ nfsd/threads. ...@@ -13,18 +14,16 @@ nfsd/threads.
Before doing that, NFSD can be told which sockets to listen on by Before doing that, NFSD can be told which sockets to listen on by
writing to nfsd/portlist; that write may be: writing to nfsd/portlist; that write may be:
- an ascii-encoded file descriptor, which should refer to a - an ascii-encoded file descriptor, which should refer to a
bound (and listening, for tcp) socket, or bound (and listening, for tcp) socket, or
- "transportname port", where transportname is currently either - "transportname port", where transportname is currently either
"udp", "tcp", or "rdma". "udp", "tcp", or "rdma".
If nfsd is started without doing any of these, then it will create one If nfsd is started without doing any of these, then it will create one
udp and one tcp listener at port 2049 (see nfsd_init_socks). udp and one tcp listener at port 2049 (see nfsd_init_socks).
On startup, nfsd and lockd grace periods start. On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of
0 to nfsd/threads. All locks and state are thrown away at that point.
nfsd is shut down by a write of 0 to nfsd/threads. All locks and state
are thrown away at that point.
Between startup and shutdown, the number of threads may be adjusted up Between startup and shutdown, the number of threads may be adjusted up
or down by additional writes to nfsd/threads or by writes to or down by additional writes to nfsd/threads or by writes to
...@@ -34,7 +33,7 @@ For more detail about files under nfsd/ and what they control, see ...@@ -34,7 +33,7 @@ For more detail about files under nfsd/ and what they control, see
fs/nfsd/nfsctl.c; most of them have detailed comments. fs/nfsd/nfsctl.c; most of them have detailed comments.
Implementation notes Implementation notes
^^^^^^^^^^^^^^^^^^^^ ====================
Note that the rpc server requires the caller to serialize addition and Note that the rpc server requires the caller to serialize addition and
removal of listening sockets, and startup and shutdown of the server. removal of listening sockets, and startup and shutdown of the server.
......
===============================================
Mounting the root filesystem via NFS (nfsroot) Mounting the root filesystem via NFS (nfsroot)
=============================================== ===============================================
Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> :Authors:
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
Updated 2006 by Horms <horms@verge.net.au> Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Updated 2018 by Chris Novakovic <chris@chrisn.me.uk>
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
Updated 2006 by Horms <horms@verge.net.au>
Updated 2018 by Chris Novakovic <chris@chrisn.me.uk>
In order to use a diskless system, such as an X-terminal or printer server
for example, it is necessary for the root filesystem to be present on a
non-disk device. This may be an initramfs (see Documentation/filesystems/
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a
filesystem mounted via NFS. The following text describes on how to use NFS
for the root filesystem. For the rest of this text 'client' means the
diskless system, and 'server' means the NFS server.
In order to use a diskless system, such as an X-terminal or printer server for
example, it is necessary for the root filesystem to be present on a non-disk
device. This may be an initramfs (see
Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
following text describes on how to use NFS for the root filesystem. For the rest
of this text 'client' means the diskless system, and 'server' means the NFS
server.
1.) Enabling nfsroot capabilities
----------------------------- Enabling nfsroot capabilities
=============================
In order to use nfsroot, NFS client support needs to be selected as In order to use nfsroot, NFS client support needs to be selected as
built-in during configuration. Once this has been selected, the nfsroot built-in during configuration. Once this has been selected, the nfsroot
...@@ -34,8 +41,8 @@ DHCP, BOOTP and RARP is safe. ...@@ -34,8 +41,8 @@ DHCP, BOOTP and RARP is safe.
2.) Kernel command line Kernel command line
------------------- ===================
When the kernel has been loaded by a boot loader (see below) it needs to be When the kernel has been loaded by a boot loader (see below) it needs to be
told what root fs device to use. And in the case of nfsroot, where to find told what root fs device to use. And in the case of nfsroot, where to find
...@@ -44,19 +51,17 @@ This can be established using the following kernel command line parameters: ...@@ -44,19 +51,17 @@ This can be established using the following kernel command line parameters:
root=/dev/nfs root=/dev/nfs
This is necessary to enable the pseudo-NFS-device. Note that it's not a This is necessary to enable the pseudo-NFS-device. Note that it's not a
real device but just a synonym to tell the kernel to use NFS instead of real device but just a synonym to tell the kernel to use NFS instead of
a real device. a real device.
nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
If the `nfsroot' parameter is NOT given on the command line, If the `nfsroot' parameter is NOT given on the command line,
the default "/tftpboot/%s" will be used. the default ``"/tftpboot/%s"`` will be used.
<server-ip> Specifies the IP address of the NFS server. <server-ip> Specifies the IP address of the NFS server.
The default address is determined by the `ip' parameter The default address is determined by the ip parameter
(see below). This parameter allows the use of different (see below). This parameter allows the use of different
servers for IP autoconfiguration and NFS. servers for IP autoconfiguration and NFS.
...@@ -66,7 +71,8 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] ...@@ -66,7 +71,8 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
IP address. IP address.
<nfs-options> Standard NFS options. All options are separated by commas. <nfs-options> Standard NFS options. All options are separated by commas.
The following defaults are used: The following defaults are used::
port = as given by server portmap daemon port = as given by server portmap daemon
rsize = 4096 rsize = 4096
wsize = 4096 wsize = 4096
...@@ -79,13 +85,11 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] ...@@ -79,13 +85,11 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
flags = hard, nointr, noposix, cto, ac flags = hard, nointr, noposix, cto, ac
ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>
<dns0-ip>:<dns1-ip>:<ntp0-ip>
This parameter tells the kernel how to configure IP addresses of devices This parameter tells the kernel how to configure IP addresses of devices
and also how to set up the IP routing table. It was originally called and also how to set up the IP routing table. It was originally called
`nfsaddrs', but now the boot-time IP configuration works independently of nfsaddrs, but now the boot-time IP configuration works independently of
NFS, so it was renamed to `ip' and the old name remained as an alias for NFS, so it was renamed to ip and the old name remained as an alias for
compatibility reasons. compatibility reasons.
If this parameter is missing from the kernel command line, all fields are If this parameter is missing from the kernel command line, all fields are
...@@ -93,17 +97,17 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: ...@@ -93,17 +97,17 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
this means that the kernel tries to configure everything using this means that the kernel tries to configure everything using
autoconfiguration. autoconfiguration.
The <autoconf> parameter can appear alone as the value to the `ip' The <autoconf> parameter can appear alone as the value to the ip
parameter (without all the ':' characters before). If the value is parameter (without all the ':' characters before). If the value is
"ip=off" or "ip=none", no autoconfiguration will take place, otherwise "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
autoconfiguration will take place. The most common way to use this autoconfiguration will take place. The most common way to use this
is "ip=dhcp". is "ip=dhcp".
<client-ip> IP address of the client. <client-ip> IP address of the client.
Default: Determined using autoconfiguration. Default: Determined using autoconfiguration.
<server-ip> IP address of the NFS server. If RARP is used to determine <server-ip> IP address of the NFS server.
If RARP is used to determine
the client address and this parameter is NOT empty only the client address and this parameter is NOT empty only
replies from the specified server are accepted. replies from the specified server are accepted.
...@@ -115,19 +119,19 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: ...@@ -115,19 +119,19 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
(see below). (see below).
Default: Determined using autoconfiguration. Default: Determined using autoconfiguration.
The address of the autoconfiguration server is used. The address of the autoconfiguration server is used.
<gw-ip> IP address of a gateway if the server is on a different subnet. <gw-ip> IP address of a gateway if the server is on a different subnet.
Default: Determined using autoconfiguration. Default: Determined using autoconfiguration.
<netmask> Netmask for local network interface. If unspecified <netmask> Netmask for local network interface.
the netmask is derived from the client IP address assuming If unspecified the netmask is derived from the client IP address
classful addressing. assuming classful addressing.
Default: Determined using autoconfiguration. Default: Determined using autoconfiguration.
<hostname> Name of the client. If a '.' character is present, anything <hostname> Name of the client.
If a '.' character is present, anything
before the first '.' is used as the client's hostname, and anything before the first '.' is used as the client's hostname, and anything
after it is used as its NIS domain name. May be supplied by after it is used as its NIS domain name. May be supplied by
autoconfiguration, but its absence will not trigger autoconfiguration. autoconfiguration, but its absence will not trigger autoconfiguration.
...@@ -138,21 +142,21 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: ...@@ -138,21 +142,21 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
Default: Client IP address is used in ASCII notation. Default: Client IP address is used in ASCII notation.
<device> Name of network device to use. <device> Name of network device to use.
Default: If the host only has one device, it is used. Default: If the host only has one device, it is used.
Otherwise the device is determined using Otherwise the device is determined using
autoconfiguration. This is done by sending autoconfiguration. This is done by sending
autoconfiguration requests out of all devices, autoconfiguration requests out of all devices,
and using the device that received the first reply. and using the device that received the first reply.
<autoconf> Method to use for autoconfiguration. In the case of options <autoconf> Method to use for autoconfiguration.
which specify multiple autoconfiguration protocols, In the case of options
which specify multiple autoconfiguration protocols,
requests are sent using all protocols, and the first one requests are sent using all protocols, and the first one
to reply is used. to reply is used.
Only autoconfiguration protocols that have been compiled Only autoconfiguration protocols that have been compiled
into the kernel will be used, regardless of the value of into the kernel will be used, regardless of the value of
this option. this option::
off or none: don't use autoconfiguration off or none: don't use autoconfiguration
(do static IP assignment instead) (do static IP assignment instead)
...@@ -221,7 +225,6 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: ...@@ -221,7 +225,6 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
nfsrootdebug nfsrootdebug
This parameter enables debugging messages to appear in the kernel This parameter enables debugging messages to appear in the kernel
log at boot time so that administrators can verify that the correct log at boot time so that administrators can verify that the correct
NFS mount options, server address, and root path are passed to the NFS mount options, server address, and root path are passed to the
...@@ -229,36 +232,32 @@ nfsrootdebug ...@@ -229,36 +232,32 @@ nfsrootdebug
rdinit=<executable file> rdinit=<executable file>
To specify which file contains the program that starts system To specify which file contains the program that starts system
initialization, administrators can use this command line parameter. initialization, administrators can use this command line parameter.
The default value of this parameter is "/init". If the specified The default value of this parameter is "/init". If the specified
file exists and the kernel can execute it, root filesystem related file exists and the kernel can execute it, root filesystem related
kernel command line parameters, including `nfsroot=', are ignored. kernel command line parameters, including 'nfsroot=', are ignored.
A description of the process of mounting the root file system can be A description of the process of mounting the root file system can be
found in: found in Documentation/driver-api/early-userspace/early_userspace_support.rst
Documentation/driver-api/early-userspace/early_userspace_support.rst
3.) Boot Loader Boot Loader
---------- ===========
To get the kernel into memory different approaches can be used. To get the kernel into memory different approaches can be used.
They depend on various facilities being available: They depend on various facilities being available:
3.1) Booting from a floppy using syslinux - Booting from a floppy using syslinux
When building kernels, an easy way to create a boot floppy that uses When building kernels, an easy way to create a boot floppy that uses
syslinux is to use the zdisk or bzdisk make targets which use zimage syslinux is to use the zdisk or bzdisk make targets which use zimage
and bzimage images respectively. Both targets accept the and bzimage images respectively. Both targets accept the
FDARGS parameter which can be used to set the kernel command line. FDARGS parameter which can be used to set the kernel command line.
e.g. e.g::
make bzdisk FDARGS="root=/dev/nfs" make bzdisk FDARGS="root=/dev/nfs"
Note that the user running this command will need to have Note that the user running this command will need to have
...@@ -267,32 +266,36 @@ They depend on various facilities being available: ...@@ -267,32 +266,36 @@ They depend on various facilities being available:
For more information on syslinux, including how to create bootdisks For more information on syslinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/ for prebuilt kernels, see http://syslinux.zytor.com/
N.B: Previously it was possible to write a kernel directly to .. note::
a floppy using dd, configure the boot device using rdev, and Previously it was possible to write a kernel directly to
boot using the resulting floppy. Linux no longer supports this a floppy using dd, configure the boot device using rdev, and
method of booting. boot using the resulting floppy. Linux no longer supports this
method of booting.
3.2) Booting from a cdrom using isolinux - Booting from a cdrom using isolinux
When building kernels, an easy way to create a bootable cdrom that When building kernels, an easy way to create a bootable cdrom that
uses isolinux is to use the isoimage target which uses a bzimage uses isolinux is to use the isoimage target which uses a bzimage
image. Like zdisk and bzdisk, this target accepts the FDARGS image. Like zdisk and bzdisk, this target accepts the FDARGS
parameter which can be used to set the kernel command line. parameter which can be used to set the kernel command line.
e.g. e.g::
make isoimage FDARGS="root=/dev/nfs" make isoimage FDARGS="root=/dev/nfs"
The resulting iso image will be arch/<ARCH>/boot/image.iso The resulting iso image will be arch/<ARCH>/boot/image.iso
This can be written to a cdrom using a variety of tools including This can be written to a cdrom using a variety of tools including
cdrecord. cdrecord.
e.g. e.g::
cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
For more information on isolinux, including how to create bootdisks For more information on isolinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/ for prebuilt kernels, see http://syslinux.zytor.com/
3.2) Using LILO - Using LILO
When using LILO all the necessary command line parameters may be When using LILO all the necessary command line parameters may be
specified using the 'append=' directive in the LILO configuration specified using the 'append=' directive in the LILO configuration
file. file.
...@@ -300,15 +303,19 @@ They depend on various facilities being available: ...@@ -300,15 +303,19 @@ They depend on various facilities being available:
However, to use the 'root=' directive you also need to create However, to use the 'root=' directive you also need to create
a dummy root device, which may be removed after LILO is run. a dummy root device, which may be removed after LILO is run.
mknod /dev/boot255 c 0 255 e.g::
mknod /dev/boot255 c 0 255
For information on configuring LILO, please refer to its documentation. For information on configuring LILO, please refer to its documentation.
3.3) Using GRUB - Using GRUB
When using GRUB, kernel parameter are simply appended after the kernel When using GRUB, kernel parameter are simply appended after the kernel
specification: kernel <kernel> <parameters> specification: kernel <kernel> <parameters>
3.4) Using loadlin - Using loadlin
loadlin may be used to boot Linux from a DOS command prompt without loadlin may be used to boot Linux from a DOS command prompt without
requiring a local hard disk to mount as root. This has not been requiring a local hard disk to mount as root. This has not been
thoroughly tested by the authors of this document, but in general thoroughly tested by the authors of this document, but in general
...@@ -317,7 +324,8 @@ They depend on various facilities being available: ...@@ -317,7 +324,8 @@ They depend on various facilities being available:
Please refer to the loadlin documentation for further information. Please refer to the loadlin documentation for further information.
3.5) Using a boot ROM - Using a boot ROM
This is probably the most elegant way of booting a diskless client. This is probably the most elegant way of booting a diskless client.
With a boot ROM the kernel is loaded using the TFTP protocol. The With a boot ROM the kernel is loaded using the TFTP protocol. The
authors of this document are not aware of any no commercial boot authors of this document are not aware of any no commercial boot
...@@ -326,7 +334,8 @@ They depend on various facilities being available: ...@@ -326,7 +334,8 @@ They depend on various facilities being available:
etherboot, both of which are available on sunsite.unc.edu, and both etherboot, both of which are available on sunsite.unc.edu, and both
of which contain everything you need to boot a diskless Linux client. of which contain everything you need to boot a diskless Linux client.
3.6) Using pxelinux - Using pxelinux
Pxelinux may be used to boot linux using the PXE boot loader Pxelinux may be used to boot linux using the PXE boot loader
which is present on many modern network cards. which is present on many modern network cards.
...@@ -342,8 +351,8 @@ They depend on various facilities being available: ...@@ -342,8 +351,8 @@ They depend on various facilities being available:
4.) Credits Credits
------- =======
The nfsroot code in the kernel and the RARP support have been written The nfsroot code in the kernel and the RARP support have been written
by Gero Kuhlmann <gero@gkminix.han.de>. by Gero Kuhlmann <gero@gkminix.han.de>.
......
===================================
pNFS block layout server user guide pNFS block layout server user guide
===================================
The Linux NFS server now supports the pNFS block layout extension. In this The Linux NFS server now supports the pNFS block layout extension. In this
case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
...@@ -22,16 +24,19 @@ If the nfsd server needs to fence a non-responding client it calls ...@@ -22,16 +24,19 @@ If the nfsd server needs to fence a non-responding client it calls
/sbin/nfsd-recall-failed with the first argument set to the IP address of /sbin/nfsd-recall-failed with the first argument set to the IP address of
the client, and the second argument set to the device node without the /dev the client, and the second argument set to the device node without the /dev
prefix for the file system to be fenced. Below is an example file that shows prefix for the file system to be fenced. Below is an example file that shows
how to translate the device into a serial number from SCSI EVPD 0x80: how to translate the device into a serial number from SCSI EVPD 0x80::
cat > /sbin/nfsd-recall-failed << EOF cat > /sbin/nfsd-recall-failed << EOF
#!/bin/sh
CLIENT="$1" .. code-block:: sh
DEV="/dev/$2"
EVPD=`sg_inq --page=0x80 ${DEV} | \
grep "Unit serial number:" | \
awk -F ': ' '{print $2}'`
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log #!/bin/sh
EOF
CLIENT="$1"
DEV="/dev/$2"
EVPD=`sg_inq --page=0x80 ${DEV} | \
grep "Unit serial number:" | \
awk -F ': ' '{print $2}'`
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
EOF
==================================
pNFS SCSI layout server user guide pNFS SCSI layout server user guide
================================== ==================================
......
...@@ -73,10 +73,11 @@ The new macros are prefixed with the ``SYM_`` prefix and can be divided into ...@@ -73,10 +73,11 @@ The new macros are prefixed with the ``SYM_`` prefix and can be divided into
three main groups: three main groups:
1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with 1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
standard C calling conventions, i.e. the stack contains a return address at standard C calling conventions. For example, on x86, this means that the
the predefined place and a return from the function can happen in a stack contains a return address at the predefined place and a return from
standard way. When frame pointers are enabled, save/restore of frame the function can happen in a standard way. When frame pointers are enabled,
pointer shall happen at the start/end of a function, respectively, too. save/restore of frame pointer shall happen at the start/end of a function,
respectively, too.
Checking tools like ``objtool`` should ensure such marked functions conform Checking tools like ``objtool`` should ensure such marked functions conform
to these rules. The tools can also easily annotate these functions with to these rules. The tools can also easily annotate these functions with
......
...@@ -47,7 +47,7 @@ Having a real iterator, and making biovecs immutable, has a number of ...@@ -47,7 +47,7 @@ Having a real iterator, and making biovecs immutable, has a number of
advantages: advantages:
* Before, iterating over bios was very awkward when you weren't processing * Before, iterating over bios was very awkward when you weren't processing
exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c, exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
which copies the contents of one bio into another. Because the biovecs which copies the contents of one bio into another. Because the biovecs
wouldn't necessarily be the same size, the old code was tricky convoluted - wouldn't necessarily be the same size, the old code was tricky convoluted -
it had to walk two different bios at the same time, keeping both bi_idx and it had to walk two different bios at the same time, keeping both bi_idx and
......
.. SPDX-License-Identifier: GPL-2.0
How to help improve kernel documentation
========================================
Documentation is an important part of any software-development project.
Good documentation helps to bring new developers in and helps established
developers work more effectively. Without top-quality documentation, a lot
of time is wasted in reverse-engineering the code and making avoidable
mistakes.
Unfortunately, the kernel's documentation currently falls far short of what
it needs to be to support a project of this size and importance.
This guide is for contributors who would like to improve that situation.
Kernel documentation improvements can be made by developers at a variety of
skill levels; they are a relatively easy way to learn the kernel process in
general and find a place in the community. The bulk of what follows is the
documentation maintainer's list of tasks that most urgently need to be
done.
The documentation TODO list
---------------------------
There is an endless list of tasks that need to be carried out to get our
documentation to where it should be. This list contains a number of
important items, but is far from exhaustive; if you see a different way to
improve the documentation, please do not hold back!
Addressing warnings
~~~~~~~~~~~~~~~~~~~
The documentation build currently spews out an unbelievable number of
warnings. When you have that many, you might as well have none at all;
people ignore them, and they will never notice when their work adds new
ones. For this reason, eliminating warnings is one of the highest-priority
tasks on the documentation TODO list. The task itself is reasonably
straightforward, but it must be approached in the right way to be
successful.
Warnings issued by a compiler for C code can often be dismissed as false
positives, leading to patches aimed at simply shutting the compiler up.
Warnings from the documentation build almost always point at a real
problem; making those warnings go away requires understanding the problem
and fixing it at its source. For this reason, patches fixing documentation
warnings should probably not say "fix a warning" in the changelog title;
they should indicate the real problem that has been fixed.
Another important point is that documentation warnings are often created by
problems in kerneldoc comments in C code. While the documentation
maintainer appreciates being copied on fixes for these warnings, the
documentation tree is often not the right one to actually carry those
fixes; they should go to the maintainer of the subsystem in question.
For example, in a documentation build I grabbed a pair of warnings nearly
at random::
./drivers/devfreq/devfreq.c:1818: warning: bad line:
- Resource-managed devfreq_register_notifier()
./drivers/devfreq/devfreq.c:1854: warning: bad line:
- Resource-managed devfreq_unregister_notifier()
(The lines were split for readability).
A quick look at the source file named above turned up a couple of kerneldoc
comments that look like this::
/**
* devm_devfreq_register_notifier()
- Resource-managed devfreq_register_notifier()
* @dev: The devfreq user device. (parent of devfreq)
* @devfreq: The devfreq object.
* @nb: The notifier block to be unregistered.
* @list: DEVFREQ_TRANSITION_NOTIFIER.
*/
The problem is the missing "*", which confuses the build system's
simplistic idea of what C comment blocks look like. This problem had been
present since that comment was added in 2016 — a full four years. Fixing
it was a matter of adding the missing asterisks. A quick look at the
history for that file showed what the normal format for subject lines is,
and ``scripts/get_maintainer.pl`` told me who should receive it. The
resulting patch looked like this::
[PATCH] PM / devfreq: Fix two malformed kerneldoc comments
Two kerneldoc comments in devfreq.c fail to adhere to the required format,
resulting in these doc-build warnings:
./drivers/devfreq/devfreq.c:1818: warning: bad line:
- Resource-managed devfreq_register_notifier()
./drivers/devfreq/devfreq.c:1854: warning: bad line:
- Resource-managed devfreq_unregister_notifier()
Add a couple of missing asterisks and make kerneldoc a little happier.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
drivers/devfreq/devfreq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 57f6944d65a6..00c9b80b3d33 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -1814,7 +1814,7 @@ static void devm_devfreq_notifier_release(struct device *dev, void *res)
/**
* devm_devfreq_register_notifier()
- - Resource-managed devfreq_register_notifier()
+ * - Resource-managed devfreq_register_notifier()
* @dev: The devfreq user device. (parent of devfreq)
* @devfreq: The devfreq object.
* @nb: The notifier block to be unregistered.
@@ -1850,7 +1850,7 @@ EXPORT_SYMBOL(devm_devfreq_register_notifier);
/**
* devm_devfreq_unregister_notifier()
- - Resource-managed devfreq_unregister_notifier()
+ * - Resource-managed devfreq_unregister_notifier()
* @dev: The devfreq user device. (parent of devfreq)
* @devfreq: The devfreq object.
* @nb: The notifier block to be unregistered.
--
2.24.1
The entire process only took a few minutes. Of course, I then found that
somebody else had fixed it in a separate tree, highlighting another lesson:
always check linux-next to see if a problem has been fixed before you dig
into it.
Other fixes will take longer, especially those relating to structure
members or function parameters that lack documentation. In such cases, it
is necessary to work out what the role of those members or parameters is
and describe them correctly. Overall, this task gets a little tedious at
times, but it's highly important. If we can actually eliminate warnings
from the documentation build, then we can start expecting developers to
avoid adding new ones.
Languishing kerneldoc comments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developers are encouraged to write kerneldoc comments for their code, but
many of those comments are never pulled into the docs build. That makes
this information harder to find and, for example, makes Sphinx unable to
generate links to that documentation. Adding ``kernel-doc`` directives to
the documentation to bring those comments in can help the community derive
the full value of the work that has gone into creating them.
The ``scripts/find-unused-docs.sh`` tool can be used to find these
overlooked comments.
Note that the most value comes from pulling in the documentation for
exported functions and data structures. Many subsystems also have
kerneldoc comments for internal use; those should not be pulled into the
documentation build unless they are placed in a document that is
specifically aimed at developers working within the relevant subsystem.
Typo fixes
~~~~~~~~~~
Fixing typographical or formatting errors in the documentation is a quick
way to figure out how to create and send patches, and it is a useful
service. I am always willing to accept such patches. That said, once you
have fixed a few, please consider moving on to more advanced tasks, leaving
some typos for the next beginner to address.
Please note that some things are *not* typos and should not be "fixed":
- Both American and British English spellings are allowed within the
kernel documentation. There is no need to fix one by replacing it with
the other.
- The question of whether a period should be followed by one or two spaces
is not to be debated in the context of kernel documentation. Other
areas of rational disagreement, such as the "Oxford comma", are also
off-topic here.
As with any patch to any project, please consider whether your change is
really making things better.
Ancient documentation
~~~~~~~~~~~~~~~~~~~~~
Some kernel documentation is current, maintained, and useful. Some
documentation is ... not. Dusty, old, and inaccurate documentation can
mislead readers and casts doubt on our documentation as a whole. Anything
that can be done to address such problems is more than welcome.
Whenever you are working with a document, please consider whether it is
current, whether it needs updating, or whether it should perhaps be removed
altogether. There are a number of warning signs that you can pay attention
to here:
- References to 2.x kernels
- Pointers to SourceForge repositories
- Nothing but typo fixes in the history for several years
- Discussion of pre-Git workflows
The best thing to do, of course, would be to bring the documentation
current, adding whatever information is needed. Such work often requires
the cooperation of developers familiar with the subsystem in question, of
course. Developers are often more than willing to cooperate with people
working to improve the documentation when asked nicely, and when their
answers are listened to and acted upon.
Some documentation is beyond hope; we occasionally find documents that
refer to code that was removed from the kernel long ago, for example.
There is surprising resistance to removing obsolete documentation, but we
should do that anyway. Extra cruft in our documentation helps nobody.
In cases where there is perhaps some useful information in a badly outdated
document, and you are unable to update it, the best thing to do may be to
add a warning at the beginning. The following text is recommended::
.. warning ::
This document is outdated and in need of attention. Please use
this information with caution, and please consider sending patches
to update it.
That way, at least our long-suffering readers have been warned that the
document may lead them astray.
Documentation coherency
~~~~~~~~~~~~~~~~~~~~~~~
The old-timers around here will remember the Linux books that showed up on
the shelves in the 1990s. They were simply collections of documentation
files scrounged from various locations on the net. The books have (mostly)
improved since then, but the kernel's documentation is still mostly built
on that model. It is thousands of files, almost each of which was written
in isolation from all of the others. We don't have a coherent body of
kernel documentation; we have thousands of individual documents.
We have been trying to improve the situation through the creation of
a set of "books" that group documentation for specific readers. These
include:
- :doc:`../admin-guide/index`
- :doc:`../core-api/index`
- :doc:`../driver-api/index`
- :doc:`../userspace-api/index`
As well as this book on documentation itself.
Moving documents into the appropriate books is an important task and needs
to continue. There are a couple of challenges associated with this work,
though. Moving documentation files creates short-term pain for the people
who work with those files; they are understandably unenthusiastic about
such changes. Usually the case can be made to move a document once; we
really don't want to keep shifting them around, though.
Even when all documents are in the right place, though, we have only
managed to turn a big pile into a group of smaller piles. The work of
trying to knit all of those documents together into a single whole has not
yet begun. If you have bright ideas on how we could proceed on that front,
we would be more than happy to hear them.
Stylesheet improvements
~~~~~~~~~~~~~~~~~~~~~~~
With the adoption of Sphinx we have much nicer-looking HTML output than we
once did. But it could still use a lot of improvement; Donald Knuth and
Edward Tufte would be unimpressed. That requires tweaking our stylesheets
to create more typographically sound, accessible, and readable output.
Be warned: if you take on this task you are heading into classic bikeshed
territory. Expect a lot of opinions and discussion for even relatively
obvious changes. That is, alas, the nature of the world we live in.
Non-LaTeX PDF build
~~~~~~~~~~~~~~~~~~~
This is a decidedly nontrivial task for somebody with a lot of time and
Python skills. The Sphinx toolchain is relatively small and well
contained; it is easy to add to a development system. But building PDF or
EPUB output requires installing LaTeX, which is anything but small or well
contained. That would be a nice thing to eliminate.
The original hope had been to use the rst2pdf tool (https://rst2pdf.org/)
for PDF generation, but it turned out to not be up to the task.
Development work on rst2pdf seems to have picked up again in recent times,
though, which is a hopeful sign. If a suitably motivated developer were to
work with that project to make rst2pdf work with the kernel documentation
build, the world would be eternally grateful.
Write more documentation
~~~~~~~~~~~~~~~~~~~~~~~~
Naturally, there are massive parts of the kernel that are severely
underdocumented. If you have the knowledge to document a specific kernel
subsystem and the desire to do so, please do not hesitate to do some
writing and contribute the result to the kernel. Untold numbers of kernel
developers and users will thank you.
...@@ -10,6 +10,8 @@ How to write kernel documentation ...@@ -10,6 +10,8 @@ How to write kernel documentation
sphinx sphinx
kernel-doc kernel-doc
parse-headers parse-headers
contributing
maintainer-profile
.. only:: subproject and html .. only:: subproject and html
......
.. SPDX-License-Identifier: GPL-2.0
Documentation subsystem maintainer entry profile
================================================
The documentation "subsystem" is the central coordinating point for the
kernel's documentation and associated infrastructure. It covers the
hierarchy under Documentation/ (with the exception of
Documentation/device-tree), various utilities under scripts/ and, at least
some of the time, LICENSES/.
It's worth noting, though, that the boundaries of this subsystem are rather
fuzzier than normal. Many other subsystem maintainers like to keep control
of portions of Documentation/, and many more freely apply changes there
when it is convenient. Beyond that, much of the kernel's documentation is
found in the source as kerneldoc comments; those are usually (but not
always) maintained by the relevant subsystem maintainer.
The mailing list for documentation is linux-doc@vger.kernel.org. Patches
should be made against the docs-next tree whenever possible.
Submit checklist addendum
-------------------------
When making documentation changes, you should actually build the
documentation and ensure that no new errors or warnings have been
introduced. Generating HTML documents and looking at the result will help
to avoid unsightly misunderstandings about how things will be rendered.
Key cycle dates
---------------
Patches can be sent anytime, but response will be slower than usual during
the merge window. The docs tree tends to close late before the merge
window opens, since the risk of regressions from documentation patches is
low.
Review cadence
--------------
I am the sole maintainer for the documentation subsystem, and I am doing
the work on my own time, so the response to patches will occasionally be
slow. I try to always send out a notification when a patch is merged (or
when I decide that one cannot be). Do not hesitate to send a ping if you
have not heard back within a week of sending a patch.
...@@ -9,7 +9,7 @@ also be requested by userspace. ...@@ -9,7 +9,7 @@ also be requested by userspace.
IN-KERNEL AUTOMOUNTING IN-KERNEL AUTOMOUNTING
====================== ======================
See section "Mount Traps" of Documentation/filesystems/autofs.txt See section "Mount Traps" of Documentation/filesystems/autofs.rst
Then from userspace, you can just do something like: Then from userspace, you can just do something like:
......
...@@ -47,4 +47,6 @@ Documentation for filesystem implementations. ...@@ -47,4 +47,6 @@ Documentation for filesystem implementations.
:maxdepth: 2 :maxdepth: 2
autofs autofs
overlayfs
virtiofs virtiofs
vfat
################################################################################
# #
# NFS/RDMA README #
# #
################################################################################
Author: NetApp and Open Grid Computing
Date: May 29, 2008
Table of Contents
~~~~~~~~~~~~~~~~~
- Overview
- Getting Help
- Installation
- Check RDMA and NFS Setup
- NFS/RDMA Setup
Overview
~~~~~~~~
This document describes how to install and setup the Linux NFS/RDMA client
and server software.
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.
In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both Infiniband and iWARP
RDMA adapters.
Getting Help
~~~~~~~~~~~~
If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net
mailing list.
Installation
~~~~~~~~~~~~
These instructions are a step by step guide to building a machine for
use with NFS/RDMA.
- Install an RDMA device
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
Testing has been performed using several Mellanox-based IB cards, the
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
- Install a Linux distribution and tools
The first kernel release to contain both the NFS/RDMA client and server was
Linux 2.6.25 Therefore, a distribution compatible with this and subsequent
Linux kernel release should be installed.
The procedures described in this document have been tested with
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
- Install nfs-utils-1.1.2 or greater on the client
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
version with support for NFS/RDMA mounts, but for various reasons we
recommend using nfs-utils-1.1.2 or greater). To see which version of
mount.nfs you are using, type:
$ /sbin/mount.nfs -V
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
Download the latest package from:
http://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.
If you will not need the idmapper and gssd executables (you do not need
these to create an NFS/RDMA enabled mount command), the installation
process can be simplified by disabling these features when running
configure:
$ ./configure --disable-gss --disable-nfsv4
To build nfs-utils you will need the tcp_wrappers package installed. For
more information on this see the package's README and INSTALL files.
After building the nfs-utils package, there will be a mount.nfs binary in
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
or v4 mounts. To initiate a v4 mount, the binary must be called
mount.nfs4. The standard technique is to create a symlink called
mount.nfs4 to mount.nfs.
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
In this location, mount.nfs will be invoked automatically for NFS mounts
by the system mount command.
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
on the NFS client machine. You do not need this specific version of
nfs-utils on the server. Furthermore, only the mount.nfs command from
nfs-utils-1.1.2 is needed on the client.
- Install a Linux kernel with NFS/RDMA
The NFS/RDMA client and server are both included in the mainline Linux
kernel version 2.6.25 and later. This and other versions of the Linux
kernel can be found at:
https://www.kernel.org/pub/linux/kernel/
Download the sources and place them in an appropriate location.
- Configure the RDMA stack
Make sure your kernel configuration has RDMA support enabled. Under
Device Drivers -> InfiniBand support, update the kernel configuration
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
iWARP adapter support (amso, cxgb3, etc.).
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
- Configure the NFS client and server
Your kernel configuration must also have NFS file system support and/or
NFS server support enabled. These and other NFS related configuration
options can be found under File Systems -> Network File Systems.
- Build, install, reboot
The NFS/RDMA code will be enabled automatically if NFS and RDMA
are turned on. The NFS/RDMA client and server are configured via the hidden
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
value of SUNRPC_XPRT_RDMA will be:
- N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
and server will not be built
- M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
in this case the NFS/RDMA client and server will be built as modules
- Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
and server will be built into the kernel
Therefore, if you have followed the steps above and turned no NFS and RDMA,
the NFS/RDMA client and server will be built.
Build a new kernel, install it, boot it.
Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~
Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.
- Check RDMA Setup
If you built the RDMA components as modules, load them at
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
card:
$ modprobe ib_mthca
$ modprobe ib_ipoib
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
running on the network. If your IB switch has an embedded SM, you can
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
of your end nodes.
If an SM is running on your network, you should see the following:
$ cat /sys/class/infiniband/driverX/ports/1/state
4: ACTIVE
where driverX is mthca0, ipath5, ehca3, etc.
To further test the InfiniBand software stack, use IPoIB (this
assumes you have two IB hosts named host1 and host2):
host1$ ip link set dev ib0 up
host1$ ip address add dev ib0 a.b.c.x
host2$ ip link set dev ib0 up
host2$ ip address add dev ib0 a.b.c.y
host1$ ping a.b.c.y
host2$ ping a.b.c.x
For other device types, follow the appropriate procedures.
- Check NFS Setup
For the NFS components enabled above (client and/or server),
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
NFS/RDMA Setup
~~~~~~~~~~~~~~
We recommend that you use two machines, one to act as the client and
one to act as the server.
One time configuration:
- On the server system, configure the /etc/exports file and
start the NFS/RDMA server.
Exports entries with the following formats have been tested:
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
HCA or the client's iWARP address(es) for an RNIC.
NOTE: The "insecure" option must be used because the NFS/RDMA client does
not use a reserved port.
Each time a machine boots:
- Load and configure the RDMA drivers
For InfiniBand using a Mellanox adapter:
$ modprobe ib_mthca
$ modprobe ib_ipoib
$ ip li set dev ib0 up
$ ip addr add dev ib0 a.b.c.d
NOTE: use unique addresses for the client and server
- Start the NFS server
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA transport module:
$ modprobe svcrdma
Regardless of how the server was built (module or built-in), start the
server:
$ /etc/init.d/nfs start
or
$ service nfs start
Instruct the server to listen on the RDMA transport:
$ echo rdma 20049 > /proc/fs/nfsd/portlist
- On the client system
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA client module:
$ modprobe xprtrdma.ko
Regardless of how the client was built (module or built-in), use this
command to mount the NFS/RDMA server:
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
the "proto" field for the given mount.
Congratulations! You're using NFS/RDMA!
====
VFAT
====
USING VFAT USING VFAT
---------------------------------------------------------------------- ==========
To use the vfat filesystem, use the filesystem type 'vfat'. i.e.
To use the vfat filesystem, use the filesystem type 'vfat'. i.e.::
mount -t vfat /dev/fd0 /mnt mount -t vfat /dev/fd0 /mnt
No special partition formatter is required. mkdosfs will work fine
if you want to format from within Linux. No special partition formatter is required,
'mkdosfs' will work fine if you want to format from within Linux.
VFAT MOUNT OPTIONS VFAT MOUNT OPTIONS
---------------------------------------------------------------------- ==================
uid=### -- Set the owner of all files on this filesystem.
The default is the uid of current process. **uid=###**
Set the owner of all files on this filesystem.
gid=### -- Set the group of all files on this filesystem. The default is the uid of current process.
The default is the gid of current process.
**gid=###**
umask=### -- The permission mask (for files and directories, see umask(1)). Set the group of all files on this filesystem.
The default is the umask of current process. The default is the gid of current process.
dmask=### -- The permission mask for the directory. **umask=###**
The default is the umask of current process. The permission mask (for files and directories, see *umask(1)*).
The default is the umask of current process.
fmask=### -- The permission mask for files.
The default is the umask of current process. **dmask=###**
The permission mask for the directory.
allow_utime=### -- This option controls the permission check of mtime/atime. The default is the umask of current process.
20 - If current process is in group of file's group ID, **fmask=###**
you can change timestamp. The permission mask for files.
2 - Other users can change timestamp. The default is the umask of current process.
The default is set from `dmask' option. (If the directory is **allow_utime=###**
writable, utime(2) is also allowed. I.e. ~dmask & 022) This option controls the permission check of mtime/atime.
Normally utime(2) checks current process is owner of **-20**: If current process is in group of file's group ID,
the file, or it has CAP_FOWNER capability. But FAT you can change timestamp.
filesystem doesn't have uid/gid on disk, so normal
check is too unflexible. With this option you can **-2**: Other users can change timestamp.
relax it.
The default is set from dmask option. If the directory is
codepage=### -- Sets the codepage number for converting to shortname writable, utime(2) is also allowed. i.e. ~dmask & 022.
characters on FAT filesystem.
By default, FAT_DEFAULT_CODEPAGE setting is used. Normally utime(2) checks current process is owner of
the file, or it has CAP_FOWNER capability. But FAT
iocharset=<name> -- Character set to use for converting between the filesystem doesn't have uid/gid on disk, so normal
encoding is used for user visible filename and 16 bit check is too unflexible. With this option you can
Unicode characters. Long filenames are stored on disk relax it.
in Unicode format, but Unix for the most part doesn't
know how to deal with Unicode. **codepage=###**
By default, FAT_DEFAULT_IOCHARSET setting is used. Sets the codepage number for converting to shortname
characters on FAT filesystem.
There is also an option of doing UTF-8 translations By default, FAT_DEFAULT_CODEPAGE setting is used.
with the utf8 option.
**iocharset=<name>**
NOTE: "iocharset=utf8" is not recommended. If unsure, Character set to use for converting between the
you should consider the following option instead. encoding is used for user visible filename and 16 bit
Unicode characters. Long filenames are stored on disk
utf8=<bool> -- UTF-8 is the filesystem safe version of Unicode that in Unicode format, but Unix for the most part doesn't
is used by the console. It can be enabled or disabled know how to deal with Unicode.
for the filesystem with this option. By default, FAT_DEFAULT_IOCHARSET setting is used.
If 'uni_xlate' gets set, UTF-8 gets disabled.
By default, FAT_DEFAULT_UTF8 setting is used. There is also an option of doing UTF-8 translations
with the utf8 option.
uni_xlate=<bool> -- Translate unhandled Unicode characters to special
escaped sequences. This would let you backup and .. note:: ``iocharset=utf8`` is not recommended. If unsure, you should consider
restore filenames that are created with any Unicode the utf8 option instead.
characters. Until Linux supports Unicode for real,
this gives you an alternative. Without this option, **utf8=<bool>**
a '?' is used when no translation is possible. The UTF-8 is the filesystem safe version of Unicode that
escape character is ':' because it is otherwise is used by the console. It can be enabled or disabled
illegal on the vfat filesystem. The escape sequence for the filesystem with this option.
that gets used is ':' and the four digits of hexadecimal If 'uni_xlate' gets set, UTF-8 gets disabled.
unicode. By default, FAT_DEFAULT_UTF8 setting is used.
nonumtail=<bool> -- When creating 8.3 aliases, normally the alias will **uni_xlate=<bool>**
end in '~1' or tilde followed by some number. If this Translate unhandled Unicode characters to special
option is set, then if the filename is escaped sequences. This would let you backup and
"longfilename.txt" and "longfile.txt" does not restore filenames that are created with any Unicode
currently exist in the directory, 'longfile.txt' will characters. Until Linux supports Unicode for real,
be the short alias instead of 'longfi~1.txt'. this gives you an alternative. Without this option,
a '?' is used when no translation is possible. The
usefree -- Use the "free clusters" value stored on FSINFO. It'll escape character is ':' because it is otherwise
be used to determine number of free clusters without illegal on the vfat filesystem. The escape sequence
scanning disk. But it's not used by default, because that gets used is ':' and the four digits of hexadecimal
recent Windows don't update it correctly in some unicode.
case. If you are sure the "free clusters" on FSINFO is
correct, by this option you can avoid scanning disk. **nonumtail=<bool>**
When creating 8.3 aliases, normally the alias will
quiet -- Stops printing certain warning messages. end in '~1' or tilde followed by some number. If this
option is set, then if the filename is
check=s|r|n -- Case sensitivity checking setting. "longfilename.txt" and "longfile.txt" does not
s: strict, case sensitive currently exist in the directory, longfile.txt will
r: relaxed, case insensitive be the short alias instead of longfi~1.txt.
n: normal, default setting, currently case insensitive
**usefree**
nocase -- This was deprecated for vfat. Use shortname=win95 instead. Use the "free clusters" value stored on FSINFO. It will
be used to determine number of free clusters without
shortname=lower|win95|winnt|mixed scanning disk. But it's not used by default, because
-- Shortname display/create setting. recent Windows don't update it correctly in some
lower: convert to lowercase for display, case. If you are sure the "free clusters" on FSINFO is
emulate the Windows 95 rule for create. correct, by this option you can avoid scanning disk.
win95: emulate the Windows 95 rule for display/create.
winnt: emulate the Windows NT rule for display/create. **quiet**
mixed: emulate the Windows NT rule for display, Stops printing certain warning messages.
emulate the Windows 95 rule for create.
Default setting is `mixed'. **check=s|r|n**
Case sensitivity checking setting.
tz=UTC -- Interpret timestamps as UTC rather than local time.
This option disables the conversion of timestamps **s**: strict, case sensitive
between local time (as used by Windows on FAT) and UTC
(which Linux uses internally). This is particularly **r**: relaxed, case insensitive
useful when mounting devices (like digital cameras)
that are set to UTC in order to avoid the pitfalls of **n**: normal, default setting, currently case insensitive
local time.
time_offset=minutes **nocase**
-- Set offset for conversion of timestamps from local time This was deprecated for vfat. Use ``shortname=win95`` instead.
used by FAT to UTC. I.e. <minutes> minutes will be subtracted
from each timestamp to convert it to UTC used internally by **shortname=lower|win95|winnt|mixed**
Linux. This is useful when time zone set in sys_tz is Shortname display/create setting.
not the time zone used by the filesystem. Note that this
option still does not provide correct time stamps in all **lower**: convert to lowercase for display,
cases in presence of DST - time stamps in a different DST emulate the Windows 95 rule for create.
setting will be off by one hour.
**win95**: emulate the Windows 95 rule for display/create.
showexec -- If set, the execute permission bits of the file will be
allowed only if the extension part of the name is .EXE, **winnt**: emulate the Windows NT rule for display/create.
.COM, or .BAT. Not set by default.
**mixed**: emulate the Windows NT rule for display,
debug -- Can be set, but unused by the current implementation. emulate the Windows 95 rule for create.
sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as Default setting is `mixed`.
IMMUTABLE flag on Linux. Not set by default.
**tz=UTC**
flush -- If set, the filesystem will try to flush to disk more Interpret timestamps as UTC rather than local time.
early than normal. Not set by default. This option disables the conversion of timestamps
between local time (as used by Windows on FAT) and UTC
rodir -- FAT has the ATTR_RO (read-only) attribute. On Windows, (which Linux uses internally). This is particularly
the ATTR_RO of the directory will just be ignored, useful when mounting devices (like digital cameras)
and is used only by applications as a flag (e.g. it's set that are set to UTC in order to avoid the pitfalls of
for the customized folder). local time.
If you want to use ATTR_RO as read-only flag even for **time_offset=minutes**
the directory, set this option. Set offset for conversion of timestamps from local time
used by FAT to UTC. I.e. <minutes> minutes will be subtracted
errors=panic|continue|remount-ro from each timestamp to convert it to UTC used internally by
-- specify FAT behavior on critical errors: panic, continue Linux. This is useful when time zone set in ``sys_tz`` is
without doing anything or remount the partition in not the time zone used by the filesystem. Note that this
read-only mode (default behavior). option still does not provide correct time stamps in all
cases in presence of DST - time stamps in a different DST
discard -- If set, issues discard/TRIM commands to the block setting will be off by one hour.
device when blocks are freed. This is useful for SSD devices
and sparse/thinly-provisoned LUNs. **showexec**
If set, the execute permission bits of the file will be
nfs=stale_rw|nostale_ro allowed only if the extension part of the name is .EXE,
Enable this only if you want to export the FAT filesystem .COM, or .BAT. Not set by default.
over NFS.
**debug**
stale_rw: This option maintains an index (cache) of directory Can be set, but unused by the current implementation.
inodes by i_logstart which is used by the nfs-related code to
**sys_immutable**
If set, ATTR_SYS attribute on FAT is handled as
IMMUTABLE flag on Linux. Not set by default.
**flush**
If set, the filesystem will try to flush to disk more
early than normal. Not set by default.
**rodir**
FAT has the ATTR_RO (read-only) attribute. On Windows,
the ATTR_RO of the directory will just be ignored,
and is used only by applications as a flag (e.g. it's set
for the customized folder).
If you want to use ATTR_RO as read-only flag even for
the directory, set this option.
**errors=panic|continue|remount-ro**
specify FAT behavior on critical errors: panic, continue
without doing anything or remount the partition in
read-only mode (default behavior).
**discard**
If set, issues discard/TRIM commands to the block
device when blocks are freed. This is useful for SSD devices
and sparse/thinly-provisoned LUNs.
**nfs=stale_rw|nostale_ro**
Enable this only if you want to export the FAT filesystem
over NFS.
**stale_rw**: This option maintains an index (cache) of directory
*inodes* by *i_logstart* which is used by the nfs-related code to
improve look-ups. Full file operations (read/write) over NFS is improve look-ups. Full file operations (read/write) over NFS is
supported but with cache eviction at NFS server, this could supported but with cache eviction at NFS server, this could
result in ESTALE issues. result in ESTALE issues.
nostale_ro: This option bases the inode number and filehandle **nostale_ro**: This option bases the *inode* number and filehandle
on the on-disk location of a file in the MS-DOS directory entry. on the on-disk location of a file in the MS-DOS directory entry.
This ensures that ESTALE will not be returned after a file is This ensures that ESTALE will not be returned after a file is
evicted from the inode cache. However, it means that operations evicted from the inode cache. However, it means that operations
...@@ -170,63 +210,59 @@ nfs=stale_rw|nostale_ro ...@@ -170,63 +210,59 @@ nfs=stale_rw|nostale_ro
potentially causing data corruption. For this reason, this potentially causing data corruption. For this reason, this
option also mounts the filesystem readonly. option also mounts the filesystem readonly.
To maintain backward compatibility, '-o nfs' is also accepted, To maintain backward compatibility, ``'-o nfs'`` is also accepted,
defaulting to stale_rw defaulting to "stale_rw".
dos1xfloppy -- If set, use a fallback default BIOS Parameter Block **dos1xfloppy <bool>: 0,1,yes,no,true,false**
configuration, determined by backing device size. These static If set, use a fallback default BIOS Parameter Block
parameters match defaults assumed by DOS 1.x for 160 kiB, configuration, determined by backing device size. These static
180 kiB, 320 kiB, and 360 kiB floppies and floppy images. parameters match defaults assumed by DOS 1.x for 160 kiB,
180 kiB, 320 kiB, and 360 kiB floppies and floppy images.
<bool>: 0,1,yes,no,true,false
LIMITATION LIMITATION
--------------------------------------------------------------------- ==========
* The fallocated region of file is discarded at umount/evict time
when using fallocate with FALLOC_FL_KEEP_SIZE. The fallocated region of file is discarded at umount/evict time
So, User should assume that fallocated region can be discarded at when using fallocate with FALLOC_FL_KEEP_SIZE.
last close if there is memory pressure resulting in eviction of So, User should assume that fallocated region can be discarded at
the inode from the memory. As a result, for any dependency on last close if there is memory pressure resulting in eviction of
the fallocated region, user should make sure to recheck fallocate the inode from the memory. As a result, for any dependency on
after reopening the file. the fallocated region, user should make sure to recheck fallocate
after reopening the file.
TODO TODO
---------------------------------------------------------------------- ====
* Need to get rid of the raw scanning stuff. Instead, always use Need to get rid of the raw scanning stuff. Instead, always use
a get next directory entry approach. The only thing left that uses a get next directory entry approach. The only thing left that uses
raw scanning is the directory renaming code. raw scanning is the directory renaming code.
POSSIBLE PROBLEMS POSSIBLE PROBLEMS
---------------------------------------------------------------------- =================
* vfat_valid_longname does not properly checked reserved names.
* When a volume name is the same as a directory name in the root - vfat_valid_longname does not properly checked reserved names.
- When a volume name is the same as a directory name in the root
directory of the filesystem, the directory name sometimes shows directory of the filesystem, the directory name sometimes shows
up as an empty file. up as an empty file.
* autoconv option does not work correctly. - autoconv option does not work correctly.
BUG REPORTS
----------------------------------------------------------------------
If you have trouble with the VFAT filesystem, mail bug reports to
chaffee@bmrc.cs.berkeley.edu. Please specify the filename
and the operation that gave you trouble.
TEST SUITE TEST SUITE
---------------------------------------------------------------------- ==========
If you plan to make any modifications to the vfat filesystem, please If you plan to make any modifications to the vfat filesystem, please
get the test suite that comes with the vfat distribution at get the test suite that comes with the vfat distribution at
http://web.archive.org/web/*/http://bmrc.berkeley.edu/ `<http://web.archive.org/web/*/http://bmrc.berkeley.edu/people/chaffee/vfat.html>`_
people/chaffee/vfat.html
This tests quite a few parts of the vfat filesystem and additional This tests quite a few parts of the vfat filesystem and additional
tests for new features or untested features would be appreciated. tests for new features or untested features would be appreciated.
NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM
---------------------------------------------------------------------- =============================================
(This documentation was provided by Galen C. Hunt <gchunt@cs.rochester.edu> This documentation was provided by Galen C. Hunt gchunt@cs.rochester.edu and
and lightly annotated by Gordon Chaffee). lightly annotated by Gordon Chaffee.
This document presents a very rough, technical overview of my This document presents a very rough, technical overview of my
knowledge of the extended FAT file system used in Windows NT 3.5 and knowledge of the extended FAT file system used in Windows NT 3.5 and
...@@ -234,30 +270,31 @@ Windows 95. I don't guarantee that any of the following is correct, ...@@ -234,30 +270,31 @@ Windows 95. I don't guarantee that any of the following is correct,
but it appears to be so. but it appears to be so.
The extended FAT file system is almost identical to the FAT The extended FAT file system is almost identical to the FAT
file system used in DOS versions up to and including 6.223410239847 file system used in DOS versions up to and including *6.223410239847*
:-). The significant change has been the addition of long file names. :-). The significant change has been the addition of long file names.
These names support up to 255 characters including spaces and lower These names support up to 255 characters including spaces and lower
case characters as opposed to the traditional 8.3 short names. case characters as opposed to the traditional 8.3 short names.
Here is the description of the traditional FAT entry in the current Here is the description of the traditional FAT entry in the current
Windows 95 filesystem: Windows 95 filesystem::
struct directory { // Short 8.3 names struct directory { // Short 8.3 names
unsigned char name[8]; // file name unsigned char name[8]; // file name
unsigned char ext[3]; // file extension unsigned char ext[3]; // file extension
unsigned char attr; // attribute byte unsigned char attr; // attribute byte
unsigned char lcase; // Case for base and extension unsigned char lcase; // Case for base and extension
unsigned char ctime_ms; // Creation time, milliseconds unsigned char ctime_ms; // Creation time, milliseconds
unsigned char ctime[2]; // Creation time unsigned char ctime[2]; // Creation time
unsigned char cdate[2]; // Creation date unsigned char cdate[2]; // Creation date
unsigned char adate[2]; // Last access date unsigned char adate[2]; // Last access date
unsigned char reserved[2]; // reserved values (ignored) unsigned char reserved[2]; // reserved values (ignored)
unsigned char time[2]; // time stamp unsigned char time[2]; // time stamp
unsigned char date[2]; // date stamp unsigned char date[2]; // date stamp
unsigned char start[2]; // starting cluster number unsigned char start[2]; // starting cluster number
unsigned char size[4]; // size of the file unsigned char size[4]; // size of the file
}; };
The lcase field specifies if the base and/or the extension of an 8.3 The lcase field specifies if the base and/or the extension of an 8.3
name should be capitalized. This field does not seem to be used by name should be capitalized. This field does not seem to be used by
Windows 95 but it is used by Windows NT. The case of filenames is not Windows 95 but it is used by Windows NT. The case of filenames is not
...@@ -266,9 +303,9 @@ compatible in the reverse direction, however. Filenames that fit in ...@@ -266,9 +303,9 @@ compatible in the reverse direction, however. Filenames that fit in
the 8.3 namespace and are written on Windows NT to be lowercase will the 8.3 namespace and are written on Windows NT to be lowercase will
show up as uppercase on Windows 95. show up as uppercase on Windows 95.
Note that the "start" and "size" values are actually little .. note:: Note that the ``start`` and ``size`` values are actually little
endian integer values. The descriptions of the fields in this endian integer values. The descriptions of the fields in this
structure are public knowledge and can be found elsewhere. structure are public knowledge and can be found elsewhere.
With the extended FAT system, Microsoft has inserted extra With the extended FAT system, Microsoft has inserted extra
directory entries for any files with extended names. (Any name which directory entries for any files with extended names. (Any name which
...@@ -278,21 +315,22 @@ specially formatted directory entry which holds up to 13 characters of ...@@ -278,21 +315,22 @@ specially formatted directory entry which holds up to 13 characters of
a file's extended name. Think of slots as additional labeling for the a file's extended name. Think of slots as additional labeling for the
directory entry of the file to which they correspond. Microsoft directory entry of the file to which they correspond. Microsoft
prefers to refer to the 8.3 entry for a file as its alias and the prefers to refer to the 8.3 entry for a file as its alias and the
extended slot directory entries as the file name. extended slot directory entries as the file name.
The C structure for a slot directory entry follows: The C structure for a slot directory entry follows::
struct slot { // Up to 13 characters of a long name struct slot { // Up to 13 characters of a long name
unsigned char id; // sequence number for slot unsigned char id; // sequence number for slot
unsigned char name0_4[10]; // first 5 characters in name unsigned char name0_4[10]; // first 5 characters in name
unsigned char attr; // attribute byte unsigned char attr; // attribute byte
unsigned char reserved; // always 0 unsigned char reserved; // always 0
unsigned char alias_checksum; // checksum for 8.3 alias unsigned char alias_checksum; // checksum for 8.3 alias
unsigned char name5_10[12]; // 6 more characters in name unsigned char name5_10[12]; // 6 more characters in name
unsigned char start[2]; // starting cluster number unsigned char start[2]; // starting cluster number
unsigned char name11_12[4]; // last 2 characters in name unsigned char name11_12[4]; // last 2 characters in name
}; };
If the layout of the slots looks a little odd, it's only If the layout of the slots looks a little odd, it's only
because of Microsoft's efforts to maintain compatibility with old because of Microsoft's efforts to maintain compatibility with old
software. The slots must be disguised to prevent old software from software. The slots must be disguised to prevent old software from
...@@ -319,7 +357,7 @@ the following: ...@@ -319,7 +357,7 @@ the following:
slot has an id which marks its order in the extended file slot has an id which marks its order in the extended file
name. Here is a very abbreviated view of an 8.3 directory name. Here is a very abbreviated view of an 8.3 directory
entry and its corresponding long name slots for the file entry and its corresponding long name slots for the file
"My Big File.Extension which is long": "My Big File.Extension which is long"::
<proceeding files...> <proceeding files...>
<slot #3, id = 0x43, characters = "h is long"> <slot #3, id = 0x43, characters = "h is long">
...@@ -327,20 +365,22 @@ the following: ...@@ -327,20 +365,22 @@ the following:
<slot #1, id = 0x01, characters = "My Big File.E"> <slot #1, id = 0x01, characters = "My Big File.E">
<directory entry, name = "MYBIGFIL.EXT"> <directory entry, name = "MYBIGFIL.EXT">
Note that the slots are stored from last to first. Slots
are numbered from 1 to N. The Nth slot is or'ed with 0x40
to mark it as the last one.
2) Checksum. Each slot has an "alias_checksum" value. The .. note:: Note that the slots are stored from last to first. Slots
are numbered from 1 to N. The Nth slot is ``or'ed`` with
0x40 to mark it as the last one.
2) Checksum. Each slot has an alias_checksum value. The
checksum is calculated from the 8.3 name using the checksum is calculated from the 8.3 name using the
following algorithm: following algorithm::
for (sum = i = 0; i < 11; i++) { for (sum = i = 0; i < 11; i++) {
sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i] sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i]
} }
3) If there is free space in the final slot, a Unicode NULL (0x0000)
is stored after the final character. After that, all unused 3) If there is free space in the final slot, a Unicode ``NULL (0x0000)``
is stored after the final character. After that, all unused
characters in the final slot are set to Unicode 0xFFFF. characters in the final slot are set to Unicode 0xFFFF.
Finally, note that the extended name is stored in Unicode. Each Unicode Finally, note that the extended name is stored in Unicode. Each Unicode
......
...@@ -601,7 +601,7 @@ Defined in ``include/linux/export.h`` ...@@ -601,7 +601,7 @@ Defined in ``include/linux/export.h``
This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol
namespace. Symbol Namespaces are documented in namespace. Symbol Namespaces are documented in
``Documentation/kbuild/namespaces.rst``. ``Documentation/core-api/symbol-namespaces.rst``.
:c:func:`EXPORT_SYMBOL_NS_GPL()` :c:func:`EXPORT_SYMBOL_NS_GPL()`
-------------------------------- --------------------------------
...@@ -610,7 +610,7 @@ Defined in ``include/linux/export.h`` ...@@ -610,7 +610,7 @@ Defined in ``include/linux/export.h``
This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol
namespace. Symbol Namespaces are documented in namespace. Symbol Namespaces are documented in
``Documentation/kbuild/namespaces.rst``. ``Documentation/core-api/symbol-namespaces.rst``.
Routines and Conventions Routines and Conventions
======================== ========================
......
...@@ -103,8 +103,7 @@ stat_interval ...@@ -103,8 +103,7 @@ stat_interval
Number of seconds between statistics-related printk()s. Number of seconds between statistics-related printk()s.
By default, locktorture will report stats every 60 seconds. By default, locktorture will report stats every 60 seconds.
Setting the interval to zero causes the statistics to Setting the interval to zero causes the statistics to
be printed -only- when the module is unloaded, and this be printed -only- when the module is unloaded.
is the default.
stutter stutter
The length of time to run the test before pausing for this The length of time to run the test before pausing for this
......
...@@ -99,4 +99,5 @@ to do something different in the near future. ...@@ -99,4 +99,5 @@ to do something different in the near future.
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
../doc-guide/maintainer-profile
../nvdimm/maintainer-entry-profile ../nvdimm/maintainer-entry-profile
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
==================== ====================
Xilinx SD-FEC Driver Xilinx SD-FEC Driver
==================== ====================
......
...@@ -33,7 +33,8 @@ Those tests need to be passed before the patches go upstream, but not ...@@ -33,7 +33,8 @@ Those tests need to be passed before the patches go upstream, but not
necessarily before initial posting. Contact the list if you need help necessarily before initial posting. Contact the list if you need help
getting the test environment set up. getting the test environment set up.
### ACPI Device Specific Methods (_DSM) ACPI Device Specific Methods (_DSM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before patches enabling for a new _DSM family will be considered it must Before patches enabling for a new _DSM family will be considered it must
be assigned a format-interface-code from the NVDIMM Sub-team of the ACPI be assigned a format-interface-code from the NVDIMM Sub-team of the ACPI
Specification Working Group. In general, the stance of the subsystem is Specification Working Group. In general, the stance of the subsystem is
......
.. _embargoed_hardware_issues:
Embargoed hardware issues Embargoed hardware issues
========================= =========================
...@@ -36,7 +38,10 @@ issue according to our documented process. ...@@ -36,7 +38,10 @@ issue according to our documented process.
The list is encrypted and email to the list can be sent by either PGP or The list is encrypted and email to the list can be sent by either PGP or
S/MIME encrypted and must be signed with the reporter's PGP key or S/MIME S/MIME encrypted and must be signed with the reporter's PGP key or S/MIME
certificate. The list's PGP key and S/MIME certificate are available from certificate. The list's PGP key and S/MIME certificate are available from
https://www.kernel.org/.... the following URLs:
- PGP: https://www.kernel.org/static/files/hardware-security.asc
- S/MIME: https://www.kernel.org/static/files/hardware-security.crt
While hardware security issues are often handled by the affected hardware While hardware security issues are often handled by the affected hardware
vendor, we welcome contact from researchers or individuals who have vendor, we welcome contact from researchers or individuals who have
...@@ -55,14 +60,14 @@ Operation of mailing-lists ...@@ -55,14 +60,14 @@ Operation of mailing-lists
^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^
The encrypted mailing-lists which are used in our process are hosted on The encrypted mailing-lists which are used in our process are hosted on
Linux Foundation's IT infrastructure. By providing this service Linux Linux Foundation's IT infrastructure. By providing this service, members
Foundation's director of IT Infrastructure security technically has the of Linux Foundation's IT operations personnel technically have the
ability to access the embargoed information, but is obliged to ability to access the embargoed information, but are obliged to
confidentiality by his employment contract. Linux Foundation's director of confidentiality by their employment contract. Linux Foundation IT
IT Infrastructure security is also responsible for the kernel.org personnel are also responsible for operating and managing the rest of
infrastructure. kernel.org infrastructure.
The Linux Foundation's current director of IT Infrastructure security is The Linux Foundation's current director of IT Project infrastructure is
Konstantin Ryabitsev. Konstantin Ryabitsev.
...@@ -274,7 +279,7 @@ software decrypts the email and re-encrypts it individually for each ...@@ -274,7 +279,7 @@ software decrypts the email and re-encrypts it individually for each
subscriber with the subscriber's PGP key or S/MIME certificate. Details subscriber with the subscriber's PGP key or S/MIME certificate. Details
about the mailing-list software and the setup which is used to ensure the about the mailing-list software and the setup which is used to ensure the
security of the lists and protection of the data can be found here: security of the lists and protection of the data can be found here:
https://www.kernel.org/.... https://korg.wiki.kernel.org/userdoc/remail.
List keys List keys
^^^^^^^^^ ^^^^^^^^^
......
...@@ -22,7 +22,7 @@ The following 64-byte header is present in decompressed Linux kernel image:: ...@@ -22,7 +22,7 @@ The following 64-byte header is present in decompressed Linux kernel image::
u64 res2 = 0; /* Reserved */ u64 res2 = 0; /* Reserved */
u64 magic = 0x5643534952; /* Magic number, little endian, "RISCV" */ u64 magic = 0x5643534952; /* Magic number, little endian, "RISCV" */
u32 magic2 = 0x05435352; /* Magic number 2, little endian, "RSC\x05" */ u32 magic2 = 0x05435352; /* Magic number 2, little endian, "RSC\x05" */
u32 res4; /* Reserved for PE COFF offset */ u32 res3; /* Reserved for PE COFF offset */
This header format is compliant with PE/COFF header and largely inspired from This header format is compliant with PE/COFF header and largely inspired from
ARM64 header. Thus, both ARM64 & RISC-V header can be combined into one common ARM64 header. Thus, both ARM64 & RISC-V header can be combined into one common
...@@ -34,7 +34,7 @@ Notes ...@@ -34,7 +34,7 @@ Notes
- This header can also be reused to support EFI stub for RISC-V in future. EFI - This header can also be reused to support EFI stub for RISC-V in future. EFI
specification needs PE/COFF image header in the beginning of the kernel image specification needs PE/COFF image header in the beginning of the kernel image
in order to load it as an EFI application. In order to support EFI stub, in order to load it as an EFI application. In order to support EFI stub,
code0 should be replaced with "MZ" magic string and res5(at offset 0x3c) should code0 should be replaced with "MZ" magic string and res3(at offset 0x3c) should
point to the rest of the PE/COFF header. point to the rest of the PE/COFF header.
- version field indicate header version number - version field indicate header version number
......
...@@ -5,8 +5,13 @@ ...@@ -5,8 +5,13 @@
# has been done. # has been done.
# #
from docutils import nodes from docutils import nodes
import sphinx
from sphinx import addnodes from sphinx import addnodes
from sphinx.environment import NoUri if sphinx.version_info[0] < 2 or \
sphinx.version_info[0] == 2 and sphinx.version_info[1] < 1:
from sphinx.environment import NoUri
else:
from sphinx.errors import NoUri
import re import re
# #
......
...@@ -95,7 +95,8 @@ of ftrace. Here is a list of some of the key files: ...@@ -95,7 +95,8 @@ of ftrace. Here is a list of some of the key files:
current_tracer: current_tracer:
This is used to set or display the current tracer This is used to set or display the current tracer
that is configured. that is configured. Changing the current tracer clears
the ring buffer content as well as the "snapshot" buffer.
available_tracers: available_tracers:
...@@ -126,7 +127,8 @@ of ftrace. Here is a list of some of the key files: ...@@ -126,7 +127,8 @@ of ftrace. Here is a list of some of the key files:
This file holds the output of the trace in a human This file holds the output of the trace in a human
readable format (described below). Note, tracing is temporarily readable format (described below). Note, tracing is temporarily
disabled when the file is open for reading. Once all readers disabled when the file is open for reading. Once all readers
are closed, tracing is re-enabled. are closed, tracing is re-enabled. Opening this file for
writing with the O_TRUNC flag clears the ring buffer content.
trace_pipe: trace_pipe:
...@@ -185,7 +187,8 @@ of ftrace. Here is a list of some of the key files: ...@@ -185,7 +187,8 @@ of ftrace. Here is a list of some of the key files:
CPU buffer and not total size of all buffers. The CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size). that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes A few extra pages may be allocated to accommodate buffer management
meta-data. If the last page allocated has room for more bytes
than requested, the rest of the page will be used, than requested, the rest of the page will be used,
making the actual allocation bigger than requested or shown. making the actual allocation bigger than requested or shown.
( Note, the size may not be a multiple of the page size ( Note, the size may not be a multiple of the page size
...@@ -235,7 +238,7 @@ of ftrace. Here is a list of some of the key files: ...@@ -235,7 +238,7 @@ of ftrace. Here is a list of some of the key files:
This interface also allows for commands to be used. See the This interface also allows for commands to be used. See the
"Filter commands" section for more details. "Filter commands" section for more details.
As a speed up, since processing strings can't be quite expensive As a speed up, since processing strings can be quite expensive
and requires a check of all functions registered to tracing, instead and requires a check of all functions registered to tracing, instead
an index can be written into this file. A number (starting with "1") an index can be written into this file. A number (starting with "1")
written will instead select the same corresponding at the line position written will instead select the same corresponding at the line position
...@@ -382,7 +385,7 @@ of ftrace. Here is a list of some of the key files: ...@@ -382,7 +385,7 @@ of ftrace. Here is a list of some of the key files:
By default, 128 comms are saved (see "saved_cmdlines" above). To By default, 128 comms are saved (see "saved_cmdlines" above). To
increase or decrease the amount of comms that are cached, echo increase or decrease the amount of comms that are cached, echo
in a the number of comms to cache, into this file. the number of comms to cache into this file.
saved_tgids: saved_tgids:
...@@ -490,6 +493,9 @@ of ftrace. Here is a list of some of the key files: ...@@ -490,6 +493,9 @@ of ftrace. Here is a list of some of the key files:
# echo global > trace_clock # echo global > trace_clock
Setting a clock clears the ring buffer content as well as the
"snapshot" buffer.
trace_marker: trace_marker:
This is a very useful file for synchronizing user space This is a very useful file for synchronizing user space
...@@ -3324,7 +3330,7 @@ directories after it is created. ...@@ -3324,7 +3330,7 @@ directories after it is created.
As you can see, the new directory looks similar to the tracing directory As you can see, the new directory looks similar to the tracing directory
itself. In fact, it is very similar, except that the buffer and itself. In fact, it is very similar, except that the buffer and
events are agnostic from the main director, or from any other events are agnostic from the main directory, or from any other
instances that are created. instances that are created.
The files in the new directory work just like the files with the The files in the new directory work just like the files with the
......
...@@ -37,7 +37,7 @@ commit_page - a pointer to the page with the last finished non-nested write. ...@@ -37,7 +37,7 @@ commit_page - a pointer to the page with the last finished non-nested write.
cmpxchg - hardware-assisted atomic transaction that performs the following: cmpxchg - hardware-assisted atomic transaction that performs the following:
A = B iff previous A == C A = B if previous A == C
R = cmpxchg(A, C, B) is saying that we replace A with B if and only if R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
current A is equal to C, and we put the old (current) A into R current A is equal to C, and we put the old (current) A into R
......
...@@ -2413,7 +2413,7 @@ _않습니다_. ...@@ -2413,7 +2413,7 @@ _않습니다_.
알고 있는, - inb() 나 writel() 과 같은 - 적절한 액세스 루틴을 통해 이루어져야만 알고 있는, - inb() 나 writel() 과 같은 - 적절한 액세스 루틴을 통해 이루어져야만
합니다. 이것들은 대부분의 경우에는 명시적 메모리 배리어 와 함께 사용될 필요가 합니다. 이것들은 대부분의 경우에는 명시적 메모리 배리어 와 함께 사용될 필요가
없습니다만, 완화된 메모리 액세스 속성으로 I/O 메모리 윈도우로의 참조를 위해 없습니다만, 완화된 메모리 액세스 속성으로 I/O 메모리 윈도우로의 참조를 위해
액세스 함수가 사용된다면 순서를 강제하기 위해 _madatory_ 메모리 배리어가 액세스 함수가 사용된다면 순서를 강제하기 위해 _mandatory_ 메모리 배리어가
필요합니다. 필요합니다.
더 많은 정보를 위해선 Documentation/driver-api/device-io.rst 를 참고하십시오. 더 많은 정보를 위해선 Documentation/driver-api/device-io.rst 를 참고하십시오.
...@@ -2528,7 +2528,7 @@ I/O 액세스를 통한 주변장치와의 통신은 아키텍쳐와 기기에 ...@@ -2528,7 +2528,7 @@ I/O 액세스를 통한 주변장치와의 통신은 아키텍쳐와 기기에
이것들은 readX() 와 writeX() 랑 비슷하지만, 더 완화된 메모리 순서 이것들은 readX() 와 writeX() 랑 비슷하지만, 더 완화된 메모리 순서
보장을 제공합니다. 구체적으로, 이것들은 일반적 메모리 액세스나 delay() 보장을 제공합니다. 구체적으로, 이것들은 일반적 메모리 액세스나 delay()
루프 (예:앞의 2-5 항목) 에 대해 순서를 보장하지 않습니다만 디폴트 I/O 루프 (예:앞의 2-5 항목) 에 대해 순서를 보장하지 않습니다만 디폴트 I/O
기능으로 매핑된 __iomem 포인터에 대해 동작할 때, 같은 CPU 쓰레드에 의 기능으로 매핑된 __iomem 포인터에 대해 동작할 때, 같은 CPU 쓰레드에 의
같은 주변장치로의 액세스에는 순서가 맞춰질 것이 보장됩니다. 같은 주변장치로의 액세스에는 순서가 맞춰질 것이 보장됩니다.
(*) readsX(), writesX(): (*) readsX(), writesX():
......
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/process/embargoed-hardware-issues.rst <embargoed_hardware_issues>`
:Translator: Alex Shi <alex.shi@linux.alibaba.com>
被限制的硬件问题
================
范围
----
导致安全问题的硬件问题与只影响Linux内核的纯软件错误是不同的安全错误类别。
必须区别对待诸如熔毁(Meltdown)、Spectre、L1TF等硬件问题,因为它们通常会影响
所有操作系统(“OS”),因此需要在不同的OS供应商、发行版、硬件供应商和其他各方
之间进行协调。对于某些问题,软件缓解可能依赖于微码或固件更新,这需要进一步的
协调。
.. _zh_Contact:
接触
----
Linux内核硬件安全小组独立于普通的Linux内核安全小组。
该小组只负责协调被限制的硬件安全问题。Linux内核中纯软件安全漏洞的报告不由该
小组处理,报告者将被引导至常规Linux内核安全小组(:ref:`Documentation/admin-guide/
<securitybugs>`)联系。
可以通过电子邮件 <hardware-security@kernel.org> 与小组联系。这是一份私密的安全
官名单,他们将帮助您根据我们的文档化流程协调问题。
邮件列表是加密的,发送到列表的电子邮件可以通过PGP或S/MIME加密,并且必须使用报告
者的PGP密钥或S/MIME证书签名。该列表的PGP密钥和S/MIME证书可从
https://www.kernel.org/.... 获得。
虽然硬件安全问题通常由受影响的硬件供应商处理,但我们欢迎发现潜在硬件缺陷的研究
人员或个人与我们联系。
硬件安全官
^^^^^^^^^^
目前的硬件安全官小组:
- Linus Torvalds(Linux基金会院士)
- Greg Kroah Hartman(Linux基金会院士)
- Thomas Gleixner(Linux基金会院士)
邮件列表的操作
^^^^^^^^^^^^^^
处理流程中使用的加密邮件列表托管在Linux Foundation的IT基础设施上。通过提供这项
服务,Linux基金会的IT基础设施安全总监在技术上有能力访问被限制的信息,但根据他
的雇佣合同,他必须保密。Linux基金会的IT基础设施安全总监还负责 kernel.org 基础
设施。
Linux基金会目前的IT基础设施安全总监是 Konstantin Ryabitsev。
保密协议
--------
Linux内核硬件安全小组不是正式的机构,因此无法签订任何保密协议。核心社区意识到
这些问题的敏感性,并提供了一份谅解备忘录。
谅解备忘录
----------
Linux内核社区深刻理解在不同操作系统供应商、发行商、硬件供应商和其他各方之间
进行协调时,保持硬件安全问题处于限制状态的要求。
Linux内核社区在过去已经成功地处理了硬件安全问题,并且有必要的机制允许在限制
限制下进行符合社区的开发。
Linux内核社区有一个专门的硬件安全小组负责初始联系,并监督在限制规则下处理
此类问题的过程。
硬件安全小组确定开发人员(领域专家),他们将组成特定问题的初始响应小组。最初
的响应小组可以引入更多的开发人员(领域专家)以最佳的技术方式解决这个问题。
所有相关开发商承诺遵守限制规定,并对收到的信息保密。违反承诺将导致立即从当前
问题中排除,并从所有相关邮件列表中删除。此外,硬件安全小组还将把违反者排除在
未来的问题之外。这一后果的影响在我们社区是一种非常有效的威慑。如果发生违规
情况,硬件安全小组将立即通知相关方。如果您或任何人发现潜在的违规行为,请立即
向硬件安全人员报告。
流程
^^^^
由于Linux内核开发的全球分布式特性,面对面的会议几乎不可能解决硬件安全问题。
由于时区和其他因素,电话会议很难协调,只能在绝对必要时使用。加密电子邮件已被
证明是解决此类问题的最有效和最安全的通信方法。
开始披露
""""""""
披露内容首先通过电子邮件联系Linux内核硬件安全小组。此初始联系人应包含问题的
描述和任何已知受影响硬件的列表。如果您的组织制造或分发受影响的硬件,我们建议
您也考虑哪些其他硬件可能会受到影响。
硬件安全小组将提供一个特定于事件的加密邮件列表,用于与报告者进行初步讨论、
进一步披露和协调。
硬件安全小组将向披露方提供一份开发人员(领域专家)名单,在与开发人员确认他们
将遵守本谅解备忘录和文件化流程后,应首先告知开发人员有关该问题的信息。这些开发
人员组成初始响应小组,并在初始接触后负责处理问题。硬件安全小组支持响应小组,
但不一定参与缓解开发过程。
虽然个别开发人员可能通过其雇主受到保密协议的保护,但他们不能以Linux内核开发
人员的身份签订个别保密协议。但是,他们将同意遵守这一书面程序和谅解备忘录。
披露方应提供已经或应该被告知该问题的所有其他实体的联系人名单。这有几个目的:
- 披露的实体列表允许跨行业通信,例如其他操作系统供应商、硬件供应商等。
- 可联系已披露的实体,指定应参与缓解措施开发的专家。
- 如果需要处理某一问题的专家受雇于某一上市实体或某一上市实体的成员,则响应
小组可要求该实体披露该专家。这确保专家也是实体反应小组的一部分。
披露
""""
披露方通过特定的加密邮件列表向初始响应小组提供详细信息。
根据我们的经验,这些问题的技术文档通常是一个足够的起点,最好通过电子邮件进行
进一步的技术澄清。
缓解开发
""""""""
初始响应小组设置加密邮件列表,或在适当的情况下重新修改现有邮件列表。
使用邮件列表接近于正常的Linux开发过程,并且在过去已经成功地用于为各种硬件安全
问题开发缓解措施。
邮件列表的操作方式与正常的Linux开发相同。发布、讨论和审查修补程序,如果同意,
则应用于非公共git存储库,参与开发人员只能通过安全连接访问该存储库。存储库包含
针对主线内核的主开发分支,并根据需要为稳定的内核版本提供向后移植分支。
最初的响应小组将根据需要从Linux内核开发人员社区中确定更多的专家。引进专家可以
在开发过程中的任何时候发生,需要及时处理。
如果专家受雇于披露方提供的披露清单上的实体或其成员,则相关实体将要求其参与。
否则,披露方将被告知专家参与的情况。谅解备忘录涵盖了专家,要求披露方确认参与。
如果披露方有令人信服的理由提出异议,则必须在五个工作日内提出异议,并立即与事件
小组解决。如果披露方在五个工作日内未作出回应,则视为默许。
在确认或解决异议后,专家由事件小组披露,并进入开发过程。
协调发布
""""""""
有关各方将协商限制结束的日期和时间。此时,准备好的缓解措施集成到相关的内核树中
并发布。
虽然我们理解硬件安全问题需要协调限制时间,但限制时间应限制在所有有关各方制定、
测试和准备缓解措施所需的最短时间内。人为地延长限制时间以满足会议讨论日期或其他
非技术原因,会给相关的开发人员和响应小组带来了更多的工作和负担,因为补丁需要
保持最新,以便跟踪正在进行的上游内核开发,这可能会造成冲突的更改。
CVE分配
"""""""
硬件安全小组和初始响应小组都不分配CVE,开发过程也不需要CVE。如果CVE是由披露方
提供的,则可用于文档中。
流程专使
--------
为了协助这一进程,我们在各组织设立了专使,他们可以回答有关报告流程和进一步处理
的问题或提供指导。专使不参与特定问题的披露,除非响应小组或相关披露方提出要求。
现任专使名单:
============= ========================================================
ARM
AMD Tom Lendacky <tom.lendacky@amd.com>
IBM
Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org>
Microsoft Sasha Levin <sashal@kernel.org>
VMware
Xen Andrew Cooper <andrew.cooper3@citrix.com>
Canonical Tyler Hicks <tyhicks@canonical.com>
Debian Ben Hutchings <ben@decadent.org.uk>
Oracle Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Red Hat Josh Poimboeuf <jpoimboe@redhat.com>
SUSE Jiri Kosina <jkosina@suse.cz>
Amazon
Google Kees Cook <keescook@chromium.org>
============= ========================================================
如果要将您的组织添加到专使名单中,请与硬件安全小组联系。被提名的专使必须完全
理解和支持我们的过程,并且在Linux内核社区中很容易联系。
加密邮件列表
------------
我们使用加密邮件列表进行通信。这些列表的工作原理是,发送到列表的电子邮件使用
列表的PGP密钥或列表的/MIME证书进行加密。邮件列表软件对电子邮件进行解密,并
使用订阅者的PGP密钥或S/MIME证书为每个订阅者分别对其进行重新加密。有关邮件列表
软件和用于确保列表安全和数据保护的设置的详细信息,请访问:
https://www.kernel.org/....
关键点
^^^^^^
初次接触见 :ref:`zh_Contact`. 对于特定于事件的邮件列表,密钥和S/MIME证书通过
特定列表发送的电子邮件传递给订阅者。
订阅事件特定列表
^^^^^^^^^^^^^^^^
订阅由响应小组处理。希望参与通信的披露方将潜在订户的列表发送给响应组,以便
响应组可以验证订阅请求。
每个订户都需要通过电子邮件向响应小组发送订阅请求。电子邮件必须使用订阅服务器
的PGP密钥或S/MIME证书签名。如果使用PGP密钥,则必须从公钥服务器获得该密钥,
并且理想情况下该密钥连接到Linux内核的PGP信任网。另请参见:
https://www.kernel.org/signature.html.
响应小组验证订阅者,并将订阅者添加到列表中。订阅后,订阅者将收到来自邮件列表
的电子邮件,该邮件列表使用列表的PGP密钥或列表的/MIME证书签名。订阅者的电子邮件
客户端可以从签名中提取PGP密钥或S/MIME证书,以便订阅者可以向列表发送加密电子
邮件。
...@@ -31,6 +31,8 @@ ...@@ -31,6 +31,8 @@
development-process development-process
email-clients email-clients
license-rules license-rules
kernel-enforcement-statement
kernel-driver-statement
其它大多数开发人员感兴趣的社区指南: 其它大多数开发人员感兴趣的社区指南:
...@@ -43,6 +45,7 @@ ...@@ -43,6 +45,7 @@
stable-api-nonsense stable-api-nonsense
stable-kernel-rules stable-kernel-rules
management-style management-style
embargoed-hardware-issues
这些是一些总体技术指南,由于缺乏更好的地方,现在已经放在这里 这些是一些总体技术指南,由于缺乏更好的地方,现在已经放在这里
......
.. _cn_process_statement_driver:
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/process/kernel-driver-statement.rst <process_statement_driver>`
:Translator: Alex Shi <alex.shi@linux.alibaba.com>
内核驱动声明
------------
关于Linux内核模块的立场声明
===========================
我们,以下署名的Linux内核开发人员,认为任何封闭源Linux内核模块或驱动程序都是
有害的和不可取的。我们已经一再发现它们对Linux用户,企业和更大的Linux生态系统
有害。这样的模块否定了Linux开发模型的开放性,稳定性,灵活性和可维护性,并使
他们的用户无法使用Linux社区的专业知识。提供闭源内核模块的供应商迫使其客户
放弃Linux的主要优势或选择新的供应商。因此,为了充分利用开源所提供的成本节省和
共享支持优势,我们敦促供应商采取措施,以开源内核代码在Linux上为其客户提供支持。
我们只为自己说话,而不是我们今天可能会为之工作,过去或将来会为之工作的任何公司。
- Dave Airlie
- Nick Andrew
- Jens Axboe
- Ralf Baechle
- Felipe Balbi
- Ohad Ben-Cohen
- Muli Ben-Yehuda
- Jiri Benc
- Arnd Bergmann
- Thomas Bogendoerfer
- Vitaly Bordug
- James Bottomley
- Josh Boyer
- Neil Brown
- Mark Brown
- David Brownell
- Michael Buesch
- Franck Bui-Huu
- Adrian Bunk
- François Cami
- Ralph Campbell
- Luiz Fernando N. Capitulino
- Mauro Carvalho Chehab
- Denis Cheng
- Jonathan Corbet
- Glauber Costa
- Alan Cox
- Magnus Damm
- Ahmed S. Darwish
- Robert P. J. Day
- Hans de Goede
- Arnaldo Carvalho de Melo
- Helge Deller
- Jean Delvare
- Mathieu Desnoyers
- Sven-Thorsten Dietrich
- Alexey Dobriyan
- Daniel Drake
- Alex Dubov
- Randy Dunlap
- Michael Ellerman
- Pekka Enberg
- Jan Engelhardt
- Mark Fasheh
- J. Bruce Fields
- Larry Finger
- Jeremy Fitzhardinge
- Mike Frysinger
- Kumar Gala
- Robin Getz
- Liam Girdwood
- Jan-Benedict Glaw
- Thomas Gleixner
- Brice Goglin
- Cyrill Gorcunov
- Andy Gospodarek
- Thomas Graf
- Krzysztof Halasa
- Harvey Harrison
- Stephen Hemminger
- Michael Hennerich
- Tejun Heo
- Benjamin Herrenschmidt
- Kristian Høgsberg
- Henrique de Moraes Holschuh
- Marcel Holtmann
- Mike Isely
- Takashi Iwai
- Olof Johansson
- Dave Jones
- Jesper Juhl
- Matthias Kaehlcke
- Kenji Kaneshige
- Jan Kara
- Jeremy Kerr
- Russell King
- Olaf Kirch
- Roel Kluin
- Hans-Jürgen Koch
- Auke Kok
- Peter Korsgaard
- Jiri Kosina
- Aaro Koskinen
- Mariusz Kozlowski
- Greg Kroah-Hartman
- Michael Krufky
- Aneesh Kumar
- Clemens Ladisch
- Christoph Lameter
- Gunnar Larisch
- Anders Larsen
- Grant Likely
- John W. Linville
- Yinghai Lu
- Tony Luck
- Pavel Machek
- Matt Mackall
- Paul Mackerras
- Roland McGrath
- Patrick McHardy
- Kyle McMartin
- Paul Menage
- Thierry Merle
- Eric Miao
- Akinobu Mita
- Ingo Molnar
- James Morris
- Andrew Morton
- Paul Mundt
- Oleg Nesterov
- Luca Olivetti
- S.Çağlar Onur
- Pierre Ossman
- Keith Owens
- Venkatesh Pallipadi
- Nick Piggin
- Nicolas Pitre
- Evgeniy Polyakov
- Richard Purdie
- Mike Rapoport
- Sam Ravnborg
- Gerrit Renker
- Stefan Richter
- David Rientjes
- Luis R. Rodriguez
- Stefan Roese
- Francois Romieu
- Rami Rosen
- Stephen Rothwell
- Maciej W. Rozycki
- Mark Salyzyn
- Yoshinori Sato
- Deepak Saxena
- Holger Schurig
- Amit Shah
- Yoshihiro Shimoda
- Sergei Shtylyov
- Kay Sievers
- Sebastian Siewior
- Rik Snel
- Jes Sorensen
- Alexey Starikovskiy
- Alan Stern
- Timur Tabi
- Hirokazu Takata
- Eliezer Tamir
- Eugene Teo
- Doug Thompson
- FUJITA Tomonori
- Dmitry Torokhov
- Marcelo Tosatti
- Steven Toth
- Theodore Tso
- Matthias Urlichs
- Geert Uytterhoeven
- Arjan van de Ven
- Ivo van Doorn
- Rik van Riel
- Wim Van Sebroeck
- Hans Verkuil
- Horst H. von Brand
- Dmitri Vorobiev
- Anton Vorontsov
- Daniel Walker
- Johannes Weiner
- Harald Welte
- Matthew Wilcox
- Dan J. Williams
- Darrick J. Wong
- David Woodhouse
- Chris Wright
- Bryan Wu
- Rafael J. Wysocki
- Herbert Xu
- Vlad Yasevich
- Peter Zijlstra
- Bartlomiej Zolnierkiewicz
.. _cn_process_statement_kernel:
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/process/kernel-enforcement-statement.rst <process_statement_kernel>`
:Translator: Alex Shi <alex.shi@linux.alibaba.com>
Linux 内核执行声明
------------------
作为Linux内核的开发人员,我们对如何使用我们的软件以及如何实施软件许可证有着
浓厚的兴趣。遵守GPL-2.0的互惠共享义务对我们软件和社区的长期可持续性至关重要。
虽然有权强制执行对我们社区的贡献中的单独版权权益,但我们有共同的利益,即确保
个人强制执行行动的方式有利于我们的社区,不会对我们软件生态系统的健康和增长
产生意外的负面影响。为了阻止无益的执法行动,我们同意代表我们自己和我们版权
利益的任何继承人对Linux内核用户作出以下符合我们开发社区最大利益的承诺:
尽管有GPL-2.0的终止条款,我们同意,采用以下GPL-3.0条款作为我们许可证下的
附加许可,作为任何对许可证下权利的非防御性主张,这符合我们开发社区的最佳
利益。
但是,如果您停止所有违反本许可证的行为,则您从特定版权持有人处获得的
许可证将被恢复:(a)暂时恢复,除非版权持有人明确并最终终止您的许可证;
以及(b)永久恢复, 如果版权持有人未能在你终止违反后60天内以合理方式
通知您违反本许可证的行为,则永久恢复您的许可证。
此外,如果版权所有者以某种合理的方式通知您违反了本许可,这是您第一次
从该版权所有者处收到违反本许可的通知(对于任何作品),并且您在收到通知
后的30天内纠正违规行为。则您从特定版权所有者处获得的许可将永久恢复.
我们提供这些保证的目的是鼓励更多地使用该软件。我们希望公司和个人使用、修改和
分发此软件。我们希望以公开和透明的方式与用户合作,以消除我们对法规遵从性或强制
执行的任何不确定性,这些不确定性可能会限制我们软件的采用。我们将法律行动视为
最后手段,只有在其他社区努力未能解决这一问题时才采取行动。
最后,一旦一个不合规问题得到解决,我们希望用户会感到欢迎,加入我们为之努力的
这个项目。共同努力,我们会更强大。
除了下面提到的以外,我们只为自己说话,而不是为今天、过去或将来可能为之工作的
任何公司说话。
- Laura Abbott
- Bjorn Andersson (Linaro)
- Andrea Arcangeli
- Neil Armstrong
- Jens Axboe
- Pablo Neira Ayuso
- Khalid Aziz
- Ralf Baechle
- Felipe Balbi
- Arnd Bergmann
- Ard Biesheuvel
- Tim Bird
- Paolo Bonzini
- Christian Borntraeger
- Mark Brown (Linaro)
- Paul Burton
- Javier Martinez Canillas
- Rob Clark
- Kees Cook (Google)
- Jonathan Corbet
- Dennis Dalessandro
- Vivien Didelot (Savoir-faire Linux)
- Hans de Goede
- Mel Gorman (SUSE)
- Sven Eckelmann
- Alex Elder (Linaro)
- Fabio Estevam
- Larry Finger
- Bhumika Goyal
- Andy Gross
- Juergen Gross
- Shawn Guo
- Ulf Hansson
- Stephen Hemminger (Microsoft)
- Tejun Heo
- Rob Herring
- Masami Hiramatsu
- Michal Hocko
- Simon Horman
- Johan Hovold (Hovold Consulting AB)
- Christophe JAILLET
- Olof Johansson
- Lee Jones (Linaro)
- Heiner Kallweit
- Srinivas Kandagatla
- Jan Kara
- Shuah Khan (Samsung)
- David Kershner
- Jaegeuk Kim
- Namhyung Kim
- Colin Ian King
- Jeff Kirsher
- Greg Kroah-Hartman (Linux Foundation)
- Christian König
- Vinod Koul
- Krzysztof Kozlowski
- Viresh Kumar
- Aneesh Kumar K.V
- Julia Lawall
- Doug Ledford
- Chuck Lever (Oracle)
- Daniel Lezcano
- Shaohua Li
- Xin Long
- Tony Luck
- Catalin Marinas (Arm Ltd)
- Mike Marshall
- Chris Mason
- Paul E. McKenney
- Arnaldo Carvalho de Melo
- David S. Miller
- Ingo Molnar
- Kuninori Morimoto
- Trond Myklebust
- Martin K. Petersen (Oracle)
- Borislav Petkov
- Jiri Pirko
- Josh Poimboeuf
- Sebastian Reichel (Collabora)
- Guenter Roeck
- Joerg Roedel
- Leon Romanovsky
- Steven Rostedt (VMware)
- Frank Rowand
- Ivan Safonov
- Anna Schumaker
- Jes Sorensen
- K.Y. Srinivasan
- David Sterba (SUSE)
- Heiko Stuebner
- Jiri Kosina (SUSE)
- Willy Tarreau
- Dmitry Torokhov
- Linus Torvalds
- Thierry Reding
- Rik van Riel
- Luis R. Rodriguez
- Geert Uytterhoeven (Glider bvba)
- Eduardo Valentin (Amazon.com)
- Daniel Vetter
- Linus Walleij
- Richard Weinberger
- Dan Williams
- Rafael J. Wysocki
- Arvind Yadav
- Masahiro Yamada
- Wei Yongjun
- Lv Zheng
- Marc Zyngier (Arm Ltd)
...@@ -22,11 +22,9 @@ USB support ...@@ -22,11 +22,9 @@ USB support
misc_usbsevseg misc_usbsevseg
mtouchusb mtouchusb
ohci ohci
rio
usbip_protocol usbip_protocol
usbmon usbmon
usb-serial usb-serial
wusb-design-overview
usb-help usb-help
text_files text_files
......
...@@ -16,12 +16,6 @@ USB devfs drop permissions source ...@@ -16,12 +16,6 @@ USB devfs drop permissions source
.. literalinclude:: usbdevfs-drop-permissions.c .. literalinclude:: usbdevfs-drop-permissions.c
:language: c :language: c
WUSB command line script to manipulate auth credentials
-------------------------------------------------------
.. literalinclude:: wusb-cbaf
:language: shell
Credits Credits
------- -------
......
...@@ -44,7 +44,7 @@ that the ID used should be same for both master and slave driver loading. ...@@ -44,7 +44,7 @@ that the ID used should be same for both master and slave driver loading.
e.g:: e.g::
insmod omap_hdq.ko W1_ID=2 insmod omap_hdq.ko W1_ID=2
inamod w1_bq27000.ko F_ID=2 insmod w1_bq27000.ko F_ID=2
The driver also supports 1-wire mode. In this mode, there is no need to The driver also supports 1-wire mode. In this mode, there is no need to
pass slave ID as parameter. The driver will auto-detect slaves connected pass slave ID as parameter. The driver will auto-detect slaves connected
......
...@@ -69,11 +69,12 @@ Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in ...@@ -69,11 +69,12 @@ Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in
xloadflags to support booting a 64-bit kernel from 32-bit xloadflags to support booting a 64-bit kernel from 32-bit
EFI EFI
Protocol 2.14: BURNT BY INCORRECT COMMIT ae7e1238e68f2a472a125673ab506d49158c1889 Protocol 2.14 BURNT BY INCORRECT COMMIT
ae7e1238e68f2a472a125673ab506d49158c1889
(x86/boot: Add ACPI RSDP address to setup_header) (x86/boot: Add ACPI RSDP address to setup_header)
DO NOT USE!!! ASSUME SAME AS 2.13. DO NOT USE!!! ASSUME SAME AS 2.13.
Protocol 2.15: (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max. Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max.
============= ============================================================ ============= ============================================================
.. note:: .. note::
...@@ -834,14 +835,14 @@ Protocol: 2.09+ ...@@ -834,14 +835,14 @@ Protocol: 2.09+
chunks of memory are occupied by kernel data. chunks of memory are occupied by kernel data.
Thus setup_indirect struct and SETUP_INDIRECT type were introduced in Thus setup_indirect struct and SETUP_INDIRECT type were introduced in
protocol 2.15. protocol 2.15::
struct setup_indirect { struct setup_indirect {
__u32 type; __u32 type;
__u32 reserved; /* Reserved, must be set to zero. */ __u32 reserved; /* Reserved, must be set to zero. */
__u64 len; __u64 len;
__u64 addr; __u64 addr;
}; };
The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be
SETUP_INDIRECT itself since making the setup_indirect a tree structure SETUP_INDIRECT itself since making the setup_indirect a tree structure
...@@ -849,19 +850,19 @@ Protocol: 2.09+ ...@@ -849,19 +850,19 @@ Protocol: 2.09+
and stack space can be limited in boot contexts. and stack space can be limited in boot contexts.
Let's give an example how to point to SETUP_E820_EXT data using setup_indirect. Let's give an example how to point to SETUP_E820_EXT data using setup_indirect.
In this case setup_data and setup_indirect will look like this: In this case setup_data and setup_indirect will look like this::
struct setup_data { struct setup_data {
__u64 next = 0 or <addr_of_next_setup_data_struct>; __u64 next = 0 or <addr_of_next_setup_data_struct>;
__u32 type = SETUP_INDIRECT; __u32 type = SETUP_INDIRECT;
__u32 len = sizeof(setup_data); __u32 len = sizeof(setup_data);
__u8 data[sizeof(setup_indirect)] = struct setup_indirect { __u8 data[sizeof(setup_indirect)] = struct setup_indirect {
__u32 type = SETUP_INDIRECT | SETUP_E820_EXT; __u32 type = SETUP_INDIRECT | SETUP_E820_EXT;
__u32 reserved = 0; __u32 reserved = 0;
__u64 len = <len_of_SETUP_E820_EXT_data>; __u64 len = <len_of_SETUP_E820_EXT_data>;
__u64 addr = <addr_of_SETUP_E820_EXT_data>; __u64 addr = <addr_of_SETUP_E820_EXT_data>;
}
} }
}
.. note:: .. note::
SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished
...@@ -964,7 +965,7 @@ expected to copy into a setup_data chunk. ...@@ -964,7 +965,7 @@ expected to copy into a setup_data chunk.
All kernel_info data should be part of this structure. Fixed size data have to All kernel_info data should be part of this structure. Fixed size data have to
be put before kernel_info_var_len_data label. Variable size data have to be put be put before kernel_info_var_len_data label. Variable size data have to be put
after kernel_info_var_len_data label. Each chunk of variable size data has to after kernel_info_var_len_data label. Each chunk of variable size data has to
be prefixed with header/magic and its size, e.g.: be prefixed with header/magic and its size, e.g.::
kernel_info: kernel_info:
.ascii "LToP" /* Header, Linux top (structure). */ .ascii "LToP" /* Header, Linux top (structure). */
......
.. SPDX-License-Identifier: GPL-2.0 .. SPDX-License-Identifier: GPL-2.0
================ =================
Memory Managment Memory Management
================ =================
Complete virtual memory map with 4-level page tables Complete virtual memory map with 4-level page tables
==================================================== ====================================================
......
...@@ -17511,7 +17511,7 @@ F: drivers/mtd/nand/raw/vf610_nfc.c ...@@ -17511,7 +17511,7 @@ F: drivers/mtd/nand/raw/vf610_nfc.c
VFAT/FAT/MSDOS FILESYSTEM VFAT/FAT/MSDOS FILESYSTEM
M: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> M: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
S: Maintained S: Maintained
F: Documentation/filesystems/vfat.txt F: Documentation/filesystems/vfat.rst
F: fs/fat/ F: fs/fat/
VFIO DRIVER VFIO DRIVER
......
...@@ -42,7 +42,7 @@ ...@@ -42,7 +42,7 @@
* @res2: reserved * @res2: reserved
* @magic: Magic number (RISC-V specific; deprecated) * @magic: Magic number (RISC-V specific; deprecated)
* @magic2: Magic number 2 (to match the ARM64 'magic' field pos) * @magic2: Magic number 2 (to match the ARM64 'magic' field pos)
* @res4: reserved (will be used for PE COFF offset) * @res3: reserved (will be used for PE COFF offset)
* *
* The intention is for this header format to be shared between multiple * The intention is for this header format to be shared between multiple
* architectures to avoid a proliferation of image header formats. * architectures to avoid a proliferation of image header formats.
...@@ -59,7 +59,7 @@ struct riscv_image_header { ...@@ -59,7 +59,7 @@ struct riscv_image_header {
u64 res2; u64 res2;
u64 magic; u64 magic;
u32 magic2; u32 magic2;
u32 res4; u32 res3;
}; };
#endif /* __ASSEMBLY__ */ #endif /* __ASSEMBLY__ */
#endif /* _ASM_RISCV_IMAGE_H */ #endif /* _ASM_RISCV_IMAGE_H */
...@@ -54,7 +54,7 @@ for file in `find $1 -name '*.c'`; do ...@@ -54,7 +54,7 @@ for file in `find $1 -name '*.c'`; do
if [[ ${FILES_INCLUDED[$file]+_} ]]; then if [[ ${FILES_INCLUDED[$file]+_} ]]; then
continue; continue;
fi fi
str=$(scripts/kernel-doc -text -export "$file" 2>/dev/null) str=$(scripts/kernel-doc -export "$file" 2>/dev/null)
if [[ -n "$str" ]]; then if [[ -n "$str" ]]; then
echo "$file" echo "$file"
fi fi
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment