Commit 7786fa9a authored by Yasunori Goto, committed by Linus Torvalds

Document lowmem_reserve_ratio

Although lower_zone_protection was changed to lowmem_reserve_ratio, the
documentation was never updated.  The lowmem_reserve_ratio values are quite
hard to estimate, and there is no guidance.  This patch updates the
documentation accordingly.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent b5beb1ca
@@ -1336,7 +1336,7 @@ legacy_va_layout
 If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel
 will use the legacy (2.4) layout for all processes.
-lower_zone_protection
+lowmem_reserve_ratio
 ---------------------
 For some specialised workloads on highmem machines it is dangerous for
@@ -1356,25 +1356,71 @@ captured into pinned user memory.
 mechanism will also defend that region from allocations which could use
 highmem or lowmem).
-The `lower_zone_protection' tunable determines how aggressive the kernel is
-in defending these lower zones. The default value is zero - no
-protection at all.
+The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is
+in defending these lower zones.
 If you have a machine which uses highmem or ISA DMA and your
 applications are using mlock(), or if you are running with no swap then
-you probably should increase the lower_zone_protection setting.
+you probably should change the lowmem_reserve_ratio setting.
-The units of this tunable are fairly vague. It is approximately equal
-to "megabytes," so setting lower_zone_protection=100 will protect around 100
-megabytes of the lowmem zone from user allocations. It will also make
-those 100 megabytes unavailable for use by applications and by
-pagecache, so there is a cost.
-The effects of this tunable may be observed by monitoring
-/proc/meminfo:LowFree. Write a single huge file and observe the point
-at which LowFree ceases to fall.
-A reasonable value for lower_zone_protection is 100.
+The lowmem_reserve_ratio is an array. You can see them by reading this file.
+-
+% cat /proc/sys/vm/lowmem_reserve_ratio
+256 256 32
+-
+Note: # of this elements is one fewer than number of zones. Because the highest
+zone's value is not necessary for following calculation.
+But, these values are not used directly. The kernel calculates # of protection
+pages for each zones from them. These are shown as array of protection pages
+in /proc/zoneinfo like followings. (This is an example of x86-64 box).
+Each zone has an array of protection pages like this.
+-
+Node 0, zone DMA
+pages free 1355
+min 3
+low 3
+high 4
+:
+:
+numa_other 0
+protection: (0, 2004, 2004, 2004)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+pagesets
+cpu: 0 pcp: 0
+:
+-
+These protections are added to score to judge whether this zone should be used
+for page allocation or should be reclaimed.
+In this example, if normal pages (index=2) are required to this DMA zone and
+pages_high is used for watermark, the kernel judges this zone should not be
+used because pages_free(1355) is smaller than watermark + protection[2]
+(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
+normal page requirement. If requirement is DMA zone(index=0), protection[0]
+(=0) is used.
+zone[i]'s protection[j] is calculated by following exprssion.
+(i < j):
+zone[i]->protection[j]
+= (total sums of present_pages from zone[i+1] to zone[j] on the node)
+/ lowmem_reserve_ratio[i];
+(i = j):
+(should not be protected. = 0;
+(i > j):
+(not necessary, but looks 0)
+The default values of lowmem_reserve_ratio[i] are
+256 (if zone[i] means DMA or DMA32 zone)
+32 (others).
+As above expression, they are reciprocal number of ratio.
+256 means 1/256. # of protection pages becomes about "0.39%" of total present
+pages of higher zones on the node.
+If you would like to protect more pages, smaller values are effective.
+The minimum value is 1 (1/1 -> 100%).
 page-cluster
 ------------
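
The protection formula and the watermark comparison described in the added documentation text can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code: the function names `protection_pages` and `zone_usable` are hypothetical, the `present_pages` values are invented so that the result matches the `protection: (0, 2004, 2004, 2004)` line in the zoneinfo example, integer division is an assumption, and so is the handling of the exact boundary case (the text only says the zone is rejected when free pages are smaller than watermark + protection).

```python
def protection_pages(present_pages, ratios):
    """Return prot[i][j] for one node, zones ordered lowest first.

    present_pages[k] is the number of present pages in zone k.
    ratios[i] is lowmem_reserve_ratio[i]; it has one fewer entry than
    there are zones, because the highest zone never needs a ratio.
    """
    n = len(present_pages)
    prot = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # (i < j): sum of present_pages from zone[i+1] to zone[j],
            # divided by lowmem_reserve_ratio[i] (integer division assumed).
            higher = sum(present_pages[i + 1 : j + 1])
            prot[i][j] = higher // ratios[i]
        # (i = j) and (i > j) entries stay 0.
    return prot


def zone_usable(free, watermark, protection, index):
    """Check a zone against watermark + protection[index].

    Treating free == watermark + protection as "not usable" is an
    assumption; the documentation text does not cover that case.
    """
    return free > watermark + protection[index]


if __name__ == "__main__":
    # Hypothetical page counts for DMA, DMA32, Normal, HighMem.
    present = [4000, 513024, 0, 0]
    ratios = [256, 256, 32]  # values from the `cat` example
    prot = protection_pages(present, ratios)
    print(prot[0])                             # [0, 2004, 2004, 2004]
    print(zone_usable(1355, 4, prot[0], 2))    # False: 1355 < 4 + 2004
    print(zone_usable(1355, 4, prot[0], 0))    # True: protection[0] is 0
    print(f"{100 / 256:.2f}%")                 # 0.39%
```

Lowering a ratio raises the protection proportionally: with `ratios[0] = 1`, every page of the higher zones would be counted as protection for the DMA zone, which corresponds to the "1/1 -> 100%" case in the text.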