Commit 9c015162 authored by Dennis Zhou (Facebook), committed by Tejun Heo

percpu: update the header comment and pcpu_build_alloc_info comments

The header comment for percpu memory is a little hard to parse and is
not super clear about how the first chunk is managed. This adds a
little more clarity to the situation.

There is also quite a bit of tricky logic in pcpu_build_alloc_info().
This restructures a comment to add a little more information.
Unfortunately, you will still have to piece together a handful of other
comments too, but this should help direct you to the meaningful ones.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
parent 6b9b6f39
@@ -4,36 +4,35 @@
  * Copyright (C) 2009 SUSE Linux Products GmbH
  * Copyright (C) 2009 Tejun Heo <tj@kernel.org>
  *
- * This file is released under the GPLv2.
+ * This file is released under the GPLv2 license.
  *
- * This is percpu allocator which can handle both static and dynamic
- * areas. Percpu areas are allocated in chunks. Each chunk is
- * consisted of boot-time determined number of units and the first
- * chunk is used for static percpu variables in the kernel image
- * (special boot time alloc/init handling necessary as these areas
- * need to be brought up before allocation services are running).
- * Unit grows as necessary and all units grow or shrink in unison.
- * When a chunk is filled up, another chunk is allocated.
+ * The percpu allocator handles both static and dynamic areas. Percpu
+ * areas are allocated in chunks which are divided into units. There is
+ * a 1-to-1 mapping for units to possible cpus. These units are grouped
+ * based on NUMA properties of the machine.
  *
  *      c0                           c1                         c2
  *  -------------------          -------------------        ------------
  * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
  *  -------------------  ......  -------------------  ....  ------------
  *
- * Allocation is done in offset-size areas of single unit space. Ie,
- * an area of 512 bytes at 6k in c1 occupies 512 bytes at 6k of c1:u0,
- * c1:u1, c1:u2 and c1:u3. On UMA, units corresponds directly to
- * cpus. On NUMA, the mapping can be non-linear and even sparse.
- * Percpu access can be done by configuring percpu base registers
- * according to cpu to unit mapping and pcpu_unit_size.
+ * Allocation is done by offsets into a unit's address space. Ie., an
+ * area of 512 bytes at 6k in c1 occupies 512 bytes at 6k in c1:u0,
+ * c1:u1, c1:u2, etc. On NUMA machines, the mapping may be non-linear
+ * and even sparse. Access is handled by configuring percpu base
+ * registers according to the cpu to unit mappings and offsetting the
+ * base address using pcpu_unit_size.
  *
- * There are usually many small percpu allocations many of them being
- * as small as 4 bytes. The allocator organizes chunks into lists
- * according to free size and tries to allocate from the fullest one.
- * Each chunk keeps the maximum contiguous area size hint which is
- * guaranteed to be equal to or larger than the maximum contiguous
- * area in the chunk. This helps the allocator not to iterate the
- * chunk maps unnecessarily.
+ * There is special consideration for the first chunk which must handle
+ * the static percpu variables in the kernel image as allocation services
+ * are not online yet. In short, the first chunk is structure like so:
+ *
+ *                  <Static | [Reserved] | Dynamic>
+ *
+ * The static data is copied from the original section managed by the
+ * linker. The reserved section, if non-zero, primarily manages static
+ * percpu variables from kernel modules. Finally, the dynamic section
+ * takes care of normal allocations.
  *
  * Allocation state in each chunk is kept using an array of integers
  * on chunk->map. A positive value in the map represents a free
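The rewritten header comment says an allocation is an offset into a unit's address space, that the same offset is occupied in every unit of the chunk, and that access goes through the cpu to unit mapping plus pcpu_unit_size. A minimal sketch of that addressing follows, assuming a single chunk, four units, a 32 KiB unit size and a deliberately non-linear cpu to unit table; `toy_chunk`, `toy_cpu_to_unit` and `toy_addr()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names are kernel API. */
#define TOY_NR_CPUS   4
#define TOY_UNIT_SIZE ((size_t)32 * 1024)   /* assumed unit size */

struct toy_chunk {
	uintptr_t base_addr;   /* start of the chunk's address range */
};

/* cpu -> unit mapping; deliberately non-linear, as on some NUMA boxes */
static const int toy_cpu_to_unit[TOY_NR_CPUS] = { 0, 2, 1, 3 };

/*
 * Address of @cpu's copy of the area allocated at offset @off.
 * The offset is the same in every unit; only the unit base differs.
 */
static uintptr_t toy_addr(const struct toy_chunk *chunk, int cpu, size_t off)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(off < TOY_UNIT_SIZE);
	return chunk->base_addr +
	       (uintptr_t)toy_cpu_to_unit[cpu] * TOY_UNIT_SIZE + off;
}
```

The 512-byte area at 6k from the comment therefore sits at base + 6k in unit 0, base + 32k + 6k in unit 1, and so on; which unit a given cpu reaches depends only on the mapping table.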
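The new first-chunk paragraph describes each unit of the first chunk as `<Static | [Reserved] | Dynamic>`. The sketch below classifies an offset into one of those regions. `struct toy_first_chunk` and `toy_classify()` are hypothetical stand-ins (the real allocator tracks comparable sizes, for example in `struct pcpu_alloc_info`), and the sketch only assumes the three regions sit back to back in the order the comment shows, with a reserved size of 0 meaning that region is absent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one unit of the first chunk. */
enum toy_region { TOY_STATIC, TOY_RESERVED, TOY_DYNAMIC, TOY_OUTSIDE };

struct toy_first_chunk {
	size_t static_size;    /* copied from the linker-managed section */
	size_t reserved_size;  /* may be 0; mainly module static percpu vars */
	size_t dyn_size;       /* normal dynamic allocations */
};

/* Map an offset within a unit to the region that owns it. */
static enum toy_region toy_classify(const struct toy_first_chunk *fc,
				    size_t off)
{
	if (off < fc->static_size)
		return TOY_STATIC;
	off -= fc->static_size;
	if (off < fc->reserved_size)
		return TOY_RESERVED;
	off -= fc->reserved_size;
	if (off < fc->dyn_size)
		return TOY_DYNAMIC;
	return TOY_OUTSIDE;
}
```

When `reserved_size` is 0, the dynamic region starts immediately after the static data.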
@@ -43,6 +42,12 @@
  * Chunks can be determined from the address using the index field
  * in the page struct. The index field contains a pointer to the chunk.
  *
+ * These chunks are organized into lists according to free_size and
+ * tries to allocate from the fullest chunk first. Each chunk maintains
+ * a maximum contiguous area size hint which is guaranteed to be equal
+ * to or larger than the maximum contiguous area in the chunk. This
+ * helps prevent the allocator from iterating over chunks unnecessarily.
+ *
  * To use this allocator, arch code should do the following:
  *
  * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate
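The relocated paragraph in this hunk combines two ideas. Chunks are kept in lists by free_size and the fullest one is tried first. Each chunk's contiguous area hint is also an upper bound, so a hint smaller than the request rules the chunk out without walking its map. The simplified sketch below relies on these assumptions: a single array sorted by free size stands in for the per-size lists, `toy_scan_map()` stands in for the expensive map walk, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy chunk; not the kernel's struct pcpu_chunk. */
struct toy_lchunk {
	size_t free_size;    /* total free bytes in the chunk */
	size_t contig_hint;  /* >= largest free contiguous area (upper bound) */
	size_t max_contig;   /* true largest free area, known only by scanning */
};

static int toy_scans;    /* number of expensive map scans performed */

/* Stand-in for walking chunk->map: does a free area of @size exist? */
static bool toy_scan_map(const struct toy_lchunk *c, size_t size)
{
	toy_scans++;
	return c->max_contig >= size;
}

/*
 * @chunks must be sorted by free_size, fullest (least free) first.
 * Returns the index of the chunk that satisfies @size, or -1.
 */
static int toy_pick_chunk(const struct toy_lchunk *chunks, int n, size_t size)
{
	for (int i = 0; i < n; i++) {
		assert(i == 0 || chunks[i - 1].free_size <= chunks[i].free_size);
		if (chunks[i].contig_hint < size)
			continue;   /* the hint alone proves it cannot fit */
		if (toy_scan_map(&chunks[i], size))
			return i;
	}
	return -1;
}
```

For example, given chunks with (free_size, contig_hint, real largest area) of (64, 32, 32), (256, 128, 96) and (1024, 1024, 1024), a 100-byte request skips the first chunk on its hint alone. It scans the second because that chunk's hint of 128 overstates the real 96-byte area, and it is satisfied by the third after two scans.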
@@ -1842,6 +1847,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	 */
 	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
 
+	/* determine the maximum # of units that can fit in an allocation */
 	alloc_size = roundup(min_unit_size, atom_size);
 	upa = alloc_size / min_unit_size;
 	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
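The comment added in this hunk labels the computation that follows: it finds the largest number of units (upa) that one atom_size-aligned allocation can hold. Below is a user-space port of the visible lines. `TOY_PAGE_SIZE` (4 KiB) and `TOY_MIN_UNIT_SIZE` (32 KiB, standing in for `PCPU_MIN_UNIT_SIZE`) are assumptions, and `toy_max_upa()` is a hypothetical wrapper. The precondition that atom_size is a non-zero multiple of the page size is also an assumption; it guarantees that upa = 1 always satisfies the loop condition, so upa never reaches 0.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE     ((size_t)4096)        /* assumption: 4 KiB pages */
#define TOY_MIN_UNIT_SIZE ((size_t)32 << 10)    /* assumed PCPU_MIN_UNIT_SIZE */

static size_t toy_roundup(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Largest upa such that alloc_size splits evenly into upa page-aligned
 * units of at least min_unit_size bytes. alloc_size is returned via
 * @alloc_size_out.
 */
static size_t toy_max_upa(size_t size_sum, size_t atom_size,
			  size_t *alloc_size_out)
{
	size_t min_unit_size, alloc_size, upa;

	assert(atom_size > 0 && atom_size % TOY_PAGE_SIZE == 0);

	min_unit_size = size_sum > TOY_MIN_UNIT_SIZE ? size_sum
						     : TOY_MIN_UNIT_SIZE;
	alloc_size = toy_roundup(min_unit_size, atom_size);
	upa = alloc_size / min_unit_size;

	/* shrink upa until units divide alloc_size and stay page aligned */
	while (alloc_size % upa || (alloc_size / upa) % TOY_PAGE_SIZE)
		upa--;

	*alloc_size_out = alloc_size;
	return upa;
}
```

For size_sum = 40 KiB and a 2 MiB atom, the initial guess of 51 units is cut back to 32. That is the largest count that divides 2 MiB into page-aligned (64 KiB) units.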
@@ -1868,9 +1874,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	}
 
 	/*
-	 * Expand unit size until address space usage goes over 75%
-	 * and then as much as possible without using more address
-	 * space.
+	 * Wasted space is caused by a ratio imbalance of upa to group_cnt.
+	 * Expand the unit_size until we use >= 75% of the units allocated.
+	 * Related to atom_size, which could be much larger than the unit_size.
 	 */
 	last_allocs = INT_MAX;
 	for (upa = max_upa; upa; upa--) {
......
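The rewritten comment in the last hunk says wasted units come from a mismatch between upa and each NUMA group's cpu count (group_cnt). It also says unit_size grows until at least 75% of the allocated units are used. The loop body is elided in this view, so what follows is only a sketch of that rule, not a copy of pcpu_build_alloc_info(). `toy_pick_upa()` is a hypothetical name and pages are assumed to be 4 KiB. The 75% threshold is written as wasted <= used / 3, which is equivalent for integers because used / (used + wasted) >= 3/4 exactly when 3 * wasted <= used. The early stop follows the old comment's "as much as possible without using more address space".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_PAGE_SIZE ((size_t)4096)   /* assumption: 4 KiB pages */

/*
 * Pick units-per-allocation. group_cnt[g] is the number of cpus in
 * NUMA group g; each group consumes whole allocations of upa units, so
 * the unused units in a group's last allocation are wasted.
 */
static int toy_pick_upa(int max_upa, size_t alloc_size,
			const int *group_cnt, int nr_groups)
{
	int nr_cpus = 0, last_allocs = INT_MAX, best_upa = 0;

	for (int g = 0; g < nr_groups; g++)
		nr_cpus += group_cnt[g];

	/* smaller upa means larger units (unit_size = alloc_size / upa) */
	for (int upa = max_upa; upa > 0; upa--) {
		int allocs = 0, wasted = 0;

		/* units must split alloc_size evenly and stay page aligned */
		if (alloc_size % (size_t)upa ||
		    (alloc_size / (size_t)upa) % TOY_PAGE_SIZE)
			continue;

		for (int g = 0; g < nr_groups; g++) {
			int this_allocs = (group_cnt[g] + upa - 1) / upa;

			allocs += this_allocs;
			wasted += this_allocs * upa - group_cnt[g];
		}

		/* require >= 75% of units used, i.e. wasted <= used / 3 */
		if (wasted > nr_cpus / 3)
			continue;

		/* keep growing units, but stop once more allocations are needed */
		if (allocs > last_allocs)
			break;
		last_allocs = allocs;
		best_upa = upa;
	}
	return best_upa;
}
```

With two groups of 4 cpus and a 2 MiB allocation, the sketch settles on 4 units per allocation. Larger upa values leave more than a quarter of the units idle, and upa = 2 would need four allocations instead of two.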