[PATCH] rmap pte_chain speedup and space saving
The pte_chains presently consist of a pte pointer and a `next' link. So there's a 50% memory wastage here as well as potential for a lot of misses during walks of the singly-linked per-page list.

This patch increases the pte_chain structure to occupy a full cacheline. There are 7, 15 or 31 pte pointers per structure rather than just one. So the wastage falls to a few percent and the number of misses during the walk is reduced.

The patch doesn't make much difference in simple testing, because in those tests the pte_chain list from the previous page has good cache locality with the next page's list.

The patch sped up Anton's "10,000 concurrently exiting shells" test by 3x or 4x. It gives a 10% reduction in system time for a kernel build on 16p NUMAQ.

It saves memory and reduces the amount of work performed in the slab allocator.

Pages which are mapped by only a single process continue to not have a pte_chain. The pointer in struct page points directly at the mapping pte (a "PageDirect" pte pointer). Once the page is shared a pte_chain is allocated and both the new and old pte pointers are moved into it.

We used to collapse the pte_chain back to a PageDirect representation in page_remove_rmap(). That has been changed. That collapse is now performed inside page reclaim, via page_referenced(). The thinking here is that if a page was previously shared then it may become shared again, so leave the pte_chain structure in place. But if the system is under memory pressure then start reaping them anyway.
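
The layout described above can be illustrated with a small userspace sketch. This is not the kernel code from the patch: the 64-byte CACHELINE_BYTES value, the uintptr_t stand-in for a pte pointer, and the names page_sketch, pte_chain_alloc, page_add_rmap_sketch and page_mapcount_sketch are all assumptions made for illustration. It shows a cacheline-sized chain block holding one link plus several pte slots, and the switch from a direct pointer to a chain when a second mapping is added. The collapse back to the direct form during reclaim is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a 64-byte cacheline. Real code would use the CPU's L1 line size. */
#define CACHELINE_BYTES 64
/* One pointer-sized slot is used by the link; the rest hold pte pointers. */
#define NRPTE (CACHELINE_BYTES / sizeof(void *) - 1)

typedef uintptr_t pte_addr_t;   /* stand-in for a pte pointer; 0 means "empty slot" */

struct pte_chain {
    struct pte_chain *next;
    pte_addr_t ptes[NRPTE];
};

_Static_assert(sizeof(struct pte_chain) == CACHELINE_BYTES,
               "pte_chain should fill exactly one cacheline");

/* Hypothetical, cut-down stand-in for struct page. */
struct page_sketch {
    int direct;                   /* models the PageDirect flag */
    union {
        struct pte_chain *chain;  /* valid when direct == 0; NULL if unmapped */
        pte_addr_t direct_pte;    /* valid when direct != 0 */
    } pte;
};

static struct pte_chain *pte_chain_alloc(void)
{
    struct pte_chain *pc = calloc(1, sizeof(*pc));  /* all slots start empty */
    assert(pc != NULL);
    return pc;
}

static void page_add_rmap_sketch(struct page_sketch *page, pte_addr_t pte)
{
    if (!page->direct && page->pte.chain == NULL) {
        /* First mapping: no pte_chain at all, just point at the pte. */
        page->pte.direct_pte = pte;
        page->direct = 1;
        return;
    }
    if (page->direct) {
        /* Page becomes shared: move the old and new pointers into a chain. */
        struct pte_chain *pc = pte_chain_alloc();
        pc->ptes[0] = page->pte.direct_pte;
        pc->ptes[1] = pte;
        page->pte.chain = pc;
        page->direct = 0;
        return;
    }
    /* Already chained: fill a free slot in the head block if there is one. */
    struct pte_chain *head = page->pte.chain;
    for (size_t i = 0; i < NRPTE; i++) {
        if (head->ptes[i] == 0) {
            head->ptes[i] = pte;
            return;
        }
    }
    /* Head block is full: push a fresh block onto the front of the list. */
    struct pte_chain *pc = pte_chain_alloc();
    pc->ptes[0] = pte;
    pc->next = head;
    page->pte.chain = pc;
}

/* Count mappings; *blocks_visited reports how many chain blocks the walk touched. */
static size_t page_mapcount_sketch(const struct page_sketch *page,
                                   size_t *blocks_visited)
{
    size_t n = 0, blocks = 0;

    if (page->direct) {
        n = 1;
    } else {
        for (const struct pte_chain *pc = page->pte.chain; pc; pc = pc->next) {
            blocks++;
            for (size_t i = 0; i < NRPTE; i++)
                if (pc->ptes[i] != 0)
                    n++;
        }
    }
    if (blocks_visited)
        *blocks_visited = blocks;
    return n;
}
```

With these assumed values and 8-byte pointers, NRPTE is 7, so the link costs one slot in eight instead of one in two, and a walk over up to 7 mappings touches a single block. The 7, 15 and 31 figures are consistent with 32-, 64- and 128-byte cachelines and 4-byte pointers (32/4 - 1, 64/4 - 1, 128/4 - 1), though that mapping is an inference rather than something the description states.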