[PATCH] don't allocate ratnodes under PF_MEMALLOC
On the swap_out() path, the radix-tree pagecache is allocating its nodes with PF_MEMALLOC set, which allows it to completely exhaust the free page lists(*). This is fairly easy to trigger with swap-intensive loads.

It would be better to make those node allocations fail at an earlier time. When this happens, the radix-tree can still obtain nodes from its mempool, and we leave some memory available for the I/O layer. (Assuming that the I/O is being performed under PF_MEMALLOC, which it is.)

So the patch simply drops PF_MEMALLOC while adding nodes to the swapcache's tree.

We're still performing atomic allocations, so the rat is still biting pretty deeply into the page reserves - under heavy load the amount of free memory is less than half of what it was pre-rat.

It is unfortunate that the page allocator overloads !__GFP_WAIT to also mean "try harder". It would be better to separate these concepts, and to allow the radix-tree code (at least) to perform atomic allocations, but to not go below pages_min. It seems that __GFP_TRY_HARDER will be pretty straightforward to implement. Later.

The patch also implements a workaround for the mempool list_head problem, until that is sorted out.

(*) The usual result is that the SCSI layer dies at scsi_merge.c:82. It would be nice to have a fix for that - it's going BUG if order-1 allocations fail at interrupt time. That happens pretty easily.
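
For illustration only, here is a userspace sketch of the "drop PF_MEMALLOC around the node allocation" idea. It is not the patch itself and not kernel code: struct task, alloc_page_model(), add_node_without_memalloc(), ATOMIC_FLOOR and the PF_MEMALLOC value used here are all hypothetical stand-ins, and the allocator is a toy counter that only models "PF_MEMALLOC may drain the free list to zero, other callers stop at some floor".

```c
#include <stdbool.h>

/* Illustrative flag bit; the real value lives in the kernel headers. */
#define PF_MEMALLOC 0x0800u

/* Hypothetical floor below which non-PF_MEMALLOC callers fail in this model. */
#define ATOMIC_FLOOR 2

/* Minimal stand-in for the task structure: only the flags word matters here. */
struct task {
    unsigned int flags;
};

/* Toy free-page counter standing in for the free page lists. */
static int free_pages = 6;

/*
 * Toy allocator: a PF_MEMALLOC caller may take the last free page,
 * anyone else fails once free_pages has dropped to ATOMIC_FLOOR.
 */
static bool alloc_page_model(const struct task *t)
{
    int floor = (t->flags & PF_MEMALLOC) ? 0 : ATOMIC_FLOOR;

    if (free_pages <= floor)
        return false;
    free_pages--;
    return true;
}

/*
 * Save the caller's PF_MEMALLOC bit, clear it for the duration of the
 * node allocation, then restore it. A failure here is where the
 * radix-tree would fall back to its mempool.
 */
static bool add_node_without_memalloc(struct task *t)
{
    unsigned int saved = t->flags & PF_MEMALLOC;
    bool ok;

    t->flags &= ~PF_MEMALLOC;
    ok = alloc_page_model(t);
    t->flags |= saved;
    return ok;
}
```

In this model, node allocations start failing while a couple of pages are still free, and a caller that still has PF_MEMALLOC set (the I/O path, in the description above) can use what is left.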