[PATCH] slab: cleanups and speedups

- enable the cpu array for all caches
- remove the optimized implementations for quick list access: with cpu
  arrays in all caches, list access is now rare.
- make the cpu arrays mandatory; this removes 50% of the conditional
  branches from the hot path of kmem_cache_alloc [1]
- poisoning for objects with constructors

Patch got a bit longer...

I forgot to mention this: head arrays mean that some pages can be
blocked due to objects in the head arrays, and not returned to
page_alloc.c. The current kernel never flushes the head arrays; this
might worsen the behaviour of low memory systems. The hunk that flushes
the arrays regularly comes next.

Detailed changelog: [to be read side by side with the patch]

* docu update
* "growing" is not really needed: races between grow and shrink are
  handled by retrying. [additionally, the current kernel never shrinks]
* move the batchcount into the cpu array: the old code contained a race
  during cpu cache tuning: update batchcount [in cachep] before or after
  the IPI? And NUMA will need it anyway.
* bootstrap support: the cpu arrays are really mandatory, nothing works
  without them. Thus a statically allocated cpu array is needed for
  starting the allocators.
* move the full, partial & free lists into a separate structure, as a
  preparation for NUMA
* structure reorganization: now the cpu arrays are the most important
  part, not the lists.
* dead code elimination: remove "failures", nowhere read.
* dead code elimination: remove "OPTIMIZE": not implemented. The idea is
  to skip the virt_to_page lookup for caches with on-slab slab
  structures, and use (ptr&PAGE_MASK) instead. The details are in
  Bonwick's paper. Not fully implemented.
* remove GROWN: kernel never shrinks a cache, thus grown is meaningless.
* bootstrap: starting the slab allocator is now a 3 stage process:
  - nothing works, use the statically allocated cpu arrays.
  - the smallest kmalloc allocator works, use it to allocate cpu arrays.
  - all kmalloc allocators work, use the default cpu array size
* register a cpu notifier callback, and allocate the needed head arrays
  if a new cpu arrives
* always enable head arrays, even for DEBUG builds. Poisoning and
  red-zoning now happen before an object is added to the arrays.
  Insert enable_all_cpucaches into cpucache_init, there is no need for
  a separate function.
* modifications to the debug checks due to the earlier calls of the dtor
  for caches with poisoning enabled
* poison+ctor is now supported
* squeezing 3 objects into a cacheline is hopeless, the FIXME is not
  solvable and can be removed.
* add additional debug tests: check_irq_off(), check_irq_on(),
  check_spinlock_acquired().
* move do_ccupdate_local nearer to do_tune_cpucache. Should have been
  part of -04-drain.
* additional object checks. red-zoning is tricky: it's implemented by
  increasing the object size by 2*BYTES_PER_WORD. Thus BYTES_PER_WORD
  must be added to objp before calling the destructor or constructor, or
  before returning the object from alloc. The poison functions add
  BYTES_PER_WORD internally.
* create a flagcheck function, right now the tests are duplicated in
  cache_grow [always] and alloc_debugcheck_before [DEBUG only]
* modify slab list updates: all allocs are now bulk allocs that try to
  get multiple objects at once, update the list pointers only at the end
  of a bulk alloc, not once per alloc.
* might_sleep was moved into kmem_flagcheck.
* major hotpath change:
  - cc always exists, no fallback
  - cache_alloc_refill is called with disabled interrupts, and does
    everything to recover from an empty cpu array.
  Far shorter & simpler __cache_alloc [inlined in both kmalloc and
  kmem_cache_alloc]
* __free_block, free_block, cache_flusharray: main implementation of
  returning objects to the lists. no big changes, diff lost track.
* new debug check: too early kmalloc or kmem_cache_alloc
* slightly reduce the sizes of the cpu arrays: keep the size < a power
  of 2, including batchcount, avail and now limit, for optimal kmalloc
  memory efficiency.

That's it. I even found 2 bugs while reading: dtors and ctors for
verify were called with wrong parameters, with RED_ZONE enabled, and
some checks still assumed that POISON and ctor are incompatible.
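
For readers without the patch at hand, the head-array hot path described
in the changelog can be sketched in user-space C roughly as follows.
This is an illustration only, not code from the patch: the sketch_*
names, the fixed-size pool standing in for the full/partial/free lists,
and the array sizes are all assumptions, and locking and interrupt
disabling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LIMIT 8   /* hypothetical capacity of one head array */
#define SKETCH_POOL  32  /* hypothetical size of the backing pool */

/* Hypothetical stand-in for the shared full/partial/free slab lists. */
struct sketch_lists {
	void *objs[SKETCH_POOL];
	unsigned int count;
};

/* Hypothetical per-cpu head array; batchcount lives next to avail/limit. */
struct sketch_cpu_array {
	unsigned int avail;
	unsigned int limit;
	unsigned int batchcount;
	void *entry[SKETCH_LIMIT];
};

/* Slow path: pull up to batchcount objects from the lists in one pass. */
static void *sketch_refill(struct sketch_lists *l, struct sketch_cpu_array *ac)
{
	unsigned int i;

	for (i = 0; i < ac->batchcount; i++) {
		if (l->count == 0 || ac->avail == ac->limit)
			break;
		ac->entry[ac->avail++] = l->objs[--l->count];
	}
	return ac->avail ? ac->entry[--ac->avail] : NULL;
}

/* Hot path: a single branch, no list access while the array has objects. */
static void *sketch_alloc(struct sketch_lists *l, struct sketch_cpu_array *ac)
{
	if (ac->avail)
		return ac->entry[--ac->avail];
	return sketch_refill(l, ac);
}

/* Hand the oldest batchcount objects back to the lists, keep the rest. */
static void sketch_flusharray(struct sketch_lists *l, struct sketch_cpu_array *ac)
{
	unsigned int n = ac->batchcount < ac->avail ? ac->batchcount : ac->avail;
	unsigned int i;

	for (i = 0; i < n; i++) {
		assert(l->count < SKETCH_POOL);
		l->objs[l->count++] = ac->entry[i];
	}
	for (i = n; i < ac->avail; i++)
		ac->entry[i - n] = ac->entry[i];
	ac->avail -= n;
}

/* Free path: only touches the lists when the head array is full. */
static void sketch_free(struct sketch_lists *l, struct sketch_cpu_array *ac,
			void *obj)
{
	assert(ac->limit >= 1 && ac->limit <= SKETCH_LIMIT);
	assert(ac->batchcount >= 1);
	if (ac->avail == ac->limit)
		sketch_flusharray(l, ac);
	ac->entry[ac->avail++] = obj;
}
```

With limit = 2 and batchcount = 2, the first sketch_alloc() call moves
two objects out of the pool in a single refill, and the second call is
served from the array without touching the pool at all. The same holds
in reverse for sketch_free(): the pool is only updated once the array
is full, and then for a whole batch at a time.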