• Linus Torvalds's avatar
    mm: remove per-zone hashtable of bitlock waitqueues · 9dcb8b68
    Linus Torvalds authored
    The per-zone waitqueues exist because of a scalability issue with the
    page waitqueues on some NUMA machines, but it turns out that they hurt
    normal loads, and now with the vmalloced stacks they also end up
    breaking gfs2 that uses a bit_wait on a stack object:
    
         wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
    
    where 'gh' can be a reference to the local variable 'mount_gh' on the
    stack of fill_super().
    
    The reason the per-zone hash table breaks for this case is that there is
    no "zone" for virtual allocations, and trying to look up the physical
    page to get at it will fail (with a BUG_ON()).
    
    It turns out that I actually complained to the mm people about the
    per-zone hash table for another reason just a month ago: the zone lookup
    also hurts the regular use of "unlock_page()" a lot, because the zone
    lookup ends up forcing several unnecessary cache misses and generates
    horrible code.
    
    As part of that earlier discussion, we had a much better solution for
    the NUMA scalability issue - by just making the page lock have a
    separate contention bit, the waitqueue doesn't even have to be looked at
    for the normal case.
    
    Peter Zijlstra already has a patch for that, but let's see if anybody
    even notices.  In the meantime, let's fix the actual gfs2 breakage by
    simplifying the bitlock waitqueues and removing the per-zone issue.
    Reported-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
    Tested-by: default avatarBob Peterson <rpeterso@redhat.com>
    Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Steven Whitehouse <swhiteho@redhat.com>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    9dcb8b68
memory_hotplug.c 55.4 KB