    [PATCH] r/o bind mounts: track numbers of writers to mounts · 3d733633
    Dave Hansen authored
    This is the real meat of the entire series.  It actually
    implements the tracking of the number of writers to a mount.
    However, it causes scalability problems because there can be
    hundreds of cpus doing open()/close() on files on the same mnt at
    the same time.  Even an atomic_t in the mnt has massive scaling
    problems because the cacheline gets so terribly contended.
    
    This uses a statically-allocated percpu variable.  All want/drop
    operations are local to a cpu as long as that cpu operates on the same
    mount, and there are no writer count imbalances.  Writer count
    imbalances happen when a write is taken on one cpu, and released
    on another, like when an open/close pair is performed on two
    different cpus.
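
    As a rough illustration of the scheme described above, here is a
    simplified, single-threaded userspace sketch.  It is not the code in
    this patch: struct mount, cpu_writers[], NR_CPUS, want_write() and
    drop_write() are names made up for the example, the cpu is passed in
    by hand, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical cpu count for the sketch */

/* Hypothetical stand-in for a mount: only the shared counter. */
struct mount {
    long shared_writers;  /* slow-path total, touched only on a flush */
};

/* One slot per cpu: the mount it is tracking and a local delta. */
struct cpu_writer {
    struct mount *mnt;
    long count;  /* goes negative when a drop lands on another cpu */
};

static struct cpu_writer cpu_writers[NR_CPUS];

/* Fold the cpu-local delta back into the mount it belongs to. */
static void flush_cpu_writer(struct cpu_writer *cw)
{
    if (cw->mnt)
        cw->mnt->shared_writers += cw->count;
    cw->count = 0;
}

/* Point this cpu's slot at mnt, flushing another mount's delta first. */
static struct cpu_writer *use_cpu_writer(int cpu, struct mount *mnt)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct cpu_writer *cw = &cpu_writers[cpu];
    if (cw->mnt != mnt) {
        flush_cpu_writer(cw);
        cw->mnt = mnt;
    }
    return cw;
}

/* Fast path: both operations touch only the calling cpu's slot. */
void want_write(int cpu, struct mount *mnt)
{
    use_cpu_writer(cpu, mnt)->count++;
}

void drop_write(int cpu, struct mount *mnt)
{
    use_cpu_writer(cpu, mnt)->count--;
}
```

    As long as a cpu keeps working on one mount, shared_writers is never
    touched.  A drop on a different cpu than the matching want simply
    leaves a negative delta on that cpu, which cancels out once the
    per-cpu values are summed.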
    
    Upon a remount,ro request, all of the data from the percpu
    variables is collected (expensive, but very rare) and we determine
    if there are any outstanding writers to the mount.
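
    The collection step can be pictured roughly as follows.  Again this
    is only a userspace sketch under assumptions: the names are
    hypothetical, there is no locking, and the -1 return value merely
    stands in for whatever error the real remount path reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical cpu count for the sketch */

struct mount {
    long shared_writers;  /* total after per-cpu deltas are folded in */
    bool readonly;
};

struct cpu_writer {
    struct mount *mnt;
    long count;  /* cpu-local delta, may be negative */
};

static struct cpu_writer cpu_writers[NR_CPUS];

/* Walk every cpu and fold its delta for mnt into the shared total. */
static long collect_writers(struct mount *mnt)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        struct cpu_writer *cw = &cpu_writers[cpu];
        if (cw->mnt != mnt)
            continue;
        mnt->shared_writers += cw->count;
        cw->count = 0;
    }
    return mnt->shared_writers;
}

/* Returns 0 on success, -1 if writers are still outstanding. */
int remount_readonly(struct mount *mnt)
{
    assert(mnt != NULL);
    if (collect_writers(mnt) > 0)
        return -1;
    mnt->readonly = true;
    return 0;
}
```

    The walk over all cpus is where the cost goes, which is acceptable
    only because remounts are so rare compared to open()/close().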
    
    I've written a little benchmark to sit in a loop for a couple of
    seconds in several cpus in parallel doing open/write/close loops.
    
    http://sr71.net/~dave/linux/openbench.c
    
    The code in here is a worst-possible case for this patch.  It
    does opens on a _pair_ of files in two different mounts in parallel.
    This should cause my code to lose its "operate on the same mount"
    optimization completely.  This worst-case scenario causes a 3%
    degradation in the benchmark.
    
    I could probably get rid of even this 3%, but it would be more
    complex than what I have here, and I think this is getting into
    acceptable territory.  In practice, I expect writing more than 3
    bytes to a file, as well as disk I/O, to mask any effects that this
    has.
    
    (To get rid of that 3%, we could have a #defined number of mounts
    in the percpu variable.  So, instead of a CPU getting to operate only
    on percpu data when it accesses only one mount, it could stay on
    percpu data when it only accesses N or fewer mounts.)
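
    One way that idea might look, purely as a sketch and not something
    this patch implements: each cpu keeps MNT_SLOTS entries instead of
    one.  MNT_SLOTS, the struct layout and the round-robin eviction are
    all assumptions made for the example, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   4  /* hypothetical cpu count for the sketch */
#define MNT_SLOTS 2  /* the "#defined number of mounts"; arbitrary value */

struct mount {
    long shared_writers;  /* touched only when a slot is evicted */
};

struct slot {
    struct mount *mnt;
    long count;  /* cpu-local delta for this mount */
};

struct cpu_writer {
    struct slot slots[MNT_SLOTS];
    unsigned next_victim;  /* round-robin eviction cursor */
};

static struct cpu_writer cpu_writers[NR_CPUS];

/* Find (or make) this cpu's slot for mnt, evicting only when full. */
static struct slot *find_slot(int cpu, struct mount *mnt)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct cpu_writer *cw = &cpu_writers[cpu];
    struct slot *target = NULL;

    for (int i = 0; i < MNT_SLOTS; i++) {
        if (cw->slots[i].mnt == mnt)
            return &cw->slots[i];      /* hit: stay on percpu data */
        if (!cw->slots[i].mnt && !target)
            target = &cw->slots[i];    /* remember first empty slot */
    }
    if (!target) {
        /* All slots busy: flush one victim to its shared counter. */
        target = &cw->slots[cw->next_victim];
        cw->next_victim = (cw->next_victim + 1) % MNT_SLOTS;
        target->mnt->shared_writers += target->count;
    }
    target->mnt = mnt;
    target->count = 0;
    return target;
}

void want_write(int cpu, struct mount *mnt)
{
    find_slot(cpu, mnt)->count++;
}

void drop_write(int cpu, struct mount *mnt)
{
    find_slot(cpu, mnt)->count--;
}
```

    With MNT_SLOTS set to 2, the two-mount benchmark above would never
    leave the percpu data; only a third mount forces a flush.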
    
    [AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
    Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>