    perf lock: Drop the buffers multiplexing dependency · b67577df
    Frederic Weisbecker authored
    We need to deal with time-ordered events to build a correct
    state machine of lock events. This is why we multiplex the lock
    event buffers. But the ordering is done from the kernel, on
    the tracing fast path, leading to high contention between CPUs.
    
    Without multiplexing, the events appear in a weak order.
    If we have four events, each split per cpu, perf record will
    read the event buffers in the following order:
    
    [ CPU0 ev0, CPU0 ev1, CPU0 ev2, CPU0 ev3, CPU1 ev0, CPU1 ev1, ... ]
    
    To handle post-processing reordering, we could just read and sort
    everything in memory, but that doesn't scale with large numbers
    of events: lock events can fill huge amounts of memory in a short time.
    
    Basically we need to sort in memory and find a "grace period"
    point when we know that a given slice of previously sorted events
    can be committed for post-processing, so that we can unload the
    memory usage step by step and keep a scalable sorting list.
    
    There are no strong rules about how to define such a "grace period".
    What this patch does is:
    
    We define a FLUSH_PERIOD value that defines a grace period in
    seconds.
    We want to have a slice of events covering 2 * FLUSH_PERIOD in our
    sorted list.
    If FLUSH_PERIOD is big enough, it ensures that every event that occurred
    in the first half of the timeslice has been buffered: none
    remain to be read and no further event will fall inside this
    first half. Then once we reach the 2 * FLUSH_PERIOD
    timeslice, we flush the first half to be gentle with the memory
    (the second half can still get new events in the middle, so we wait
    another period to flush it).
    
    FLUSH_PERIOD is defined as 5 seconds. Say the first event started at
    time t0. We can safely assume that by the time we are processing
    events of t0 + 10 seconds, there won't be any more events to read
    from perf.data that occurred between t0 and t0 + 5 seconds. Hence
    we can safely flush the first half.
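    
    The flushing scheme described above can be sketched roughly as
    follows. This is only an illustration, not the code of this patch:
    the names (struct sort_queue, sq_insert, sq_flush, sq_queue_event),
    the fixed-size array and QUEUE_MAX are assumptions made for the
    example, and the flush condition is one reading of the
    2 * FLUSH_PERIOD rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000ULL
#define FLUSH_PERIOD (5 * NSEC_PER_SEC) /* grace period, 5 seconds */
#define QUEUE_MAX    1024               /* arbitrary capacity for this sketch */

/* Hypothetical sorted queue; a real tool would store whole events. */
struct sort_queue {
    uint64_t ts[QUEUE_MAX]; /* pending timestamps, ascending */
    size_t   len;           /* number of pending timestamps */
    uint64_t flush_limit;   /* end of the first half of the timeslice */
    int      started;       /* set once the first event was seen */
    size_t   committed;     /* events handed over to post-processing */
};

/* Insert ts while keeping the array sorted. Returns -1 if full. */
static int sq_insert(struct sort_queue *q, uint64_t ts)
{
    size_t i;

    if (q->len == QUEUE_MAX)
        return -1;
    for (i = q->len; i > 0 && q->ts[i - 1] > ts; i--)
        q->ts[i] = q->ts[i - 1];
    q->ts[i] = ts;
    q->len++;
    return 0;
}

/* Commit every pending event whose timestamp is <= limit. */
static void sq_flush(struct sort_queue *q, uint64_t limit)
{
    size_t n = 0;

    while (n < q->len && q->ts[n] <= limit)
        n++;
    memmove(q->ts, q->ts + n, (q->len - n) * sizeof(q->ts[0]));
    q->len -= n;
    q->committed += n;
}

/* Queue one event; flush the first half once the slice is long enough. */
static int sq_queue_event(struct sort_queue *q, uint64_t ts)
{
    assert(q != NULL);

    if (!q->started) {
        q->flush_limit = ts + FLUSH_PERIOD;
        q->started = 1;
    }
    if (sq_insert(q, ts) < 0)
        return -1;

    /* The new event lies more than FLUSH_PERIOD past the first half. */
    if (ts > q->flush_limit && ts - q->flush_limit > FLUSH_PERIOD) {
        sq_flush(q, q->flush_limit);
        q->flush_limit += FLUSH_PERIOD;
    }
    return 0;
}
```

    With events queued at 1s, 7s, 3s and 4s, nothing is committed
    until an event later than 11s arrives; at that point the events
    up to 6s are committed and the rest stay in the sorted queue.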
    
    To point out funky bugs, we have a guard that checks that a new event's
    timestamp is not below the timestamp of the last flushed event, and that
    displays a warning in this case.
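    
    A minimal sketch of such a check, assuming we only track the
    highest flushed timestamp. The struct flush_guard type and the
    guard_accept / guard_note_flush helpers are hypothetical names
    used for illustration, not the identifiers of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guard state for the sketch. */
struct flush_guard {
    uint64_t      last_flush;  /* highest timestamp already flushed */
    unsigned long late_events; /* events that arrived too late */
};

/* Record that everything up to flushed_ts has been committed. */
static void guard_note_flush(struct flush_guard *g, uint64_t flushed_ts)
{
    assert(g != NULL);
    if (flushed_ts > g->last_flush)
        g->last_flush = flushed_ts;
}

/*
 * Returns 1 if the event can still be sorted in, 0 if its timestamp
 * is below the last flushed one (a warning is printed in that case).
 */
static int guard_accept(struct flush_guard *g, uint64_t ts)
{
    assert(g != NULL);
    if (ts < g->last_flush) {
        g->late_events++;
        fprintf(stderr,
                "Warning: timestamp %llu below last flushed timestamp %llu\n",
                (unsigned long long)ts,
                (unsigned long long)g->last_flush);
        return 0;
    }
    return 1;
}
```

    A warning from this check would suggest that the grace period is
    too short for the trace being processed.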
    
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    Cc: Masami Hiramatsu <mhiramat@redhat.com>
    Cc: Jens Axboe <jens.axboe@oracle.com>
builtin-lock.c 19 KB