    tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. · 67f0d6d9
    Haoran Luo authored
    "rb_per_cpu_empty()" misinterprets the condition (as not-empty) when
    "head_page" and "commit_page" of "struct ring_buffer_per_cpu" point to
    the same buffer page, whose "buffer_data_page" is empty and whose "read"
    field is non-zero.
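
    To make the condition above concrete, here is a minimal user-space
    sketch. It is not the kernel code: the "model_*" types and functions
    are hypothetical stand-ins, and both predicates are only assumptions
    about what a check that trusts the stale "read" field looks like versus
    one that looks only at the committed size of the head page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one buffer page. */
struct model_page {
	size_t read;   /* bytes already consumed by readers (may be stale) */
	size_t commit; /* bytes committed by writers on this page */
};

/* Hypothetical, simplified stand-in for the per-CPU buffer. */
struct model_cpu_buffer {
	struct model_page *reader_page;
	struct model_page *head_page;
	struct model_page *commit_page;
};

/* Assumed check that compares the head page's "read" with its commit size. */
static bool model_empty_by_read(const struct model_cpu_buffer *b)
{
	const struct model_page *reader = b->reader_page;
	const struct model_page *head = b->head_page;
	const struct model_page *commit = b->commit_page;

	return reader->read == reader->commit &&
	       (commit == reader ||
		(commit == head && head->read == commit->commit));
}

/* Assumed alternative: on the head page only the committed size matters. */
static bool model_empty_by_commit(const struct model_cpu_buffer *b)
{
	const struct model_page *reader = b->reader_page;
	const struct model_page *commit = b->commit_page;

	if (reader->read != reader->commit)
		return false;	/* reader page still has unread data */
	if (commit == reader)
		return true;	/* everything committed was already read */
	if (commit != b->head_page)
		return false;	/* data sits between head and commit */
	return commit->commit == 0;
}

/* Build the state described above and compare both predicates. */
static void model_self_check(void)
{
	struct model_page reader = { .read = 0, .commit = 0 };
	struct model_page head = { .read = 16, .commit = 0 }; /* stale "read" */
	struct model_cpu_buffer b = {
		.reader_page = &reader,
		.head_page = &head,
		.commit_page = &head,
	};

	assert(!model_empty_by_read(&b));	/* misreported as not-empty */
	assert(model_empty_by_commit(&b));	/* reported as empty */
}
```

    Calling "model_self_check()" exercises exactly the state from the
    paragraph above: the page holds no data, but its non-zero "read" no
    longer matches the commit size, so the first predicate says not-empty.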
    
    An error scenario can be constructed as follows (kernel perspective):
    
    1. All pages in the buffer have been accessed by reader(s), so that all
    of them have a non-zero "read" field.
    
    2. Read and clear all buffer pages so that "rb_num_of_entries()"
    returns 0, indicating there is no more data to read. It is also required
    that "read_page", "commit_page" and "tail_page" point to the same
    page, while "head_page" is the page after it.
    
    3. Invoke "ring_buffer_lock_reserve()" with a large enough "length"
    so that it shoots past the end of the current tail buffer page. Now
    "head_page", "commit_page" and "tail_page" point to the same page.
    
    4. Discard the current event with "ring_buffer_discard_commit()", so
    that "head_page", "commit_page" and "tail_page" point to a page whose
    buffer data page is now empty.
    
    Once the error scenario has been constructed, "tracing_read_pipe" is
    trapped inside a deadloop: "trace_empty()" returns 0 since
    "rb_per_cpu_empty()" returns 0 when it hits the CPU containing such a
    constructed ring buffer. Then "trace_find_next_entry_inc()" always
    returns NULL since "rb_num_of_entries()" reports there are no more
    entries to read. Finally "trace_seq_to_user()" returns "-EBUSY", sending
    "tracing_read_pipe" back to the start of the "waitagain" loop.
    
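    The shape of that loop can be sketched in user space as follows. This
    is only an illustrative model under stated assumptions, not the kernel
    implementation: "pipe_model" and "pipe_model_read()" are hypothetical
    names, the two fields stand in for the answers of the emptiness check
    and the entry counter, and the pass cap exists only so the sketch
    terminates (a real reader would block or copy data where the model
    returns).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what the two kernel-side checks report. */
struct pipe_model {
	bool reports_empty; /* answer of the emptiness check */
	size_t entries;     /* answer of the entry counter */
};

/*
 * Return how many passes through the retry loop were needed,
 * capped at max_passes so an inconsistent state cannot hang the sketch.
 */
static int pipe_model_read(const struct pipe_model *t, int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (t->reports_empty)
			return passes;	/* reader would wait for data here */
		if (t->entries > 0)
			return passes;	/* an entry was found and consumed */
		/* "not empty" but nothing to consume: go around again */
	}
	return passes;
}

/* Compare consistent states with the inconsistent one. */
static void pipe_model_self_check(void)
{
	struct pipe_model empty = { .reports_empty = true, .entries = 0 };
	struct pipe_model has_data = { .reports_empty = false, .entries = 3 };
	struct pipe_model stuck = { .reports_empty = false, .entries = 0 };

	assert(pipe_model_read(&empty, 1000) == 1);
	assert(pipe_model_read(&has_data, 1000) == 1);
	assert(pipe_model_read(&stuck, 1000) == 1000);	/* only the cap stops it */
}
```

    Only the state where the two answers disagree runs until the cap,
    which mirrors why the reader never leaves the retry loop.
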
    I've also written a proof-of-concept script to construct the scenario
    and trigger the bug automatically. You can use it to trace and validate
    my reasoning above:
    
      https://github.com/aegistudio/RingBufferDetonator.git
    
    Tests have been carried out on Linux kernel 5.14-rc2 (2734d6c1), on my
    fixed version of the kernel (to check whether my update fixes the bug)
    and on some older kernels (to determine the range of affected kernels).
    Test results are also attached to the proof-of-concept repository.
    
    Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/
    Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio
    
    Cc: stable@vger.kernel.org
    Fixes: bf41a158 ("ring-buffer: make reentrant")
    Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
    Signed-off-by: Haoran Luo <www@aegistudio.net>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
ring_buffer.c 160 KB