    bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() · eac9153f
    Song Liu authored
    bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger an
    A-A deadlock on rq_lock:
    
    rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    [...]
    Call Trace:
     try_to_wake_up+0x1ad/0x590
     wake_up_q+0x54/0x80
     rwsem_wake+0x8a/0xb0
     bpf_get_stack+0x13c/0x150
     bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
     bpf_overflow_handler+0x60/0x100
     __perf_event_overflow+0x4f/0xf0
     perf_swevent_overflow+0x99/0xc0
     ___perf_sw_event+0xe7/0x120
     __schedule+0x47d/0x620
     schedule+0x29/0x90
     futex_wait_queue_me+0xb9/0x110
     futex_wait+0x139/0x230
     do_futex+0x2ac/0xa50
     __x64_sys_futex+0x13c/0x180
     do_syscall_64+0x42/0x100
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    This can be reproduced by:
    1. Start a multi-thread program that does parallel mmap() and malloc();
    2. taskset the program to 2 CPUs;
    3. Attach bpf program to trace_sched_switch and gather stackmap with
       build-id, e.g. with trace.py from bcc tools:
       trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch
    
    A sample reproducer is attached at the end.
    
    This could also trigger deadlock with other locks that are nested with
    rq_lock.
    
    Fix this by checking whether irqs are disabled. Since rq_lock and all
    other nested locks are irq safe, it is safe to do up_read() when irqs are
    not disabled. If irqs are disabled, defer up_read() to irq_work.
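
The decision described above can be illustrated with a small userspace model. This is only a sketch: every name in it (model_rwsem, model_irq_work, model_irqs_off, model_release, model_run_irq_work) is hypothetical and stands in for the kernel's rwsem, irq_work and irqs_disabled(); it is not the code in stackmap.c, and it does not model what happens when the deferred slot is already in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's types or APIs. */
struct model_rwsem {
	int readers;              /* number of current read holders */
};

struct model_irq_work {
	struct model_rwsem *sem;  /* semaphore to release later, or NULL */
};

static bool model_irqs_off;               /* stands in for irqs_disabled() */
static struct model_irq_work model_work;  /* one slot: a single CPU is modelled */

static void model_down_read(struct model_rwsem *sem) { sem->readers++; }
static void model_up_read(struct model_rwsem *sem) { sem->readers--; }

/*
 * Release the read lock taken for the build-id lookup.
 * Returns true if it was released immediately, false if the release
 * was deferred because "irqs" are off (a waker could need rq_lock).
 * Assumption: the slot is free; a busy slot is not handled here.
 */
static bool model_release(struct model_rwsem *sem)
{
	if (!model_irqs_off) {
		model_up_read(sem);
		return true;
	}
	model_work.sem = sem;     /* remember it for the deferred callback */
	return false;
}

/* Stands in for the irq_work callback that runs once irqs are enabled again. */
static void model_run_irq_work(void)
{
	if (model_work.sem) {
		model_up_read(model_work.sem);
		model_work.sem = NULL;
	}
}
```

In this model, a caller that holds the read lock with model_irqs_off set keeps the reader count at 1 until model_run_irq_work() runs, which mirrors the idea of never waking waiters from a context that may already hold rq_lock.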
    
    Fixes: 615755a7 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com
    
    Reproducer:
    ============================ 8< ============================
    
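    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>
    
    /* Assumed value: the original reproducer does not define THREAD_COUNT. */
    #define THREAD_COUNT 8
    
    /* File to mmap() in each worker; set from argv[1] in main(). */
    char *filename;
    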
    void *worker(void *p)
    {
            void *ptr;
            int fd;
            char *pptr;
    
            fd = open(filename, O_RDONLY);
            if (fd < 0)
                    return NULL;
            while (1) {
                    struct timespec ts = {0, 1000 + rand() % 2000};
    
                    ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                    usleep(1);
                    if (ptr == MAP_FAILED) {
                            printf("failed to mmap\n");
                            break;
                    }
                    munmap(ptr, 4096 * 64);
                    usleep(1);
                    pptr = malloc(1);
                    if (!pptr)
                            break;
                    usleep(1);
                    pptr[0] = 1;
                    usleep(1);
                    free(pptr);
                    usleep(1);
                    nanosleep(&ts, NULL);
            }
            close(fd);
            return NULL;
    }
    
    int main(int argc, char *argv[])
    {
            void *ptr;
            int i;
            pthread_t threads[THREAD_COUNT];
    
            if (argc < 2)
                    return 0;
    
            filename = argv[1];
    
            for (i = 0; i < THREAD_COUNT; i++) {
                    if (pthread_create(threads + i, NULL, worker, NULL)) {
                            fprintf(stderr, "Error creating thread\n");
                            return 0;
                    }
            }
    
            for (i = 0; i < THREAD_COUNT; i++)
                    pthread_join(threads[i], NULL);
            return 0;
    }
    ============================ 8< ============================