    lockdep: fix oops in processing workqueue · 4d82a1de
    Peter Zijlstra authored
    Under memory load, on x86_64, with lockdep enabled, the workqueue's
    process_one_work() has been seen to oops in __lock_acquire(), barfing
    on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].
    
    Because it's permissible to free a work_struct from its callout function,
    the map used is an onstack copy of the map given in the work_struct: and
    that copy is made without any locking.
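
The reason for the onstack copy can be sketched in userspace C. This is a minimal illustration, not the kernel code: `work_sketch`, `map_sketch`, `make_work()` and `run_one_work()` are hypothetical stand-ins for `work_struct`, `lockdep_map` and `process_one_work()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct lockdep_map. */
struct map_sketch {
    const char *name;
};

/* Hypothetical stand-in for struct work_struct. */
struct work_sketch {
    void (*func)(struct work_sketch *);
    struct map_sketch map;
};

/* A callout that frees its own work item, which the text says is permitted. */
static void self_freeing_callout(struct work_sketch *w)
{
    free(w);
}

static struct work_sketch *make_work(const char *name)
{
    struct work_sketch *w = malloc(sizeof(*w));

    assert(w != NULL);
    w->func = self_freeing_callout;
    w->map.name = name;
    return w;
}

/*
 * Copy the map to the stack before running the callout, then touch
 * only the copy: after func() returns, w may already be freed.
 */
static const char *run_one_work(struct work_sketch *w)
{
    struct map_sketch onstack = w->map;   /* unlocked structure copy */
    void (*func)(struct work_sketch *) = w->func;

    func(w);                              /* w must not be used after this */
    return onstack.name;
}
```

The copy makes the post-callout access safe, but because it takes no lock, nothing stops another CPU from writing to the source map while the copy is in progress.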
    
    Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
    "rep movsq" for that structure copy: which might race with a workqueue
    user's wait_on_work() doing lock_map_acquire() on the source of the
    copy, putting a pointer into the class_cache[], but only in time for
    the top half of that pointer to be copied to the destination map.
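
One interleaving consistent with the reported 0xffffffff00000000 value can be replayed deterministically. The sketch below is single-threaded and only simulates the race: `torn_copy()`, `racing_writer()` and the pointer value are hypothetical. It assumes the low 32 bits are copied first, as a forward "rep movsl" would do on little-endian x86_64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value standing in for a kernel pointer to a lock class. */
#define FAKE_CLASS_PTR 0xffffffff81234567ULL

/*
 * Copy one 64-bit slot as two 32-bit halves, low half first, and run
 * a hook between the halves to stand in for the racing CPU.
 */
static uint64_t torn_copy(volatile uint64_t *src,
                          void (*between)(volatile uint64_t *))
{
    uint32_t lo = (uint32_t)(*src & 0xffffffffu);
    uint32_t hi;

    if (between)
        between(src);                 /* the racing store lands here */
    hi = (uint32_t)(*src >> 32);
    return ((uint64_t)hi << 32) | lo;
}

/* Stands in for lock_map_acquire() caching a class pointer in the source. */
static void racing_writer(volatile uint64_t *slot)
{
    *slot = FAKE_CLASS_PTR;
}

static uint64_t demo_torn_copy(void)
{
    volatile uint64_t slot = 0;       /* class_cache[] entry starts out NULL */
    uint64_t copy = torn_copy(&slot, racing_writer);

    assert(slot == FAKE_CLASS_PTR);   /* the source itself is intact */
    return copy;                      /* the destination holds a torn value */
}
```

In this replay the low half is read while the slot is still NULL and the high half after the store, so the destination ends up with the top half of the pointer over zeroed low bits.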
    
    Boom when process_one_work() subsequently does lock_map_acquire()
    on its onstack copy of the lockdep_map.
    
    Fix this, and a similar instance in call_timer_fn(), with a
    lockdep_copy_map() function which additionally NULLs the class_cache[].
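
A userspace sketch of such a helper is shown below. The structure layout is simplified to the fields needed here, and the cache size of 2 is an assumption; `use_onstack_copy()` is a hypothetical caller added only to show the intended use.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache size for this sketch. */
#define NR_LOCKDEP_CACHING_CLASSES 2

struct lock_class;                    /* opaque for this sketch */

/* Simplified stand-in for struct lockdep_map. */
struct lockdep_map {
    const char *name;
    struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
};

/*
 * Copy the map, then clear the cached class pointers in the copy.
 * The cache in the source can change during the unlocked copy, so a
 * copied entry may be torn; discarding it costs a later class lookup
 * but never leaves a half-written pointer in the destination.
 */
static inline void lockdep_copy_map(struct lockdep_map *to,
                                    const struct lockdep_map *from)
{
    int i;

    *to = *from;
    for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
        to->class_cache[i] = NULL;
}

/* Hypothetical caller mirroring the onstack copy described above. */
static void use_onstack_copy(const struct lockdep_map *shared)
{
    struct lockdep_map onstack;
    int i;

    lockdep_copy_map(&onstack, shared);
    for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
        assert(onstack.class_cache[i] == NULL);
}
```

Clearing the cache trades a little lookup performance for safety, and it leaves the source map untouched.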
    
    Note: this oops was actually seen on 3.4-next, where flush_work() newly
    does the racing lock_map_acquire(); but Tejun points out that 3.4 and
    earlier are already vulnerable to the same through wait_on_work().
    
    * Patch originally from Peter.  Hugh modified it a bit and wrote the
      description.
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Reported-by: Hugh Dickins <hughd@google.com>
    LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
    Signed-off-by: Tejun Heo <tj@kernel.org>