    modules, lock around setting of MODULE_STATE_UNFORMED · 860ce424
    Prarit Bhargava authored
    commit d3051b48 upstream.
    
    A panic was seen in the following situation.
    
    There are two threads running on the system. The first thread is a system
    monitoring thread that is reading /proc/modules. The second thread is
    loading and unloading a module (in this example I'm using my simple
    dummy-module.ko).  Note, in the "real world" this occurred with the qlogic
    driver module.
    
    When doing this, the following panic occurred:
    
     ------------[ cut here ]------------
     kernel BUG at kernel/module.c:3739!
     invalid opcode: 0000 [#1] SMP
     Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
     CPU: 37 PID: 186343 Comm: cat Tainted: GF          O--------------   3.10.0+ #7
     Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
     RIP: 0010:[<ffffffff810d64c5>]  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
     RSP: 0018:ffff88080fa7fe18  EFLAGS: 00010246
     RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
     RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
     RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
     R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
     R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
     FS:  00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
     Stack:
      ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
      ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
      ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
     Call Trace:
      [<ffffffff810d666c>] m_show+0x19c/0x1e0
      [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0
      [<ffffffff812281ed>] proc_reg_read+0x3d/0x80
      [<ffffffff811c0f2c>] vfs_read+0x9c/0x170
      [<ffffffff811c1a58>] SyS_read+0x58/0xb0
      [<ffffffff81605829>] system_call_fastpath+0x16/0x1b
     Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
     RIP  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
      RSP <ffff88080fa7fe18>
    
        Consider the two processes running on the system.
    
        CPU 0 (/proc/modules reader)
        CPU 1 (loading/unloading module)
    
        CPU 0 opens /proc/modules, and starts displaying data for each module by
        traversing the modules list via fs/seq_file.c:seq_open() and
        fs/seq_file.c:seq_read().  For each module in the modules list, seq_read
        does
    
                op->start()  <-- this is a pointer to m_start()
                op->show()   <-- this is a pointer to m_show()
                op->stop()   <-- this is a pointer to m_stop()
    
        The m_start(), m_show(), and m_stop() module functions are defined in
        kernel/module.c. The m_start() and m_stop() functions acquire and release
        the module_mutex respectively.
    
        i.e., when reading /proc/modules, the module_mutex is acquired and
        released for each module.
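The per-module locking described above can be sketched in userspace C. This is only an illustrative model under stated assumptions: `fake_mutex`, `fake_modules`, and the `fake_m_*` functions are hypothetical stand-ins, not the kernel's seq_file or module code, and the lock is a plain flag rather than a real mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct fake_mutex {
	bool held;   /* true while the "mutex" is taken */
	int cycles;  /* completed lock/unlock pairs */
};

static const char *const fake_modules[] = { "dm_mod", "libata", "dummy_module" };
#define FAKE_MODULE_COUNT (sizeof(fake_modules) / sizeof(fake_modules[0]))

static struct fake_mutex fake_module_mutex;

/* Models m_start(): take the lock, then look up the record at pos. */
static const char *fake_m_start(size_t pos)
{
	assert(!fake_module_mutex.held);
	fake_module_mutex.held = true;
	return pos < FAKE_MODULE_COUNT ? fake_modules[pos] : NULL;
}

/* Models m_show(): runs only while the lock is held. */
static void fake_m_show(const char *name)
{
	assert(fake_module_mutex.held);
	printf("%s\n", name);
}

/* Models m_stop(): drop the lock and count the completed cycle. */
static void fake_m_stop(void)
{
	assert(fake_module_mutex.held);
	fake_module_mutex.held = false;
	fake_module_mutex.cycles++;
}

/* Walks the list with one start/show/stop cycle per module, as the
 * text describes, and returns how many lock cycles that took. */
static int fake_read_modules(void)
{
	size_t pos;

	fake_module_mutex.cycles = 0;
	for (pos = 0; pos < FAKE_MODULE_COUNT; pos++) {
		const char *name = fake_m_start(pos);

		fake_m_show(name);
		fake_m_stop();
	}
	return fake_module_mutex.cycles;
}
```

In this model, `fake_read_modules()` returns one lock cycle per list entry. The lock is free between records, and that gap is what gives a concurrent teardown room to run.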
    
        m_show() is called with the module_mutex held.  It accesses the module
        struct data and attempts to write out module data.  It is in this code
        path that the above BUG_ON() is hit; specifically, m_show() calls
    
        static char *module_flags(struct module *mod, char *buf)
        {
                int bx = 0;
    
                BUG_ON(mod->state == MODULE_STATE_UNFORMED);
        ...
    
        The other thread, CPU 1, in unloading the module calls the syscall
        delete_module() defined in kernel/module.c.  The module_mutex is acquired
        for a short time, and then released.  free_module() is called without the
        module_mutex.  free_module() then sets mod->state = MODULE_STATE_UNFORMED,
        also without the module_mutex.  Some additional code is called and then the
        module_mutex is reacquired to remove the module from the modules list:
    
            /* Now we can delete it from the lists */
            mutex_lock(&module_mutex);
            stop_machine(__unlink_module, mod, NULL);
            mutex_unlock(&module_mutex);
    
    This is the sequence of events that leads to the panic.
    
    CPU 1 is removing dummy_module via delete_module().  It acquires the
    module_mutex, and then releases it.  CPU 1 has NOT set dummy_module->state to
    MODULE_STATE_UNFORMED yet.
    
    CPU 0, which is reading the /proc/modules, acquires the module_mutex and
    acquires a pointer to the dummy_module which is still in the modules list.
    CPU 0 calls m_show for dummy_module.  The check in m_show() for
    MODULE_STATE_UNFORMED passed for dummy_module even though it is being
    torn down.
    
    Meanwhile CPU 1, which has been continuing to remove dummy_module without
    holding the module_mutex, now calls free_module() and sets
    dummy_module->state to MODULE_STATE_UNFORMED.
    
    CPU 0 now calls module_flags() with dummy_module and ...
    
    static char *module_flags(struct module *mod, char *buf)
    {
            int bx = 0;
    
            BUG_ON(mod->state == MODULE_STATE_UNFORMED);
    
    and BOOM.
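The interleaving above can be replayed deterministically in a small userspace model. This is a sketch only: `fake_state`, `fake_module`, and `replay_race()` are hypothetical names, and the starting state `FAKE_GOING` is an assumption standing in for "being torn down but not yet unformed". It runs the steps in one thread rather than on two CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum fake_state { FAKE_GOING, FAKE_UNFORMED };

struct fake_module {
	enum fake_state state;
};

/*
 * Replays the reader's two state checks with an optional unlocked
 * write from the teardown path in between.  Returns true when the
 * second check (the BUG_ON() condition in module_flags()) would fire.
 */
static bool replay_race(bool teardown_interleaves)
{
	struct fake_module mod = { FAKE_GOING };

	/* CPU 0: the check in m_show() passes, the module is not unformed. */
	if (mod.state == FAKE_UNFORMED)
		return false; /* m_show() would have skipped this module */

	/* CPU 1: free_module() updates the state without taking the mutex. */
	if (teardown_interleaves)
		mod.state = FAKE_UNFORMED;

	/* CPU 0: module_flags() evaluates its BUG_ON() condition. */
	return mod.state == FAKE_UNFORMED;
}
```

With the unlocked write in the window, `replay_race(true)` reports that the BUG_ON() condition holds; without it, `replay_race(false)` does not. The reader holding module_mutex makes no difference here, because the writer never takes it.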
    
    Acquire and release the module_mutex lock around the setting of
    MODULE_STATE_UNFORMED in the teardown path, which should resolve the
    problem.
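The effect of the fix can be sketched in the same kind of model. This is not the actual patch: `fake_mutex`, `teardown_set_unformed()`, and `replay_patched()` are hypothetical names, and a failed trylock stands in for the point where a real mutex_lock() would sleep until the reader releases the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum fake_state { FAKE_GOING, FAKE_UNFORMED };

struct fake_mutex {
	bool held;
};

struct fake_module {
	enum fake_state state;
};

/* Takes the lock if it is free; returns false if someone holds it. */
static bool fake_trylock(struct fake_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void fake_unlock(struct fake_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Teardown with the fix applied: the state changes only under the lock.
 * Returns false where a real mutex_lock() would have had to wait. */
static bool teardown_set_unformed(struct fake_mutex *m, struct fake_module *mod)
{
	if (!fake_trylock(m))
		return false;
	mod->state = FAKE_UNFORMED;
	fake_unlock(m);
	return true;
}

/* Replays the reader with a teardown attempt inside its locked window.
 * Returns true if the BUG_ON()-style check would fire. */
static bool replay_patched(void)
{
	struct fake_mutex mutex = { false };
	struct fake_module mod = { FAKE_GOING };
	bool would_bug = false;

	if (!fake_trylock(&mutex))      /* reader: m_start() */
		return true;            /* unreachable: the lock starts free */
	if (mod.state != FAKE_UNFORMED) {
		/* Teardown runs in the window but is locked out. */
		(void)teardown_set_unformed(&mutex, &mod);
		/* Reader: module_flags() check sees an unchanged state. */
		would_bug = (mod.state == FAKE_UNFORMED);
	}
	fake_unlock(&mutex);            /* reader: m_stop() */

	/* Teardown proceeds once the reader has dropped the lock. */
	if (!teardown_set_unformed(&mutex, &mod))
		return true;
	return would_bug;
}
```

Because the state write now needs the same lock the reader holds across both checks, the write cannot land between them, and `replay_patched()` reports no BUG_ON() hit.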
    
    Testing: In the unpatched kernel I can panic the system within 1 minute by
    doing
    
    while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done
    
    and
    
    while (true) do cat /proc/modules; done
    
    in separate terminals.
    
    In the patched kernel I was able to run just over one hour without seeing
    any issues.  I also verified the output of panic via sysrq-c and the output
    of /proc/modules looks correct for all three states for the dummy_module.
    
            dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
            dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
            dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)
    
    Signed-off-by: Prarit Bhargava <prarit@redhat.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Jiri Slaby <jslaby@suse.cz>