    x86/kexec: Fix kexec crash in syscall kexec_file_load() · 9a2a1db5
    Lee, Chun-Yi authored
    [ Upstream commit e3c41e37 ]
    
    The original bug is a page fault crash that sometimes happens
    on big machines when preparing ELF headers:
    
        BUG: unable to handle kernel paging request at ffffc90613fc9000
        IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
    
    The bug is caused by us under-counting the number of memory ranges
    and subsequently not allocating enough ELF header space for them.
    The bug is typically masked on smaller systems, because the ELF header
    allocation is rounded up to the next page.
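
    As a rough illustration of why page rounding hides the under-count,
    the sketch below computes the rounded-up buffer size and the number
    of spare program header slots left in it. The 64- and 56-byte sizes
    are those of Elf64_Ehdr and Elf64_Phdr; the 4096-byte alignment and
    both function names are assumptions made for this example, not the
    kernel's code:

```c
#include <assert.h>
#include <stddef.h>

#define ELF64_EHDR_SIZE 64u   /* sizeof(Elf64_Ehdr) */
#define ELF64_PHDR_SIZE 56u   /* sizeof(Elf64_Phdr) */
#define HDR_ALIGN       4096u /* assumed: allocation rounded up to one page */

/* Hypothetical helper: buffer size for nr_phdr program headers, page-rounded. */
static size_t elf_hdr_alloc_size(size_t nr_phdr)
{
    size_t sz = ELF64_EHDR_SIZE + nr_phdr * ELF64_PHDR_SIZE;

    assert(sz >= ELF64_EHDR_SIZE); /* guard against overflow in this sketch */
    return (sz + HDR_ALIGN - 1) & ~(size_t)(HDR_ALIGN - 1);
}

/* Hypothetical helper: extra program headers that still fit after rounding. */
static size_t spare_phdr_slots(size_t nr_phdr)
{
    size_t used = ELF64_EHDR_SIZE + nr_phdr * ELF64_PHDR_SIZE;

    return (elf_hdr_alloc_size(nr_phdr) - used) / ELF64_PHDR_SIZE;
}
```

    With few counted headers, dozens of uncounted ones still fit in the
    slack. When the counted headers happen to fill the last page almost
    exactly, a single uncounted region is enough to write past the end
    of the buffer.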
    
    This patch modifies fill_up_crash_elf_data() to use
    walk_system_ram_res() instead of walk_system_ram_range() so that it
    correctly counts the max number of crash memory ranges:
    walk_system_ram_range() filters out small memory regions that
    reside within a single page, but walk_system_ram_res() does not.
    
    Here's how I found the bug:
    
    Tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback()
    shows that the code uses walk_system_ram_res() to fill crash memory
    region information into the program headers, so it includes the small
    memory regions that reside within a single page.
    
    However, fill_up_crash_elf_data() used walk_system_ram_range() to
    count the number of crash memory regions, and that walker filters
    out those small regions.
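
    The mismatch between the counting pass and the filling pass can be
    modelled in user space. This is only a sketch: struct ram_res and
    the two count_*_model() functions are hypothetical stand-ins for the
    kernel walkers, and a 4 KiB page size is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical stand-in for a System RAM resource: [start, end], inclusive. */
struct ram_res {
    uint64_t start;
    uint64_t end;
};

/* Models a walk_system_ram_res()-style pass: every region is counted. */
static unsigned int count_res_model(const struct ram_res *r, size_t n)
{
    unsigned int nr = 0;

    for (size_t i = 0; i < n; i++) {
        assert(r[i].end >= r[i].start);
        nr++;
    }
    return nr;
}

/* Models a walk_system_ram_range()-style pass: a region is counted only
 * if it contains at least one whole page. */
static unsigned int count_range_model(const struct ram_res *r, size_t n)
{
    unsigned int nr = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t pfn = (r[i].start + PAGE_SIZE - 1) >> PAGE_SHIFT;
        uint64_t end_pfn = (r[i].end + 1) >> PAGE_SHIFT;

        if (end_pfn > pfn)
            nr++;
    }
    return nr;
}
```

    If the buffer is sized from the second count but filled by a pass
    that behaves like the first, every sub-page region becomes a program
    header that nobody allocated space for.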
    
    I printed those small memory regions, for example:
    
      kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
    
    Based on the code in walk_system_ram_range(), this memory region
    will be filtered out:
    
      pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
      end_pfn = (0x77592258 + 0xdc0 - 1 + 1) >> 12 = 0x77593
      end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
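
    The same arithmetic can be checked with a few lines of C, using the
    paddr and sz values from the printed line. struct pfn_span and
    span_of() are names invented for this sketch, and a 4 KiB page size
    is assumed:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical result type: first whole page and one past the last. */
struct pfn_span {
    uint64_t pfn;
    uint64_t end_pfn;
};

/* Round the start up and the (exclusive) end down to page frame numbers. */
static struct pfn_span span_of(uint64_t paddr, uint64_t sz)
{
    struct pfn_span s;
    uint64_t end;

    assert(sz > 0);
    end = paddr + sz - 1;                            /* inclusive end */
    s.pfn = (paddr + PAGE_SIZE - 1) >> PAGE_SHIFT;   /* round start up */
    s.end_pfn = (end + 1) >> PAGE_SHIFT;             /* round end down */
    return s;
}
```

    For paddr=0x77592258 and sz=0xdc0 both fields come out as 0x77593,
    so the span holds zero whole pages and the region is skipped.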
    
    So, the max_nr_ranges that's counted by the kernel doesn't include
    small memory regions - causing us to under-allocate the required space.
    That causes the page fault crash that happens in a later code path
    when preparing ELF headers.
    
    This bug is not easy to reproduce on small machines that have few
    CPUs, because the allocated page-aligned ELF buffer has enough free
    space to cover the PT_LOAD headers of those small memory regions.
    
    Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Jiang Liu <jiang.liu@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Viresh Kumar <viresh.kumar@linaro.org>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: kexec@lists.infradead.org
    Cc: linux-kernel@vger.kernel.org
    Cc: <stable@vger.kernel.org>
    Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Sasha Levin <sasha.levin@oracle.com>