    [PATCH] /proc/kcore fixes · 9243548a
    Andrew Morton authored
    From: Tony Luck <tony.luck@intel.com>
    
    /proc/kcore has been broken on some architectures for a long time.  The
    problems stem from the fact that some architectures allocate memory for
    vmalloc() (and thus for modules) at addresses below PAGE_OFFSET, which
    results in negative file offsets in the virtual core file image provided
    by /proc/kcore.  There are also pending problems for discontiguous-memory
    systems, as /proc/kcore just pretends that there are no holes between
    "PAGE_OFFSET" and "high_memory", so an unwary user (OK, super-user) can
    read non-existent memory, which may do bad things.  There may also be
    kernel objects that would be nice to view in /proc/kcore but do not show
    up there.
    
    A pending change on ia64 to allow booting on machines that don't have
    physical memory in any convenient pre-determined place will make things even
    worse, because the kernel itself won't show up in the current implementation
    of /proc/kcore!
    
    The attached patch provides enough hooks that each architecture should be
    able to make /proc/kcore useful.  The patch is INCOMPLETE in that *use* of
    those hooks is ONLY PROVIDED FOR IA64.
    
    Here's how it works.  The default code in fs/proc/kcore.c doesn't set up any
    "elf_phdr" sections ...  it is left to each architecture to make appropriate
    calls to "kclist_add()" to specify a base address and size for each piece of
    kernel virtual address space that needs to be made accessible through
    /proc/kcore.  To get the old functionality, you'll need two calls that look
    something like:
    
     kclist_add(&kcore_mem, __va(0),
                 max_low_pfn * PAGE_SIZE);
     kclist_add(&kcore_vmem, (void *)VMALLOC_START,
                 VMALLOC_END-VMALLOC_START);
    
    The first makes all of memory visible (__i386__, __mc68000__ and __x86_64__
    should use __va(PAGE_SIZE) to duplicate the original lack of access to page
    0).  The second provides a single map for all "vmalloc" space (the code still
    searches the vmlist to see what actually exists before accessing it).
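    As a rough illustration of the registration scheme, here is a minimal
    userspace sketch.  It assumes a singly linked list of caller-owned nodes.
    Only the names kclist_add() and kc_vaddr_to_offset() come from the text
    above.  The struct layout, the fill_load_phdrs() helper, the i386-style
    MODEL_PAGE_OFFSET of 0xC0000000 and the 4096-byte page size are all
    hypothetical stand-ins, not the code in fs/proc/kcore.c.  Each
    registered range becomes one PT_LOAD program header, and that header's
    file offset is derived from the range's virtual address.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one registered range of kernel virtual space. */
struct kcore_list {
	struct kcore_list *next;
	unsigned long addr;
	size_t size;
};

static struct kcore_list *kclist;	/* head of the registration list */

/* Record [addr, addr + size) using storage owned by the caller. */
static void kclist_add(struct kcore_list *new, void *addr, size_t size)
{
	assert(size > 0);
	new->addr = (unsigned long)addr;
	new->size = size;
	new->next = kclist;	/* prepend: newest entry comes first */
	kclist = new;
}

/* Assumed old-style default mapping: offset = vaddr - PAGE_OFFSET. */
#define MODEL_PAGE_OFFSET 0xC0000000UL
#define MODEL_PAGE_SIZE   4096UL
#define kc_vaddr_to_offset(v) ((v) - MODEL_PAGE_OFFSET)

/*
 * Emit one PT_LOAD header per registered range.  dataoff is where the
 * memory image starts in the file (after the ELF header and notes).
 * Returns the number of headers written.
 */
static int fill_load_phdrs(Elf64_Phdr *ph, int max, uint64_t dataoff)
{
	int n = 0;

	for (struct kcore_list *m = kclist; m != NULL && n < max; m = m->next) {
		memset(&ph[n], 0, sizeof(ph[n]));
		ph[n].p_type = PT_LOAD;
		ph[n].p_flags = PF_R | PF_W | PF_X;
		ph[n].p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
		ph[n].p_vaddr = m->addr;
		ph[n].p_filesz = m->size;
		ph[n].p_memsz = m->size;
		ph[n].p_align = MODEL_PAGE_SIZE;
		n++;
	}
	return n;
}
```

    Suppose low memory is registered at MODEL_PAGE_OFFSET and a "vmalloc"
    window at 0xE0000000.  That gives two PT_LOAD headers, at file offsets
    dataoff and dataoff + 0x20000000.  They come out in reverse order of
    registration because this kclist_add() prepends.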
    
    Other blocks of kernel virtual space can be added as needed, and removed
    again (with kclist_del()).  E.g.  discontiguous memory machines can add an
    entry for each block of memory.  Architectures that allocate memory for
    modules someplace outside of vmalloc-land can add/remove entries on module
    insert and remove.
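    The sketch below models removal and the per-block registration suggested
    above, again assuming a singly linked list.  kclist_del() is named in the
    text, but its body here is a hypothetical illustration.  kclist_find() is
    also hypothetical: it is a lookup that a read path could use to refuse
    addresses in the holes between registered blocks instead of touching
    non-existent memory.  The node and module addresses in the test are
    made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one registered range. */
struct kcore_list {
	struct kcore_list *next;
	unsigned long addr;
	size_t size;
};

static struct kcore_list *kclist;

static void kclist_add(struct kcore_list *new, void *addr, size_t size)
{
	assert(size > 0);
	new->addr = (unsigned long)addr;
	new->size = size;
	new->next = kclist;
	kclist = new;
}

/* Unlink the entry that starts at addr; return it, or NULL if absent. */
static struct kcore_list *kclist_del(void *addr)
{
	struct kcore_list **p;

	for (p = &kclist; *p != NULL; p = &(*p)->next) {
		if ((*p)->addr == (unsigned long)addr) {
			struct kcore_list *m = *p;

			*p = m->next;
			return m;
		}
	}
	return NULL;
}

/* Return the entry wholly covering [addr, addr + len), or NULL (a hole). */
static struct kcore_list *kclist_find(unsigned long addr, size_t len)
{
	for (struct kcore_list *m = kclist; m != NULL; m = m->next)
		/* overflow-safe "addr + len <= m->addr + m->size" */
		if (addr >= m->addr && len <= m->size &&
		    addr - m->addr <= m->size - len)
			return m;
	return NULL;
}
```

    kclist_find() only accepts a range that fits entirely inside one
    registered block.  A read that falls into a hole between nodes, or
    straddles the end of a node, can therefore be rejected rather than
    passed through.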
    
    The second piece of abstraction is the kc_vaddr_to_offset() and
    kc_offset_to_vaddr() macros.  These provide mappings from kernel virtual
    addresses to offsets in the virtual file that /proc/kcore instantiates.  I
    hope they are sufficient to avoid the negative-offset problems that plagued
    the old /proc/kcore.  Default versions are provided for the old behaviour
    (mapping simply adds/subtracts PAGE_OFFSET).  For ia64 I just need to use a
    different offset as all kernel virtual allocations are in the high 37.5% of
    the 64-bit virtual address space.  x86_64 was the other architecture with
    this problem.  I don't know enough (anything) about the kernel memory map on
    x86_64, so I hope these provide a big enough hook.  I'm hoping that you have
    some low stuff, and some high stuff with a big hole in the middle ...  in
    which case the macros might look something like:
    
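    #define kc_vaddr_to_offset(v) ((v) < 0x1000000000000000UL ? (v) : \
                                  ((v) - 0xE000000000000000UL))
    #define kc_offset_to_vaddr(o) ((o) < 0x1000000000000000UL ? (o) : \
                                  ((o) + 0xE000000000000000UL))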
    
    But if you have interesting stuff scattered across *every* part of the
    unsigned address range, then you won't be able to squeeze it all into a
    signed file offset.
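    To make the negative-offset problem and the ia64 fix concrete, here is a
    small sketch using ia64-like values.  It assumes vmalloc space in region 5
    (base 0xA000000000000000) and the identity map, i.e. PAGE_OFFSET, in
    region 7 (base 0xE000000000000000).  The function names are hypothetical.
    Subtracting PAGE_OFFSET, as the old default does, gives a negative signed
    offset for a region-5 address.  Subtracting the lowest kernel address
    instead keeps every offset non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* ia64-like layout (assumed for illustration). */
#define REGION5_BASE 0xA000000000000000ULL	/* vmalloc and modules */
#define REGION7_BASE 0xE000000000000000ULL	/* identity map (PAGE_OFFSET) */

/*
 * File offsets are signed 64-bit values.  Converting an out-of-range
 * uint64_t to int64_t is implementation-defined in C; this relies on the
 * two's-complement wraparound that GCC and Clang provide.
 */

/* Old default: subtract PAGE_OFFSET. */
static int64_t old_vaddr_to_offset(uint64_t v)
{
	return (int64_t)(v - REGION7_BASE);
}

/* Rebased: subtract the lowest kernel virtual address instead. */
static int64_t new_vaddr_to_offset(uint64_t v)
{
	return (int64_t)(v - REGION5_BASE);
}

/* Inverse of new_vaddr_to_offset(). */
static uint64_t new_offset_to_vaddr(int64_t off)
{
	return (uint64_t)off + REGION5_BASE;
}
```

    With the rebased mapping, regions 5 to 7 land on
    [0, 0x6000000000000000).  0x6000000000000000 is exactly 3/8 (37.5%) of
    2^64, so even the highest kernel address maps to an offset that fits in
    a signed 64-bit file offset.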
    
    There are a couple of bug fixes too:
    1) get_kcore_size() didn't account for the elf_prstatus, elf_prpsinfo
       and task_struct that are placed in the PT_NOTE section that is
       part of the header.  We were saved on most configurations by the
       round-up to PAGE_SIZE ... but it's possible that some architectures
       or machines corrupted memory beyond the space allocated for the
       header.
    
    2) The size of the PT_NOTE section was incorrectly set to the size
       of the last note, rather than the sum of the sizes of all the notes.
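    Both fixes come down to summing the notes correctly, as in the following
    sketch of the size accounting.  note_size() and kcore_header_size() are
    hypothetical helpers, not the patch's code.  The note name "CORE" and the
    payload sizes in the test are assumptions: the payloads stand in for
    sizeof(struct elf_prstatus), sizeof(struct elf_prpsinfo) and
    sizeof(struct task_struct), which vary with the kernel configuration.
    Each ELF note occupies its header plus a name and a payload, each padded
    to 4 bytes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round x up to a multiple of the power-of-two a. */
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Bytes one ELF note occupies: header + padded name + padded payload. */
static size_t note_size(const char *name, size_t datasz)
{
	return sizeof(Elf64_Nhdr) + ROUNDUP(strlen(name) + 1, 4) +
	       ROUNDUP(datasz, 4);
}

/*
 * Header size including all three notes (bug 1).  The PT_NOTE size is
 * reported through *pt_note_filesz as the sum of every note, not just the
 * last one (bug 2).
 */
static size_t kcore_header_size(int nphdr, const size_t notes[3],
				size_t *pt_note_filesz)
{
	size_t sum = note_size("CORE", notes[0]) +
		     note_size("CORE", notes[1]) +
		     note_size("CORE", notes[2]);

	*pt_note_filesz = sum;
	return sizeof(Elf64_Ehdr) + (size_t)nphdr * sizeof(Elf64_Phdr) + sum;
}
```

    The result would then be rounded up to PAGE_SIZE, as before.  The
    difference is that the rounding now applies to a total that already
    includes the notes, rather than merely hiding their omission on most
    configurations.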