Commit 2ef14f46 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm changes from Peter Anvin:
 "This is a huge set of several partly interrelated (and concurrently
  developed) changes, which is why the branch history is messier than
  one would like.

  The *really* big items are two humonguous patchsets mostly developed
  by Yinghai Lu at my request, which completely revamps the way we
  create initial page tables.  In particular, rather than estimating how
  much memory we will need for page tables and then build them into that
  memory -- a calculation that has shown to be incredibly fragile -- we
  now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
  a #PF handler which creates temporary page tables on demand.

  This has several advantages:

  1. It makes it much easier to support things that need access to data
     very early (a followon patchset uses this to load microcode way
     early in the kernel startup).

  2. It allows the kernel and all the kernel data objects to be invoked
     from above the 4 GB limit.  This allows kdump to work on very large
     systems.

  3. It greatly reduces the difference between Xen and native (Xen's
     equivalent of the #PF handler are the temporary page tables created
     by the domain builder), eliminating a bunch of fragile hooks.

  The patch series also gets us a bit closer to W^X.

  Additional work in this pull is the 64-bit get_user() work which you
  were also involved with, and a bunch of cleanups/speedups to
  __phys_addr()/__pa()."

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
  x86, mm: Move reserving low memory later in initialization
  x86, doc: Clarify the use of asm("%edx") in uaccess.h
  x86, mm: Redesign get_user with a __builtin_choose_expr hack
  x86: Be consistent with data size in getuser.S
  x86, mm: Use a bitfield to mask nuisance get_user() warnings
  x86/kvm: Fix compile warning in kvm_register_steal_time()
  x86-32: Add support for 64bit get_user()
  x86-32, mm: Remove reference to alloc_remap()
  x86-32, mm: Remove reference to resume_map_numa_kva()
  x86-32, mm: Rip out x86_32 NUMA remapping code
  x86/numa: Use __pa_nodebug() instead
  x86: Don't panic if can not alloc buffer for swiotlb
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86, 64bit, mm: hibernate use generic mapping_init
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86: Merge early kernel reserve for 32bit and 64bit
  x86: Add Crash kernel low reservation
  x86, kdump: Remove crashkernel range find limit for 64bit
  memblock: Add memblock_mem_size()
  x86, boot: Not need to check setup_header version for setup_data
  ...
parents cb715a83 0da3e7f5
...@@ -594,6 +594,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. ...@@ -594,6 +594,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
is selected automatically. Check is selected automatically. Check
Documentation/kdump/kdump.txt for further details. Documentation/kdump/kdump.txt for further details.
crashkernel_low=size[KMG]
[KNL, x86] parts under 4G.
crashkernel=range1:size1[,range2:size2,...][@offset] crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory [KNL] Same as above, but depends on the memory
in the running system. The syntax of range is in the running system. The syntax of range is
......
...@@ -1055,6 +1055,44 @@ must have read/write permission; CS must be __BOOT_CS and DS, ES, SS ...@@ -1055,6 +1055,44 @@ must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
must be __BOOT_DS; interrupt must be disabled; %esi must hold the base must be __BOOT_DS; interrupt must be disabled; %esi must hold the base
address of the struct boot_params; %ebp, %edi and %ebx must be zero. address of the struct boot_params; %ebp, %edi and %ebx must be zero.
**** 64-bit BOOT PROTOCOL
For machine with 64bit cpus and 64bit kernel, we could use 64bit bootloader
and we need a 64-bit boot protocol.
In 64-bit boot protocol, the first step in loading a Linux kernel
should be to setup the boot parameters (struct boot_params,
traditionally known as "zero page"). The memory for struct boot_params
could be allocated anywhere (even above 4G) and initialized to all zero.
Then, the setup header at offset 0x01f1 of kernel image on should be
loaded into struct boot_params and examined. The end of setup header
can be calculated as follows:
0x0202 + byte value at offset 0x0201
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
also fill the additional fields of the struct boot_params as described
in zero-page.txt.
After setting up the struct boot_params, the boot loader can load
64-bit kernel in the same way as that of 16-bit boot protocol, but
kernel could be loaded above 4G.
In 64-bit boot protocol, the kernel is started by jumping to the
64-bit kernel entry point, which is the start address of loaded
64-bit kernel plus 0x200.
At entry, the CPU must be in 64-bit mode with paging enabled.
The range with setup_header.init_size from start address of loaded
kernel and zero page and command line buffer get ident mapping;
a GDT must be loaded with the descriptors for selectors
__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
must be __BOOT_DS; interrupt must be disabled; %rsi must hold the base
address of the struct boot_params.
**** EFI HANDOVER PROTOCOL **** EFI HANDOVER PROTOCOL
This protocol allows boot loaders to defer initialisation to the EFI This protocol allows boot loaders to defer initialisation to the EFI
......
...@@ -317,7 +317,8 @@ void __init plat_swiotlb_setup(void) ...@@ -317,7 +317,8 @@ void __init plat_swiotlb_setup(void)
octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize); octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize);
swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1); if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM)
panic("Cannot allocate SWIOTLB buffer");
mips_dma_map_ops = &octeon_linear_dma_map_ops.dma_map_ops; mips_dma_map_ops = &octeon_linear_dma_map_ops.dma_map_ops;
} }
......
...@@ -2027,6 +2027,16 @@ static void __init patch_tlb_miss_handler_bitmap(void) ...@@ -2027,6 +2027,16 @@ static void __init patch_tlb_miss_handler_bitmap(void)
flushi(&valid_addr_bitmap_insn[0]); flushi(&valid_addr_bitmap_insn[0]);
} }
static void __init register_page_bootmem_info(void)
{
#ifdef CONFIG_NEED_MULTIPLE_NODES
int i;
for_each_online_node(i)
if (NODE_DATA(i)->node_spanned_pages)
register_page_bootmem_info_node(NODE_DATA(i));
#endif
}
void __init mem_init(void) void __init mem_init(void)
{ {
unsigned long codepages, datapages, initpages; unsigned long codepages, datapages, initpages;
...@@ -2044,20 +2054,8 @@ void __init mem_init(void) ...@@ -2044,20 +2054,8 @@ void __init mem_init(void)
high_memory = __va(last_valid_pfn << PAGE_SHIFT); high_memory = __va(last_valid_pfn << PAGE_SHIFT);
#ifdef CONFIG_NEED_MULTIPLE_NODES register_page_bootmem_info();
{
int i;
for_each_online_node(i) {
if (NODE_DATA(i)->node_spanned_pages != 0) {
totalram_pages +=
free_all_bootmem_node(NODE_DATA(i));
}
}
totalram_pages += free_low_memory_core_early(MAX_NUMNODES);
}
#else
totalram_pages = free_all_bootmem(); totalram_pages = free_all_bootmem();
#endif
/* We subtract one to account for the mem_map_zero page /* We subtract one to account for the mem_map_zero page
* allocated below. * allocated below.
......
...@@ -1277,10 +1277,6 @@ config NODES_SHIFT ...@@ -1277,10 +1277,6 @@ config NODES_SHIFT
Specify the maximum number of NUMA Nodes available on the target Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables. system. Increases memory reserved to accommodate various tables.
config HAVE_ARCH_ALLOC_REMAP
def_bool y
depends on X86_32 && NUMA
config ARCH_HAVE_MEMORY_PRESENT config ARCH_HAVE_MEMORY_PRESENT
def_bool y def_bool y
depends on X86_32 && DISCONTIGMEM depends on X86_32 && DISCONTIGMEM
......
...@@ -285,16 +285,26 @@ struct biosregs { ...@@ -285,16 +285,26 @@ struct biosregs {
void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg); void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg);
/* cmdline.c */ /* cmdline.c */
int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize); int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize);
int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option); int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option);
static inline int cmdline_find_option(const char *option, char *buffer, int bufsize) static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
{ {
return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize); unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
if (cmd_line_ptr >= 0x100000)
return -1; /* inaccessible */
return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
} }
static inline int cmdline_find_option_bool(const char *option) static inline int cmdline_find_option_bool(const char *option)
{ {
return __cmdline_find_option_bool(boot_params.hdr.cmd_line_ptr, option); unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
if (cmd_line_ptr >= 0x100000)
return -1; /* inaccessible */
return __cmdline_find_option_bool(cmd_line_ptr, option);
} }
......
...@@ -27,7 +27,7 @@ static inline int myisspace(u8 c) ...@@ -27,7 +27,7 @@ static inline int myisspace(u8 c)
* Returns the length of the argument (regardless of if it was * Returns the length of the argument (regardless of if it was
* truncated to fit in the buffer), or -1 on not found. * truncated to fit in the buffer), or -1 on not found.
*/ */
int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize) int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize)
{ {
addr_t cptr; addr_t cptr;
char c; char c;
...@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int ...@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
st_bufcpy /* Copying this to buffer */ st_bufcpy /* Copying this to buffer */
} state = st_wordstart; } state = st_wordstart;
if (!cmdline_ptr || cmdline_ptr >= 0x100000) if (!cmdline_ptr)
return -1; /* No command line, or inaccessible */ return -1; /* No command line */
cptr = cmdline_ptr & 0xf; cptr = cmdline_ptr & 0xf;
set_fs(cmdline_ptr >> 4); set_fs(cmdline_ptr >> 4);
...@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int ...@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
* Returns the position of that option (starts counting with 1) * Returns the position of that option (starts counting with 1)
* or 0 on not found * or 0 on not found
*/ */
int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option) int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option)
{ {
addr_t cptr; addr_t cptr;
char c; char c;
...@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option) ...@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
st_wordskip, /* Miscompare, skip */ st_wordskip, /* Miscompare, skip */
} state = st_wordstart; } state = st_wordstart;
if (!cmdline_ptr || cmdline_ptr >= 0x100000) if (!cmdline_ptr)
return -1; /* No command line, or inaccessible */ return -1; /* No command line */
cptr = cmdline_ptr & 0xf; cptr = cmdline_ptr & 0xf;
set_fs(cmdline_ptr >> 4); set_fs(cmdline_ptr >> 4);
......
...@@ -13,13 +13,21 @@ static inline char rdfs8(addr_t addr) ...@@ -13,13 +13,21 @@ static inline char rdfs8(addr_t addr)
return *((char *)(fs + addr)); return *((char *)(fs + addr));
} }
#include "../cmdline.c" #include "../cmdline.c"
static unsigned long get_cmd_line_ptr(void)
{
unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
return cmd_line_ptr;
}
int cmdline_find_option(const char *option, char *buffer, int bufsize) int cmdline_find_option(const char *option, char *buffer, int bufsize)
{ {
return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize); return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
} }
int cmdline_find_option_bool(const char *option) int cmdline_find_option_bool(const char *option)
{ {
return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option); return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
} }
#endif #endif
...@@ -37,6 +37,12 @@ ...@@ -37,6 +37,12 @@
__HEAD __HEAD
.code32 .code32
ENTRY(startup_32) ENTRY(startup_32)
/*
* 32bit entry is 0 and it is ABI so immutable!
* If we come here directly from a bootloader,
* kernel(text+data+bss+brk) ramdisk, zero_page, command line
* all need to be under the 4G limit.
*/
cld cld
/* /*
* Test KEEP_SEGMENTS flag to see if the bootloader is asking * Test KEEP_SEGMENTS flag to see if the bootloader is asking
...@@ -154,6 +160,12 @@ ENTRY(startup_32) ...@@ -154,6 +160,12 @@ ENTRY(startup_32)
btsl $_EFER_LME, %eax btsl $_EFER_LME, %eax
wrmsr wrmsr
/* After gdt is loaded */
xorl %eax, %eax
lldt %ax
movl $0x20, %eax
ltr %ax
/* /*
* Setup for the jump to 64bit mode * Setup for the jump to 64bit mode
* *
...@@ -176,28 +188,18 @@ ENTRY(startup_32) ...@@ -176,28 +188,18 @@ ENTRY(startup_32)
lret lret
ENDPROC(startup_32) ENDPROC(startup_32)
no_longmode:
/* This isn't an x86-64 CPU so hang */
1:
hlt
jmp 1b
#include "../../kernel/verify_cpu.S"
/*
* Be careful here startup_64 needs to be at a predictable
* address so I can export it in an ELF header. Bootloaders
* should look at the ELF header to find this address, as
* it may change in the future.
*/
.code64 .code64
.org 0x200 .org 0x200
ENTRY(startup_64) ENTRY(startup_64)
/* /*
* 64bit entry is 0x200 and it is ABI so immutable!
* We come here either from startup_32 or directly from a * We come here either from startup_32 or directly from a
* 64bit bootloader. If we come here from a bootloader we depend on * 64bit bootloader.
* an identity mapped page table being provied that maps our * If we come here from a bootloader, kernel(text+data+bss+brk),
* entire text+data+bss and hopefully all of memory. * ramdisk, zero_page, command line could be above 4G.
* We depend on an identity mapped page table being provided
* that maps our entire kernel(text+data+bss+brk), zero page
* and command line.
*/ */
#ifdef CONFIG_EFI_STUB #ifdef CONFIG_EFI_STUB
/* /*
...@@ -247,9 +249,6 @@ preferred_addr: ...@@ -247,9 +249,6 @@ preferred_addr:
movl %eax, %ss movl %eax, %ss
movl %eax, %fs movl %eax, %fs
movl %eax, %gs movl %eax, %gs
lldt %ax
movl $0x20, %eax
ltr %ax
/* /*
* Compute the decompressed kernel start address. It is where * Compute the decompressed kernel start address. It is where
...@@ -349,6 +348,15 @@ relocated: ...@@ -349,6 +348,15 @@ relocated:
*/ */
jmp *%rbp jmp *%rbp
.code32
no_longmode:
/* This isn't an x86-64 CPU so hang */
1:
hlt
jmp 1b
#include "../../kernel/verify_cpu.S"
.data .data
gdt: gdt:
.word gdt_end - gdt .word gdt_end - gdt
......
...@@ -374,6 +374,14 @@ xloadflags: ...@@ -374,6 +374,14 @@ xloadflags:
#else #else
# define XLF0 0 # define XLF0 0
#endif #endif
#if defined(CONFIG_RELOCATABLE) && defined(CONFIG_X86_64)
/* kernel/boot_param/ramdisk could be loaded above 4g */
# define XLF1 XLF_CAN_BE_LOADED_ABOVE_4G
#else
# define XLF1 0
#endif
#ifdef CONFIG_EFI_STUB #ifdef CONFIG_EFI_STUB
# ifdef CONFIG_X86_64 # ifdef CONFIG_X86_64
# define XLF23 XLF_EFI_HANDOVER_64 /* 64-bit EFI handover ok */ # define XLF23 XLF_EFI_HANDOVER_64 /* 64-bit EFI handover ok */
...@@ -383,7 +391,7 @@ xloadflags: ...@@ -383,7 +391,7 @@ xloadflags:
#else #else
# define XLF23 0 # define XLF23 0
#endif #endif
.word XLF0 | XLF23 .word XLF0 | XLF1 | XLF23
cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line, cmdline_size: .long COMMAND_LINE_SIZE-1 #length of the command line,
#added with boot protocol #added with boot protocol
......
#ifndef _ASM_X86_INIT_32_H #ifndef _ASM_X86_INIT_H
#define _ASM_X86_INIT_32_H #define _ASM_X86_INIT_H
#ifdef CONFIG_X86_32 struct x86_mapping_info {
extern void __init early_ioremap_page_table_range_init(void); void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
#endif void *context; /* context for alloc_pgt_page */
unsigned long pmd_flag; /* page flag for PMD entry */
bool kernel_mapping; /* kernel mapping or ident mapping */
};
extern void __init zone_sizes_init(void); int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
unsigned long addr, unsigned long end);
extern unsigned long __init #endif /* _ASM_X86_INIT_H */
kernel_physical_mapping_init(unsigned long start,
unsigned long end,
unsigned long page_size_mask);
extern unsigned long __initdata pgt_buf_start;
extern unsigned long __meminitdata pgt_buf_end;
extern unsigned long __meminitdata pgt_buf_top;
#endif /* _ASM_X86_INIT_32_H */
...@@ -48,11 +48,11 @@ ...@@ -48,11 +48,11 @@
# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64) # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
#else #else
/* Maximum physical address we can use pages from */ /* Maximum physical address we can use pages from */
# define KEXEC_SOURCE_MEMORY_LIMIT (0xFFFFFFFFFFUL) # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1)
/* Maximum address we can reach in physical address mode */ /* Maximum address we can reach in physical address mode */
# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFFFFFFFFFUL) # define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
/* Maximum address we can use for the control pages */ /* Maximum address we can use for the control pages */
# define KEXEC_CONTROL_MEMORY_LIMIT (0xFFFFFFFFFFUL) # define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
/* Allocate one page for the pdp and the second for the code */ /* Allocate one page for the pdp and the second for the code */
# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL) # define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
......
...@@ -14,12 +14,6 @@ extern struct pglist_data *node_data[]; ...@@ -14,12 +14,6 @@ extern struct pglist_data *node_data[];
#include <asm/numaq.h> #include <asm/numaq.h>
extern void resume_map_numa_kva(pgd_t *pgd);
#else /* !CONFIG_NUMA */
static inline void resume_map_numa_kva(pgd_t *pgd) {}
#endif /* CONFIG_NUMA */ #endif /* CONFIG_NUMA */
#ifdef CONFIG_DISCONTIGMEM #ifdef CONFIG_DISCONTIGMEM
......
...@@ -54,8 +54,6 @@ static inline int numa_cpu_node(int cpu) ...@@ -54,8 +54,6 @@ static inline int numa_cpu_node(int cpu)
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
# include <asm/numa_32.h> # include <asm/numa_32.h>
#else
# include <asm/numa_64.h>
#endif #endif
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
......
#ifndef _ASM_X86_NUMA_64_H
#define _ASM_X86_NUMA_64_H
extern unsigned long numa_free_all_bootmem(void);
#endif /* _ASM_X86_NUMA_64_H */
...@@ -17,6 +17,10 @@ ...@@ -17,6 +17,10 @@
struct page; struct page;
#include <linux/range.h>
extern struct range pfn_mapped[];
extern int nr_pfn_mapped;
static inline void clear_user_page(void *page, unsigned long vaddr, static inline void clear_user_page(void *page, unsigned long vaddr,
struct page *pg) struct page *pg)
{ {
...@@ -44,7 +48,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, ...@@ -44,7 +48,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
* case properly. Once all supported versions of gcc understand it, we can * case properly. Once all supported versions of gcc understand it, we can
* remove this Voodoo magic stuff. (i.e. once gcc3.x is deprecated) * remove this Voodoo magic stuff. (i.e. once gcc3.x is deprecated)
*/ */
#define __pa_symbol(x) __pa(__phys_reloc_hide((unsigned long)(x))) #define __pa_symbol(x) \
__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
......
...@@ -15,6 +15,7 @@ extern unsigned long __phys_addr(unsigned long); ...@@ -15,6 +15,7 @@ extern unsigned long __phys_addr(unsigned long);
#else #else
#define __phys_addr(x) __phys_addr_nodebug(x) #define __phys_addr(x) __phys_addr_nodebug(x)
#endif #endif
#define __phys_addr_symbol(x) __phys_addr(x)
#define __phys_reloc_hide(x) RELOC_HIDE((x), 0) #define __phys_reloc_hide(x) RELOC_HIDE((x), 0)
#ifdef CONFIG_FLATMEM #ifdef CONFIG_FLATMEM
......
...@@ -3,4 +3,40 @@ ...@@ -3,4 +3,40 @@
#include <asm/page_64_types.h> #include <asm/page_64_types.h>
#ifndef __ASSEMBLY__
/* duplicated to the one in bootmem.h */
extern unsigned long max_pfn;
extern unsigned long phys_base;
static inline unsigned long __phys_addr_nodebug(unsigned long x)
{
unsigned long y = x - __START_KERNEL_map;
/* use the carry flag to determine if x was < __START_KERNEL_map */
x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
return x;
}
#ifdef CONFIG_DEBUG_VIRTUAL
extern unsigned long __phys_addr(unsigned long);
extern unsigned long __phys_addr_symbol(unsigned long);
#else
#define __phys_addr(x) __phys_addr_nodebug(x)
#define __phys_addr_symbol(x) \
((unsigned long)(x) - __START_KERNEL_map + phys_base)
#endif
#define __phys_reloc_hide(x) (x)
#ifdef CONFIG_FLATMEM
#define pfn_valid(pfn) ((pfn) < max_pfn)
#endif
void clear_page(void *page);
void copy_page(void *to, void *from);
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_PAGE_64_H */ #endif /* _ASM_X86_PAGE_64_H */
...@@ -50,26 +50,4 @@ ...@@ -50,26 +50,4 @@
#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) #define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
#define KERNEL_IMAGE_START _AC(0xffffffff80000000, UL) #define KERNEL_IMAGE_START _AC(0xffffffff80000000, UL)
#ifndef __ASSEMBLY__
void clear_page(void *page);
void copy_page(void *to, void *from);
/* duplicated to the one in bootmem.h */
extern unsigned long max_pfn;
extern unsigned long phys_base;
extern unsigned long __phys_addr(unsigned long);
#define __phys_reloc_hide(x) (x)
#define vmemmap ((struct page *)VMEMMAP_START)
extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_FLATMEM
#define pfn_valid(pfn) ((pfn) < max_pfn)
#endif
#endif /* _ASM_X86_PAGE_64_DEFS_H */ #endif /* _ASM_X86_PAGE_64_DEFS_H */
...@@ -51,6 +51,8 @@ static inline phys_addr_t get_max_mapped(void) ...@@ -51,6 +51,8 @@ static inline phys_addr_t get_max_mapped(void)
return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT; return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
} }
bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
extern unsigned long init_memory_mapping(unsigned long start, extern unsigned long init_memory_mapping(unsigned long start,
unsigned long end); unsigned long end);
......
...@@ -395,6 +395,7 @@ pte_t *populate_extra_pte(unsigned long vaddr); ...@@ -395,6 +395,7 @@ pte_t *populate_extra_pte(unsigned long vaddr);
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <linux/mm_types.h> #include <linux/mm_types.h>
#include <linux/log2.h>
static inline int pte_none(pte_t pte) static inline int pte_none(pte_t pte)
{ {
...@@ -620,6 +621,8 @@ static inline int pgd_none(pgd_t pgd) ...@@ -620,6 +621,8 @@ static inline int pgd_none(pgd_t pgd)
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
extern int direct_gbpages; extern int direct_gbpages;
void init_mem_mapping(void);
void early_alloc_pgt_buf(void);
/* local pte updates need not use xchg for locking */ /* local pte updates need not use xchg for locking */
static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep) static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
...@@ -786,6 +789,20 @@ static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count) ...@@ -786,6 +789,20 @@ static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
memcpy(dst, src, count * sizeof(pgd_t)); memcpy(dst, src, count * sizeof(pgd_t));
} }
#define PTE_SHIFT ilog2(PTRS_PER_PTE)
static inline int page_level_shift(enum pg_level level)
{
return (PAGE_SHIFT - PTE_SHIFT) + level * PTE_SHIFT;
}
static inline unsigned long page_level_size(enum pg_level level)
{
return 1UL << page_level_shift(level);
}
static inline unsigned long page_level_mask(enum pg_level level)
{
return ~(page_level_size(level) - 1);
}
/* /*
* The x86 doesn't have any external MMU info: the kernel page * The x86 doesn't have any external MMU info: the kernel page
* tables contain all the necessary information. * tables contain all the necessary information.
......
...@@ -180,6 +180,11 @@ extern void cleanup_highmap(void); ...@@ -180,6 +180,11 @@ extern void cleanup_highmap(void);
#define __HAVE_ARCH_PTE_SAME #define __HAVE_ARCH_PTE_SAME
#define vmemmap ((struct page *)VMEMMAP_START)
extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
#endif /* !__ASSEMBLY__ */ #endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_PGTABLE_64_H */ #endif /* _ASM_X86_PGTABLE_64_H */
#ifndef _ASM_X86_PGTABLE_64_DEFS_H #ifndef _ASM_X86_PGTABLE_64_DEFS_H
#define _ASM_X86_PGTABLE_64_DEFS_H #define _ASM_X86_PGTABLE_64_DEFS_H
#include <asm/sparsemem.h>
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <linux/types.h> #include <linux/types.h>
...@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t; ...@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
#define MODULES_END _AC(0xffffffffff000000, UL) #define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR) #define MODULES_LEN (MODULES_END - MODULES_VADDR)
#define EARLY_DYNAMIC_PAGE_TABLES 64
#endif /* _ASM_X86_PGTABLE_64_DEFS_H */ #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
...@@ -321,7 +321,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, ...@@ -321,7 +321,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
/* Install a pte for a particular vaddr in kernel space. */ /* Install a pte for a particular vaddr in kernel space. */
void set_pte_vaddr(unsigned long vaddr, pte_t pte); void set_pte_vaddr(unsigned long vaddr, pte_t pte);
extern void native_pagetable_reserve(u64 start, u64 end);
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
extern void native_pagetable_init(void); extern void native_pagetable_init(void);
#else #else
...@@ -331,7 +330,7 @@ extern void native_pagetable_init(void); ...@@ -331,7 +330,7 @@ extern void native_pagetable_init(void);
struct seq_file; struct seq_file;
extern void arch_report_meminfo(struct seq_file *m); extern void arch_report_meminfo(struct seq_file *m);
enum { enum pg_level {
PG_LEVEL_NONE, PG_LEVEL_NONE,
PG_LEVEL_4K, PG_LEVEL_4K,
PG_LEVEL_2M, PG_LEVEL_2M,
...@@ -352,6 +351,7 @@ static inline void update_page_count(int level, unsigned long pages) { } ...@@ -352,6 +351,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
* as a pte too. * as a pte too.
*/ */
extern pte_t *lookup_address(unsigned long address, unsigned int *level); extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern phys_addr_t slow_virt_to_phys(void *__address);
#endif /* !__ASSEMBLY__ */ #endif /* !__ASSEMBLY__ */
......
...@@ -721,6 +721,7 @@ extern void enable_sep_cpu(void); ...@@ -721,6 +721,7 @@ extern void enable_sep_cpu(void);
extern int sysenter_setup(void); extern int sysenter_setup(void);
extern void early_trap_init(void); extern void early_trap_init(void);
void early_trap_pf_init(void);
/* Defined in head.S */ /* Defined in head.S */
extern struct desc_ptr early_gdt_descr; extern struct desc_ptr early_gdt_descr;
......
...@@ -58,6 +58,7 @@ extern unsigned char boot_gdt[]; ...@@ -58,6 +58,7 @@ extern unsigned char boot_gdt[];
extern unsigned char secondary_startup_64[]; extern unsigned char secondary_startup_64[];
#endif #endif
extern void __init setup_real_mode(void); void reserve_real_mode(void);
void setup_real_mode(void);
#endif /* _ARCH_X86_REALMODE_H */ #endif /* _ARCH_X86_REALMODE_H */
...@@ -125,13 +125,12 @@ extern int __get_user_4(void); ...@@ -125,13 +125,12 @@ extern int __get_user_4(void);
extern int __get_user_8(void); extern int __get_user_8(void);
extern int __get_user_bad(void); extern int __get_user_bad(void);
#define __get_user_x(size, ret, x, ptr) \ /*
asm volatile("call __get_user_" #size \ * This is a type: either unsigned long, if the argument fits into
: "=a" (ret), "=d" (x) \ * that type, or otherwise unsigned long long.
: "0" (ptr)) \ */
#define __inttype(x) \
/* Careful: we have to cast the result to the type of the pointer __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
* for sign reasons */
/** /**
* get_user: - Get a simple variable from user space. * get_user: - Get a simple variable from user space.
...@@ -150,38 +149,26 @@ extern int __get_user_bad(void); ...@@ -150,38 +149,26 @@ extern int __get_user_bad(void);
* Returns zero on success, or -EFAULT on error. * Returns zero on success, or -EFAULT on error.
* On error, the variable @x is set to zero. * On error, the variable @x is set to zero.
*/ */
#ifdef CONFIG_X86_32 /*
#define __get_user_8(__ret_gu, __val_gu, ptr) \ * Careful: we have to cast the result to the type of the pointer
__get_user_x(X, __ret_gu, __val_gu, ptr) * for sign reasons.
#else *
#define __get_user_8(__ret_gu, __val_gu, ptr) \ * The use of %edx as the register specifier is a bit of a
__get_user_x(8, __ret_gu, __val_gu, ptr) * simplification, as gcc only cares about it as the starting point
#endif * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
* (%ecx being the next register in gcc's x86 register sequence), and
* %rdx on 64 bits.
*/
#define get_user(x, ptr) \ #define get_user(x, ptr) \
({ \ ({ \
int __ret_gu; \ int __ret_gu; \
unsigned long __val_gu; \ register __inttype(*(ptr)) __val_gu asm("%edx"); \
__chk_user_ptr(ptr); \ __chk_user_ptr(ptr); \
might_fault(); \ might_fault(); \
switch (sizeof(*(ptr))) { \ asm volatile("call __get_user_%P3" \
case 1: \ : "=a" (__ret_gu), "=r" (__val_gu) \
__get_user_x(1, __ret_gu, __val_gu, ptr); \ : "0" (ptr), "i" (sizeof(*(ptr)))); \
break; \ (x) = (__typeof__(*(ptr))) __val_gu; \
case 2: \
__get_user_x(2, __ret_gu, __val_gu, ptr); \
break; \
case 4: \
__get_user_x(4, __ret_gu, __val_gu, ptr); \
break; \
case 8: \
__get_user_8(__ret_gu, __val_gu, ptr); \
break; \
default: \
__get_user_x(X, __ret_gu, __val_gu, ptr); \
break; \
} \
(x) = (__typeof__(*(ptr)))__val_gu; \
__ret_gu; \ __ret_gu; \
}) })
......
...@@ -68,17 +68,6 @@ struct x86_init_oem { ...@@ -68,17 +68,6 @@ struct x86_init_oem {
void (*banner)(void); void (*banner)(void);
}; };
/**
* struct x86_init_mapping - platform specific initial kernel pagetable setup
* @pagetable_reserve: reserve a range of addresses for kernel pagetable usage
*
* For more details on the purpose of this hook, look in
* init_memory_mapping and the commit that added it.
*/
struct x86_init_mapping {
void (*pagetable_reserve)(u64 start, u64 end);
};
/** /**
* struct x86_init_paging - platform specific paging functions * struct x86_init_paging - platform specific paging functions
* @pagetable_init: platform specific paging initialization call to setup * @pagetable_init: platform specific paging initialization call to setup
...@@ -136,7 +125,6 @@ struct x86_init_ops { ...@@ -136,7 +125,6 @@ struct x86_init_ops {
struct x86_init_mpparse mpparse; struct x86_init_mpparse mpparse;
struct x86_init_irqs irqs; struct x86_init_irqs irqs;
struct x86_init_oem oem; struct x86_init_oem oem;
struct x86_init_mapping mapping;
struct x86_init_paging paging; struct x86_init_paging paging;
struct x86_init_timers timers; struct x86_init_timers timers;
struct x86_init_iommu iommu; struct x86_init_iommu iommu;
......
...@@ -51,7 +51,6 @@ EXPORT_SYMBOL(acpi_disabled); ...@@ -51,7 +51,6 @@ EXPORT_SYMBOL(acpi_disabled);
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
# include <asm/proto.h> # include <asm/proto.h>
# include <asm/numa_64.h>
#endif /* X86 */ #endif /* X86 */
#define BAD_MADT_ENTRY(entry, end) ( \ #define BAD_MADT_ENTRY(entry, end) ( \
......
...@@ -69,7 +69,7 @@ int acpi_suspend_lowlevel(void) ...@@ -69,7 +69,7 @@ int acpi_suspend_lowlevel(void)
#ifndef CONFIG_64BIT #ifndef CONFIG_64BIT
header->pmode_entry = (u32)&wakeup_pmode_return; header->pmode_entry = (u32)&wakeup_pmode_return;
header->pmode_cr3 = (u32)__pa(&initial_page_table); header->pmode_cr3 = (u32)__pa_symbol(initial_page_table);
saved_magic = 0x12345678; saved_magic = 0x12345678;
#else /* CONFIG_64BIT */ #else /* CONFIG_64BIT */
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
......
...@@ -768,10 +768,9 @@ int __init gart_iommu_init(void) ...@@ -768,10 +768,9 @@ int __init gart_iommu_init(void)
aper_base = info.aper_base; aper_base = info.aper_base;
end_pfn = (aper_base>>PAGE_SHIFT) + (aper_size>>PAGE_SHIFT); end_pfn = (aper_base>>PAGE_SHIFT) + (aper_size>>PAGE_SHIFT);
if (end_pfn > max_low_pfn_mapped) { start_pfn = PFN_DOWN(aper_base);
start_pfn = (aper_base>>PAGE_SHIFT); if (!pfn_range_is_mapped(start_pfn, end_pfn))
init_memory_mapping(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT); init_memory_mapping(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);
}
pr_info("PCI-DMA: using GART IOMMU.\n"); pr_info("PCI-DMA: using GART IOMMU.\n");
iommu_size = check_iommu_size(info.aper_base, aper_size); iommu_size = check_iommu_size(info.aper_base, aper_size);
......
...@@ -28,6 +28,7 @@ ...@@ -28,6 +28,7 @@
#include <asm/apic.h> #include <asm/apic.h>
#include <asm/ipi.h> #include <asm/ipi.h>
#include <asm/apic_flat_64.h> #include <asm/apic_flat_64.h>
#include <asm/pgtable.h>
static int numachip_system __read_mostly; static int numachip_system __read_mostly;
......
...@@ -12,7 +12,6 @@ ...@@ -12,7 +12,6 @@
#include <asm/pci-direct.h> #include <asm/pci-direct.h>
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
# include <asm/numa_64.h>
# include <asm/mmconfig.h> # include <asm/mmconfig.h>
# include <asm/cacheflush.h> # include <asm/cacheflush.h>
#endif #endif
...@@ -680,12 +679,10 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) ...@@ -680,12 +679,10 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
* benefit in doing so. * benefit in doing so.
*/ */
if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) { if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
unsigned long pfn = tseg >> PAGE_SHIFT;
printk(KERN_DEBUG "tseg: %010llx\n", tseg); printk(KERN_DEBUG "tseg: %010llx\n", tseg);
if ((tseg>>PMD_SHIFT) < if (pfn_range_is_mapped(pfn, pfn + 1))
(max_low_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) ||
((tseg>>PMD_SHIFT) <
(max_pfn_mapped>>(PMD_SHIFT-PAGE_SHIFT)) &&
(tseg>>PMD_SHIFT) >= (1ULL<<(32 - PMD_SHIFT))))
set_memory_4k((unsigned long)__va(tseg), 1); set_memory_4k((unsigned long)__va(tseg), 1);
} }
} }
......
...@@ -17,7 +17,6 @@ ...@@ -17,7 +17,6 @@
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
#include <linux/topology.h> #include <linux/topology.h>
#include <asm/numa_64.h>
#endif #endif
#include "cpu.h" #include "cpu.h"
...@@ -168,7 +167,7 @@ int __cpuinit ppro_with_ram_bug(void) ...@@ -168,7 +167,7 @@ int __cpuinit ppro_with_ram_bug(void)
#ifdef CONFIG_X86_F00F_BUG #ifdef CONFIG_X86_F00F_BUG
static void __cpuinit trap_init_f00f_bug(void) static void __cpuinit trap_init_f00f_bug(void)
{ {
__set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO); __set_fixmap(FIX_F00F_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO);
/* /*
* Update the IDT descriptor and reload the IDT so that * Update the IDT descriptor and reload the IDT so that
......
...@@ -835,7 +835,7 @@ static int __init parse_memopt(char *p) ...@@ -835,7 +835,7 @@ static int __init parse_memopt(char *p)
} }
early_param("mem", parse_memopt); early_param("mem", parse_memopt);
static int __init parse_memmap_opt(char *p) static int __init parse_memmap_one(char *p)
{ {
char *oldp; char *oldp;
u64 start_at, mem_size; u64 start_at, mem_size;
...@@ -877,6 +877,20 @@ static int __init parse_memmap_opt(char *p) ...@@ -877,6 +877,20 @@ static int __init parse_memmap_opt(char *p)
return *p == '\0' ? 0 : -EINVAL; return *p == '\0' ? 0 : -EINVAL;
} }
static int __init parse_memmap_opt(char *str)
{
while (str) {
char *k = strchr(str, ',');
if (k)
*k++ = 0;
parse_memmap_one(str);
str = k;
}
return 0;
}
early_param("memmap", parse_memmap_opt); early_param("memmap", parse_memmap_opt);
void __init finish_e820_parsing(void) void __init finish_e820_parsing(void)
......
...@@ -89,7 +89,7 @@ do_ftrace_mod_code(unsigned long ip, const void *new_code) ...@@ -89,7 +89,7 @@ do_ftrace_mod_code(unsigned long ip, const void *new_code)
* kernel identity mapping to modify code. * kernel identity mapping to modify code.
*/ */
if (within(ip, (unsigned long)_text, (unsigned long)_etext)) if (within(ip, (unsigned long)_text, (unsigned long)_etext))
ip = (unsigned long)__va(__pa(ip)); ip = (unsigned long)__va(__pa_symbol(ip));
return probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE); return probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE);
} }
...@@ -279,7 +279,7 @@ static int ftrace_write(unsigned long ip, const char *val, int size) ...@@ -279,7 +279,7 @@ static int ftrace_write(unsigned long ip, const char *val, int size)
* kernel identity mapping to modify code. * kernel identity mapping to modify code.
*/ */
if (within(ip, (unsigned long)_text, (unsigned long)_etext)) if (within(ip, (unsigned long)_text, (unsigned long)_etext))
ip = (unsigned long)__va(__pa(ip)); ip = (unsigned long)__va(__pa_symbol(ip));
return probe_kernel_write((void *)ip, val, size); return probe_kernel_write((void *)ip, val, size);
} }
......
...@@ -33,20 +33,6 @@ void __init i386_start_kernel(void) ...@@ -33,20 +33,6 @@ void __init i386_start_kernel(void)
{ {
sanitize_boot_params(&boot_params); sanitize_boot_params(&boot_params);
memblock_reserve(__pa_symbol(&_text),
__pa_symbol(&__bss_stop) - __pa_symbol(&_text));
#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
/* Assume only end is not page aligned */
u64 ramdisk_image = boot_params.hdr.ramdisk_image;
u64 ramdisk_size = boot_params.hdr.ramdisk_size;
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
}
#endif
/* Call the subarch specific early setup function */ /* Call the subarch specific early setup function */
switch (boot_params.hdr.hardware_subarch) { switch (boot_params.hdr.hardware_subarch) {
case X86_SUBARCH_MRST: case X86_SUBARCH_MRST:
...@@ -60,11 +46,5 @@ void __init i386_start_kernel(void) ...@@ -60,11 +46,5 @@ void __init i386_start_kernel(void)
break; break;
} }
/*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
* RAM in e820. All other memory is free game.
*/
start_kernel(); start_kernel();
} }
...@@ -27,11 +27,81 @@ ...@@ -27,11 +27,81 @@
#include <asm/bios_ebda.h> #include <asm/bios_ebda.h>
#include <asm/bootparam_utils.h> #include <asm/bootparam_utils.h>
static void __init zap_identity_mappings(void) /*
* Manage page tables very early on.
*/
extern pgd_t early_level4_pgt[PTRS_PER_PGD];
extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
static unsigned int __initdata next_early_pgt = 2;
/* Wipe all early page tables except for the kernel symbol map */
static void __init reset_early_page_tables(void)
{
unsigned long i;
for (i = 0; i < PTRS_PER_PGD-1; i++)
early_level4_pgt[i].pgd = 0;
next_early_pgt = 0;
write_cr3(__pa(early_level4_pgt));
}
/* Create a new PMD entry */
int __init early_make_pgtable(unsigned long address)
{ {
pgd_t *pgd = pgd_offset_k(0UL); unsigned long physaddr = address - __PAGE_OFFSET;
pgd_clear(pgd); unsigned long i;
__flush_tlb_all(); pgdval_t pgd, *pgd_p;
pudval_t pud, *pud_p;
pmdval_t pmd, *pmd_p;
/* Invalid address or early pgt is done ? */
if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
return -1;
again:
pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
pgd = *pgd_p;
/*
* The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
* critical -- __PAGE_OFFSET would point us back into the dynamic
* range and we might end up looping forever...
*/
if (pgd)
pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
else {
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
reset_early_page_tables();
goto again;
}
pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
for (i = 0; i < PTRS_PER_PUD; i++)
pud_p[i] = 0;
*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
}
pud_p += pud_index(address);
pud = *pud_p;
if (pud)
pmd_p = (pmdval_t *)((pud & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
else {
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
reset_early_page_tables();
goto again;
}
pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
for (i = 0; i < PTRS_PER_PMD; i++)
pmd_p[i] = 0;
*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
}
pmd = (physaddr & PMD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
pmd_p[pmd_index(address)] = pmd;
return 0;
} }
/* Don't add a printk in there. printk relies on the PDA which is not initialized /* Don't add a printk in there. printk relies on the PDA which is not initialized
...@@ -42,14 +112,25 @@ static void __init clear_bss(void) ...@@ -42,14 +112,25 @@ static void __init clear_bss(void)
(unsigned long) __bss_stop - (unsigned long) __bss_start); (unsigned long) __bss_stop - (unsigned long) __bss_start);
} }
static unsigned long get_cmd_line_ptr(void)
{
unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
return cmd_line_ptr;
}
static void __init copy_bootdata(char *real_mode_data) static void __init copy_bootdata(char *real_mode_data)
{ {
char * command_line; char * command_line;
unsigned long cmd_line_ptr;
memcpy(&boot_params, real_mode_data, sizeof boot_params); memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(&boot_params); sanitize_boot_params(&boot_params);
if (boot_params.hdr.cmd_line_ptr) { cmd_line_ptr = get_cmd_line_ptr();
command_line = __va(boot_params.hdr.cmd_line_ptr); if (cmd_line_ptr) {
command_line = __va(cmd_line_ptr);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE); memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
} }
} }
...@@ -72,14 +153,12 @@ void __init x86_64_start_kernel(char * real_mode_data) ...@@ -72,14 +153,12 @@ void __init x86_64_start_kernel(char * real_mode_data)
(__START_KERNEL & PGDIR_MASK))); (__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END); BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
/* Kill off the identity-map trampoline */
reset_early_page_tables();
/* clear bss before set_intr_gate with early_idt_handler */ /* clear bss before set_intr_gate with early_idt_handler */
clear_bss(); clear_bss();
/* Make NULL pointers segfault */
zap_identity_mappings();
max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) { for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
#ifdef CONFIG_EARLY_PRINTK #ifdef CONFIG_EARLY_PRINTK
set_intr_gate(i, &early_idt_handlers[i]); set_intr_gate(i, &early_idt_handlers[i]);
...@@ -89,37 +168,25 @@ void __init x86_64_start_kernel(char * real_mode_data) ...@@ -89,37 +168,25 @@ void __init x86_64_start_kernel(char * real_mode_data)
} }
load_idt((const struct desc_ptr *)&idt_descr); load_idt((const struct desc_ptr *)&idt_descr);
copy_bootdata(__va(real_mode_data));
if (console_loglevel == 10) if (console_loglevel == 10)
early_printk("Kernel alive\n"); early_printk("Kernel alive\n");
clear_page(init_level4_pgt);
/* set init_level4_pgt kernel high mapping*/
init_level4_pgt[511] = early_level4_pgt[511];
x86_64_start_reservations(real_mode_data); x86_64_start_reservations(real_mode_data);
} }
void __init x86_64_start_reservations(char *real_mode_data) void __init x86_64_start_reservations(char *real_mode_data)
{ {
copy_bootdata(__va(real_mode_data)); /* version is always not zero if it is copied */
if (!boot_params.hdr.version)
memblock_reserve(__pa_symbol(&_text), copy_bootdata(__va(real_mode_data));
__pa_symbol(&__bss_stop) - __pa_symbol(&_text));
#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
/* Assume only end is not page aligned */
unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
unsigned long ramdisk_size = boot_params.hdr.ramdisk_size;
unsigned long ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
}
#endif
reserve_ebda_region(); reserve_ebda_region();
/*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
* RAM in e820. All other memory is free game.
*/
start_kernel(); start_kernel();
} }
This diff is collapsed.
...@@ -26,6 +26,7 @@ EXPORT_SYMBOL(csum_partial_copy_generic); ...@@ -26,6 +26,7 @@ EXPORT_SYMBOL(csum_partial_copy_generic);
EXPORT_SYMBOL(__get_user_1); EXPORT_SYMBOL(__get_user_1);
EXPORT_SYMBOL(__get_user_2); EXPORT_SYMBOL(__get_user_2);
EXPORT_SYMBOL(__get_user_4); EXPORT_SYMBOL(__get_user_4);
EXPORT_SYMBOL(__get_user_8);
EXPORT_SYMBOL(__put_user_1); EXPORT_SYMBOL(__put_user_1);
EXPORT_SYMBOL(__put_user_2); EXPORT_SYMBOL(__put_user_2);
......
...@@ -297,9 +297,9 @@ static void kvm_register_steal_time(void) ...@@ -297,9 +297,9 @@ static void kvm_register_steal_time(void)
memset(st, 0, sizeof(*st)); memset(st, 0, sizeof(*st));
wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED)); wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n", pr_info("kvm-stealtime: cpu %d, msr %llx\n",
cpu, __pa(st)); cpu, (unsigned long long) slow_virt_to_phys(st));
} }
static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED; static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
...@@ -324,7 +324,7 @@ void __cpuinit kvm_guest_cpu_init(void) ...@@ -324,7 +324,7 @@ void __cpuinit kvm_guest_cpu_init(void)
return; return;
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) { if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
u64 pa = __pa(&__get_cpu_var(apf_reason)); u64 pa = slow_virt_to_phys(&__get_cpu_var(apf_reason));
#ifdef CONFIG_PREEMPT #ifdef CONFIG_PREEMPT
pa |= KVM_ASYNC_PF_SEND_ALWAYS; pa |= KVM_ASYNC_PF_SEND_ALWAYS;
...@@ -340,7 +340,8 @@ void __cpuinit kvm_guest_cpu_init(void) ...@@ -340,7 +340,8 @@ void __cpuinit kvm_guest_cpu_init(void)
/* Size alignment is implied but just to make it explicit. */ /* Size alignment is implied but just to make it explicit. */
BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4); BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4);
__get_cpu_var(kvm_apic_eoi) = 0; __get_cpu_var(kvm_apic_eoi) = 0;
pa = __pa(&__get_cpu_var(kvm_apic_eoi)) | KVM_MSR_ENABLED; pa = slow_virt_to_phys(&__get_cpu_var(kvm_apic_eoi))
| KVM_MSR_ENABLED;
wrmsrl(MSR_KVM_PV_EOI_EN, pa); wrmsrl(MSR_KVM_PV_EOI_EN, pa);
} }
......
...@@ -162,8 +162,8 @@ int kvm_register_clock(char *txt) ...@@ -162,8 +162,8 @@ int kvm_register_clock(char *txt)
int low, high, ret; int low, high, ret;
struct pvclock_vcpu_time_info *src = &hv_clock[cpu].pvti; struct pvclock_vcpu_time_info *src = &hv_clock[cpu].pvti;
low = (int)__pa(src) | 1; low = (int)slow_virt_to_phys(src) | 1;
high = ((u64)__pa(src) >> 32); high = ((u64)slow_virt_to_phys(src) >> 32);
ret = native_write_msr_safe(msr_kvm_system_time, low, high); ret = native_write_msr_safe(msr_kvm_system_time, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt); cpu, high, low, txt);
......
...@@ -16,125 +16,12 @@ ...@@ -16,125 +16,12 @@
#include <linux/io.h> #include <linux/io.h>
#include <linux/suspend.h> #include <linux/suspend.h>
#include <asm/init.h>
#include <asm/pgtable.h> #include <asm/pgtable.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/mmu_context.h> #include <asm/mmu_context.h>
#include <asm/debugreg.h> #include <asm/debugreg.h>
static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
unsigned long addr)
{
pud_t *pud;
pmd_t *pmd;
struct page *page;
int result = -ENOMEM;
addr &= PMD_MASK;
pgd += pgd_index(addr);
if (!pgd_present(*pgd)) {
page = kimage_alloc_control_pages(image, 0);
if (!page)
goto out;
pud = (pud_t *)page_address(page);
clear_page(pud);
set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
}
pud = pud_offset(pgd, addr);
if (!pud_present(*pud)) {
page = kimage_alloc_control_pages(image, 0);
if (!page)
goto out;
pmd = (pmd_t *)page_address(page);
clear_page(pmd);
set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
}
pmd = pmd_offset(pud, addr);
if (!pmd_present(*pmd))
set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
result = 0;
out:
return result;
}
static void init_level2_page(pmd_t *level2p, unsigned long addr)
{
unsigned long end_addr;
addr &= PAGE_MASK;
end_addr = addr + PUD_SIZE;
while (addr < end_addr) {
set_pmd(level2p++, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
addr += PMD_SIZE;
}
}
static int init_level3_page(struct kimage *image, pud_t *level3p,
unsigned long addr, unsigned long last_addr)
{
unsigned long end_addr;
int result;
result = 0;
addr &= PAGE_MASK;
end_addr = addr + PGDIR_SIZE;
while ((addr < last_addr) && (addr < end_addr)) {
struct page *page;
pmd_t *level2p;
page = kimage_alloc_control_pages(image, 0);
if (!page) {
result = -ENOMEM;
goto out;
}
level2p = (pmd_t *)page_address(page);
init_level2_page(level2p, addr);
set_pud(level3p++, __pud(__pa(level2p) | _KERNPG_TABLE));
addr += PUD_SIZE;
}
/* clear the unused entries */
while (addr < end_addr) {
pud_clear(level3p++);
addr += PUD_SIZE;
}
out:
return result;
}
static int init_level4_page(struct kimage *image, pgd_t *level4p,
unsigned long addr, unsigned long last_addr)
{
unsigned long end_addr;
int result;
result = 0;
addr &= PAGE_MASK;
end_addr = addr + (PTRS_PER_PGD * PGDIR_SIZE);
while ((addr < last_addr) && (addr < end_addr)) {
struct page *page;
pud_t *level3p;
page = kimage_alloc_control_pages(image, 0);
if (!page) {
result = -ENOMEM;
goto out;
}
level3p = (pud_t *)page_address(page);
result = init_level3_page(image, level3p, addr, last_addr);
if (result)
goto out;
set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE));
addr += PGDIR_SIZE;
}
/* clear the unused entries */
while (addr < end_addr) {
pgd_clear(level4p++);
addr += PGDIR_SIZE;
}
out:
return result;
}
static void free_transition_pgtable(struct kimage *image) static void free_transition_pgtable(struct kimage *image)
{ {
free_page((unsigned long)image->arch.pud); free_page((unsigned long)image->arch.pud);
...@@ -184,22 +71,62 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd) ...@@ -184,22 +71,62 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
return result; return result;
} }
static void *alloc_pgt_page(void *data)
{
struct kimage *image = (struct kimage *)data;
struct page *page;
void *p = NULL;
page = kimage_alloc_control_pages(image, 0);
if (page) {
p = page_address(page);
clear_page(p);
}
return p;
}
static int init_pgtable(struct kimage *image, unsigned long start_pgtable) static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
{ {
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
.context = image,
.pmd_flag = __PAGE_KERNEL_LARGE_EXEC,
};
unsigned long mstart, mend;
pgd_t *level4p; pgd_t *level4p;
int result; int result;
int i;
level4p = (pgd_t *)__va(start_pgtable); level4p = (pgd_t *)__va(start_pgtable);
result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT); clear_page(level4p);
if (result) for (i = 0; i < nr_pfn_mapped; i++) {
return result; mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
result = kernel_ident_mapping_init(&info,
level4p, mstart, mend);
if (result)
return result;
}
/* /*
* image->start may be outside 0 ~ max_pfn, for example when * segments's mem ranges could be outside 0 ~ max_pfn,
* jump back to original kernel from kexeced kernel * for example when jump back to original kernel from kexeced kernel.
* or first kernel is booted with user mem map, and second kernel
* could be loaded out of that range.
*/ */
result = init_one_level2_page(image, level4p, image->start); for (i = 0; i < image->nr_segments; i++) {
if (result) mstart = image->segment[i].mem;
return result; mend = mstart + image->segment[i].memsz;
result = kernel_ident_mapping_init(&info,
level4p, mstart, mend);
if (result)
return result;
}
return init_transition_pgtable(image, level4p); return init_transition_pgtable(image, level4p);
} }
......
This diff is collapsed.
...@@ -688,10 +688,19 @@ void __init early_trap_init(void) ...@@ -688,10 +688,19 @@ void __init early_trap_init(void)
set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK); set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
/* int3 can be called from all */ /* int3 can be called from all */
set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK); set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
#ifdef CONFIG_X86_32
set_intr_gate(X86_TRAP_PF, &page_fault); set_intr_gate(X86_TRAP_PF, &page_fault);
#endif
load_idt(&idt_descr); load_idt(&idt_descr);
} }
void __init early_trap_pf_init(void)
{
#ifdef CONFIG_X86_64
set_intr_gate(X86_TRAP_PF, &page_fault);
#endif
}
void __init trap_init(void) void __init trap_init(void)
{ {
int i; int i;
......
...@@ -59,6 +59,9 @@ EXPORT_SYMBOL(memcpy); ...@@ -59,6 +59,9 @@ EXPORT_SYMBOL(memcpy);
EXPORT_SYMBOL(__memcpy); EXPORT_SYMBOL(__memcpy);
EXPORT_SYMBOL(memmove); EXPORT_SYMBOL(memmove);
#ifndef CONFIG_DEBUG_VIRTUAL
EXPORT_SYMBOL(phys_base);
#endif
EXPORT_SYMBOL(empty_zero_page); EXPORT_SYMBOL(empty_zero_page);
#ifndef CONFIG_PARAVIRT #ifndef CONFIG_PARAVIRT
EXPORT_SYMBOL(native_load_gs_index); EXPORT_SYMBOL(native_load_gs_index);
......
...@@ -63,10 +63,6 @@ struct x86_init_ops x86_init __initdata = { ...@@ -63,10 +63,6 @@ struct x86_init_ops x86_init __initdata = {
.banner = default_banner, .banner = default_banner,
}, },
.mapping = {
.pagetable_reserve = native_pagetable_reserve,
},
.paging = { .paging = {
.pagetable_init = native_pagetable_init, .pagetable_init = native_pagetable_init,
}, },
......
...@@ -552,7 +552,8 @@ static void lguest_write_cr3(unsigned long cr3) ...@@ -552,7 +552,8 @@ static void lguest_write_cr3(unsigned long cr3)
current_cr3 = cr3; current_cr3 = cr3;
/* These two page tables are simple, linear, and used during boot */ /* These two page tables are simple, linear, and used during boot */
if (cr3 != __pa(swapper_pg_dir) && cr3 != __pa(initial_page_table)) if (cr3 != __pa_symbol(swapper_pg_dir) &&
cr3 != __pa_symbol(initial_page_table))
cr3_changed = true; cr3_changed = true;
} }
......
...@@ -15,11 +15,10 @@ ...@@ -15,11 +15,10 @@
* __get_user_X * __get_user_X
* *
* Inputs: %[r|e]ax contains the address. * Inputs: %[r|e]ax contains the address.
* The register is modified, but all changes are undone
* before returning because the C code doesn't know about it.
* *
* Outputs: %[r|e]ax is error code (0 or -EFAULT) * Outputs: %[r|e]ax is error code (0 or -EFAULT)
* %[r|e]dx contains zero-extended value * %[r|e]dx contains zero-extended value
* %ecx contains the high half for 32-bit __get_user_8
* *
* *
* These functions should not modify any other registers, * These functions should not modify any other registers,
...@@ -42,7 +41,7 @@ ENTRY(__get_user_1) ...@@ -42,7 +41,7 @@ ENTRY(__get_user_1)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user jae bad_get_user
ASM_STAC ASM_STAC
1: movzb (%_ASM_AX),%edx 1: movzbl (%_ASM_AX),%edx
xor %eax,%eax xor %eax,%eax
ASM_CLAC ASM_CLAC
ret ret
...@@ -72,29 +71,42 @@ ENTRY(__get_user_4) ...@@ -72,29 +71,42 @@ ENTRY(__get_user_4)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user jae bad_get_user
ASM_STAC ASM_STAC
3: mov -3(%_ASM_AX),%edx 3: movl -3(%_ASM_AX),%edx
xor %eax,%eax xor %eax,%eax
ASM_CLAC ASM_CLAC
ret ret
CFI_ENDPROC CFI_ENDPROC
ENDPROC(__get_user_4) ENDPROC(__get_user_4)
#ifdef CONFIG_X86_64
ENTRY(__get_user_8) ENTRY(__get_user_8)
CFI_STARTPROC CFI_STARTPROC
#ifdef CONFIG_X86_64
add $7,%_ASM_AX add $7,%_ASM_AX
jc bad_get_user jc bad_get_user
GET_THREAD_INFO(%_ASM_DX) GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user jae bad_get_user
ASM_STAC ASM_STAC
4: movq -7(%_ASM_AX),%_ASM_DX 4: movq -7(%_ASM_AX),%rdx
xor %eax,%eax xor %eax,%eax
ASM_CLAC ASM_CLAC
ret ret
#else
add $7,%_ASM_AX
jc bad_get_user_8
GET_THREAD_INFO(%_ASM_DX)
cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user_8
ASM_STAC
4: movl -7(%_ASM_AX),%edx
5: movl -3(%_ASM_AX),%ecx
xor %eax,%eax
ASM_CLAC
ret
#endif
CFI_ENDPROC CFI_ENDPROC
ENDPROC(__get_user_8) ENDPROC(__get_user_8)
#endif
bad_get_user: bad_get_user:
CFI_STARTPROC CFI_STARTPROC
...@@ -105,9 +117,24 @@ bad_get_user: ...@@ -105,9 +117,24 @@ bad_get_user:
CFI_ENDPROC CFI_ENDPROC
END(bad_get_user) END(bad_get_user)
#ifdef CONFIG_X86_32
bad_get_user_8:
CFI_STARTPROC
xor %edx,%edx
xor %ecx,%ecx
mov $(-EFAULT),%_ASM_AX
ASM_CLAC
ret
CFI_ENDPROC
END(bad_get_user_8)
#endif
_ASM_EXTABLE(1b,bad_get_user) _ASM_EXTABLE(1b,bad_get_user)
_ASM_EXTABLE(2b,bad_get_user) _ASM_EXTABLE(2b,bad_get_user)
_ASM_EXTABLE(3b,bad_get_user) _ASM_EXTABLE(3b,bad_get_user)
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
_ASM_EXTABLE(4b,bad_get_user) _ASM_EXTABLE(4b,bad_get_user)
#else
_ASM_EXTABLE(4b,bad_get_user_8)
_ASM_EXTABLE(5b,bad_get_user_8)
#endif #endif
This diff is collapsed.
...@@ -53,25 +53,14 @@ ...@@ -53,25 +53,14 @@
#include <asm/page_types.h> #include <asm/page_types.h>
#include <asm/init.h> #include <asm/init.h>
#include "mm_internal.h"
unsigned long highstart_pfn, highend_pfn; unsigned long highstart_pfn, highend_pfn;
static noinline int do_test_wp_bit(void); static noinline int do_test_wp_bit(void);
bool __read_mostly __vmalloc_start_set = false; bool __read_mostly __vmalloc_start_set = false;
static __init void *alloc_low_page(void)
{
unsigned long pfn = pgt_buf_end++;
void *adr;
if (pfn >= pgt_buf_top)
panic("alloc_low_page: ran out of memory");
adr = __va(pfn * PAGE_SIZE);
clear_page(adr);
return adr;
}
/* /*
* Creates a middle page table and puts a pointer to it in the * Creates a middle page table and puts a pointer to it in the
* given global directory entry. This only returns the gd entry * given global directory entry. This only returns the gd entry
...@@ -84,10 +73,7 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd) ...@@ -84,10 +73,7 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd)
#ifdef CONFIG_X86_PAE #ifdef CONFIG_X86_PAE
if (!(pgd_val(*pgd) & _PAGE_PRESENT)) { if (!(pgd_val(*pgd) & _PAGE_PRESENT)) {
if (after_bootmem) pmd_table = (pmd_t *)alloc_low_page();
pmd_table = (pmd_t *)alloc_bootmem_pages(PAGE_SIZE);
else
pmd_table = (pmd_t *)alloc_low_page();
paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT); paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT)); set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
pud = pud_offset(pgd, 0); pud = pud_offset(pgd, 0);
...@@ -109,17 +95,7 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd) ...@@ -109,17 +95,7 @@ static pmd_t * __init one_md_table_init(pgd_t *pgd)
static pte_t * __init one_page_table_init(pmd_t *pmd) static pte_t * __init one_page_table_init(pmd_t *pmd)
{ {
if (!(pmd_val(*pmd) & _PAGE_PRESENT)) { if (!(pmd_val(*pmd) & _PAGE_PRESENT)) {
pte_t *page_table = NULL; pte_t *page_table = (pte_t *)alloc_low_page();
if (after_bootmem) {
#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KMEMCHECK)
page_table = (pte_t *) alloc_bootmem_pages(PAGE_SIZE);
#endif
if (!page_table)
page_table =
(pte_t *)alloc_bootmem_pages(PAGE_SIZE);
} else
page_table = (pte_t *)alloc_low_page();
paravirt_alloc_pte(&init_mm, __pa(page_table) >> PAGE_SHIFT); paravirt_alloc_pte(&init_mm, __pa(page_table) >> PAGE_SHIFT);
set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE)); set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
...@@ -146,8 +122,39 @@ pte_t * __init populate_extra_pte(unsigned long vaddr) ...@@ -146,8 +122,39 @@ pte_t * __init populate_extra_pte(unsigned long vaddr)
return one_page_table_init(pmd) + pte_idx; return one_page_table_init(pmd) + pte_idx;
} }
static unsigned long __init
page_table_range_init_count(unsigned long start, unsigned long end)
{
unsigned long count = 0;
#ifdef CONFIG_HIGHMEM
int pmd_idx_kmap_begin = fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT;
int pmd_idx_kmap_end = fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT;
int pgd_idx, pmd_idx;
unsigned long vaddr;
if (pmd_idx_kmap_begin == pmd_idx_kmap_end)
return 0;
vaddr = start;
pgd_idx = pgd_index(vaddr);
for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd_idx++) {
for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
pmd_idx++) {
if ((vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin &&
(vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end)
count++;
vaddr += PMD_SIZE;
}
pmd_idx = 0;
}
#endif
return count;
}
static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd, static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
unsigned long vaddr, pte_t *lastpte) unsigned long vaddr, pte_t *lastpte,
void **adr)
{ {
#ifdef CONFIG_HIGHMEM #ifdef CONFIG_HIGHMEM
/* /*
...@@ -161,16 +168,15 @@ static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd, ...@@ -161,16 +168,15 @@ static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
if (pmd_idx_kmap_begin != pmd_idx_kmap_end if (pmd_idx_kmap_begin != pmd_idx_kmap_end
&& (vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin && (vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin
&& (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end) {
&& ((__pa(pte) >> PAGE_SHIFT) < pgt_buf_start
|| (__pa(pte) >> PAGE_SHIFT) >= pgt_buf_end)) {
pte_t *newpte; pte_t *newpte;
int i; int i;
BUG_ON(after_bootmem); BUG_ON(after_bootmem);
newpte = alloc_low_page(); newpte = *adr;
for (i = 0; i < PTRS_PER_PTE; i++) for (i = 0; i < PTRS_PER_PTE; i++)
set_pte(newpte + i, pte[i]); set_pte(newpte + i, pte[i]);
*adr = (void *)(((unsigned long)(*adr)) + PAGE_SIZE);
paravirt_alloc_pte(&init_mm, __pa(newpte) >> PAGE_SHIFT); paravirt_alloc_pte(&init_mm, __pa(newpte) >> PAGE_SHIFT);
set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE)); set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE));
...@@ -204,6 +210,11 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base) ...@@ -204,6 +210,11 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base)
pgd_t *pgd; pgd_t *pgd;
pmd_t *pmd; pmd_t *pmd;
pte_t *pte = NULL; pte_t *pte = NULL;
unsigned long count = page_table_range_init_count(start, end);
void *adr = NULL;
if (count)
adr = alloc_low_pages(count);
vaddr = start; vaddr = start;
pgd_idx = pgd_index(vaddr); pgd_idx = pgd_index(vaddr);
...@@ -216,7 +227,7 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base) ...@@ -216,7 +227,7 @@ page_table_range_init(unsigned long start, unsigned long end, pgd_t *pgd_base)
for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end); for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
pmd++, pmd_idx++) { pmd++, pmd_idx++) {
pte = page_table_kmap_check(one_page_table_init(pmd), pte = page_table_kmap_check(one_page_table_init(pmd),
pmd, vaddr, pte); pmd, vaddr, pte, &adr);
vaddr += PMD_SIZE; vaddr += PMD_SIZE;
} }
...@@ -310,6 +321,7 @@ kernel_physical_mapping_init(unsigned long start, ...@@ -310,6 +321,7 @@ kernel_physical_mapping_init(unsigned long start,
__pgprot(PTE_IDENT_ATTR | __pgprot(PTE_IDENT_ATTR |
_PAGE_PSE); _PAGE_PSE);
pfn &= PMD_MASK >> PAGE_SHIFT;
addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE + addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
PAGE_OFFSET + PAGE_SIZE-1; PAGE_OFFSET + PAGE_SIZE-1;
...@@ -455,9 +467,14 @@ void __init native_pagetable_init(void) ...@@ -455,9 +467,14 @@ void __init native_pagetable_init(void)
/* /*
* Remove any mappings which extend past the end of physical * Remove any mappings which extend past the end of physical
* memory from the boot time page table: * memory from the boot time page table.
* In virtual address space, we should have at least two pages
* from VMALLOC_END to pkmap or fixmap according to VMALLOC_END
* definition. And max_low_pfn is set to VMALLOC_END physical
* address. If initial memory mapping is doing right job, we
* should have pte used near max_low_pfn or one pmd is not present.
*/ */
for (pfn = max_low_pfn + 1; pfn < 1<<(32-PAGE_SHIFT); pfn++) { for (pfn = max_low_pfn; pfn < 1<<(32-PAGE_SHIFT); pfn++) {
va = PAGE_OFFSET + (pfn<<PAGE_SHIFT); va = PAGE_OFFSET + (pfn<<PAGE_SHIFT);
pgd = base + pgd_index(va); pgd = base + pgd_index(va);
if (!pgd_present(*pgd)) if (!pgd_present(*pgd))
...@@ -468,10 +485,19 @@ void __init native_pagetable_init(void) ...@@ -468,10 +485,19 @@ void __init native_pagetable_init(void)
if (!pmd_present(*pmd)) if (!pmd_present(*pmd))
break; break;
/* should not be large page here */
if (pmd_large(*pmd)) {
pr_warn("try to clear pte for ram above max_low_pfn: pfn: %lx pmd: %p pmd phys: %lx, but pmd is big page and is not using pte !\n",
pfn, pmd, __pa(pmd));
BUG_ON(1);
}
pte = pte_offset_kernel(pmd, va); pte = pte_offset_kernel(pmd, va);
if (!pte_present(*pte)) if (!pte_present(*pte))
break; break;
printk(KERN_DEBUG "clearing pte for ram above max_low_pfn: pfn: %lx pmd: %p pmd phys: %lx pte: %p pte phys: %lx\n",
pfn, pmd, __pa(pmd), pte, __pa(pte));
pte_clear(NULL, va, pte); pte_clear(NULL, va, pte);
} }
paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT); paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT);
...@@ -550,7 +576,7 @@ early_param("highmem", parse_highmem); ...@@ -550,7 +576,7 @@ early_param("highmem", parse_highmem);
* artificially via the highmem=x boot parameter then create * artificially via the highmem=x boot parameter then create
* it: * it:
*/ */
void __init lowmem_pfn_init(void) static void __init lowmem_pfn_init(void)
{ {
/* max_low_pfn is 0, we already have early_res support */ /* max_low_pfn is 0, we already have early_res support */
max_low_pfn = max_pfn; max_low_pfn = max_pfn;
...@@ -586,7 +612,7 @@ void __init lowmem_pfn_init(void) ...@@ -586,7 +612,7 @@ void __init lowmem_pfn_init(void)
* We have more RAM than fits into lowmem - we try to put it into * We have more RAM than fits into lowmem - we try to put it into
* highmem, also taking the highmem=x boot parameter into account: * highmem, also taking the highmem=x boot parameter into account:
*/ */
void __init highmem_pfn_init(void) static void __init highmem_pfn_init(void)
{ {
max_low_pfn = MAXMEM_PFN; max_low_pfn = MAXMEM_PFN;
...@@ -669,8 +695,6 @@ void __init setup_bootmem_allocator(void) ...@@ -669,8 +695,6 @@ void __init setup_bootmem_allocator(void)
printk(KERN_INFO " mapped low ram: 0 - %08lx\n", printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
max_pfn_mapped<<PAGE_SHIFT); max_pfn_mapped<<PAGE_SHIFT);
printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT); printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
after_bootmem = 1;
} }
/* /*
...@@ -753,6 +777,8 @@ void __init mem_init(void) ...@@ -753,6 +777,8 @@ void __init mem_init(void)
if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp))) if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
reservedpages++; reservedpages++;
after_bootmem = 1;
codesize = (unsigned long) &_etext - (unsigned long) &_text; codesize = (unsigned long) &_etext - (unsigned long) &_text;
datasize = (unsigned long) &_edata - (unsigned long) &_etext; datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin; initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;
......
This diff is collapsed.
#ifndef __X86_MM_INTERNAL_H
#define __X86_MM_INTERNAL_H
void *alloc_low_pages(unsigned int num);
static inline void *alloc_low_page(void)
{
return alloc_low_pages(1);
}
void early_ioremap_page_table_range_init(void);
unsigned long kernel_physical_mapping_init(unsigned long start,
unsigned long end,
unsigned long page_size_mask);
void zone_sizes_init(void);
extern int after_bootmem;
#endif /* __X86_MM_INTERNAL_H */
...@@ -193,7 +193,6 @@ int __init numa_add_memblk(int nid, u64 start, u64 end) ...@@ -193,7 +193,6 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
static void __init setup_node_data(int nid, u64 start, u64 end) static void __init setup_node_data(int nid, u64 start, u64 end)
{ {
const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE); const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
bool remapped = false;
u64 nd_pa; u64 nd_pa;
void *nd; void *nd;
int tnid; int tnid;
...@@ -205,37 +204,28 @@ static void __init setup_node_data(int nid, u64 start, u64 end) ...@@ -205,37 +204,28 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
if (end && (end - start) < NODE_MIN_SIZE) if (end && (end - start) < NODE_MIN_SIZE)
return; return;
/* initialize remap allocator before aligning to ZONE_ALIGN */
init_alloc_remap(nid, start, end);
start = roundup(start, ZONE_ALIGN); start = roundup(start, ZONE_ALIGN);
printk(KERN_INFO "Initmem setup node %d [mem %#010Lx-%#010Lx]\n", printk(KERN_INFO "Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
nid, start, end - 1); nid, start, end - 1);
/* /*
* Allocate node data. Try remap allocator first, node-local * Allocate node data. Try node-local memory and then any node.
* memory and then any node. Never allocate in DMA zone. * Never allocate in DMA zone.
*/ */
nd = alloc_remap(nid, nd_size); nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
if (nd) { if (!nd_pa) {
nd_pa = __pa(nd); pr_err("Cannot find %zu bytes in node %d\n",
remapped = true; nd_size, nid);
} else { return;
nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
pr_err("Cannot find %zu bytes in node %d\n",
nd_size, nid);
return;
}
nd = __va(nd_pa);
} }
nd = __va(nd_pa);
/* report and initialize */ /* report and initialize */
printk(KERN_INFO " NODE_DATA [mem %#010Lx-%#010Lx]%s\n", printk(KERN_INFO " NODE_DATA [mem %#010Lx-%#010Lx]\n",
nd_pa, nd_pa + nd_size - 1, remapped ? " (remapped)" : ""); nd_pa, nd_pa + nd_size - 1);
tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT); tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
if (!remapped && tnid != nid) if (tnid != nid)
printk(KERN_INFO " NODE_DATA(%d) on node %d\n", nid, tnid); printk(KERN_INFO " NODE_DATA(%d) on node %d\n", nid, tnid);
node_data[nid] = nd; node_data[nid] = nd;
......
...@@ -73,167 +73,6 @@ unsigned long node_memmap_size_bytes(int nid, unsigned long start_pfn, ...@@ -73,167 +73,6 @@ unsigned long node_memmap_size_bytes(int nid, unsigned long start_pfn,
extern unsigned long highend_pfn, highstart_pfn; extern unsigned long highend_pfn, highstart_pfn;
#define LARGE_PAGE_BYTES (PTRS_PER_PTE * PAGE_SIZE)
static void *node_remap_start_vaddr[MAX_NUMNODES];
void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
/*
* Remap memory allocator
*/
static unsigned long node_remap_start_pfn[MAX_NUMNODES];
static void *node_remap_end_vaddr[MAX_NUMNODES];
static void *node_remap_alloc_vaddr[MAX_NUMNODES];
/**
* alloc_remap - Allocate remapped memory
* @nid: NUMA node to allocate memory from
* @size: The size of allocation
*
* Allocate @size bytes from the remap area of NUMA node @nid. The
* size of the remap area is predetermined by init_alloc_remap() and
* only the callers considered there should call this function. For
* more info, please read the comment on top of init_alloc_remap().
*
* The caller must be ready to handle allocation failure from this
* function and fall back to regular memory allocator in such cases.
*
* CONTEXT:
* Single CPU early boot context.
*
* RETURNS:
* Pointer to the allocated memory on success, %NULL on failure.
*/
void *alloc_remap(int nid, unsigned long size)
{
void *allocation = node_remap_alloc_vaddr[nid];
size = ALIGN(size, L1_CACHE_BYTES);
if (!allocation || (allocation + size) > node_remap_end_vaddr[nid])
return NULL;
node_remap_alloc_vaddr[nid] += size;
memset(allocation, 0, size);
return allocation;
}
#ifdef CONFIG_HIBERNATION
/**
* resume_map_numa_kva - add KVA mapping to the temporary page tables created
* during resume from hibernation
* @pgd_base - temporary resume page directory
*/
void resume_map_numa_kva(pgd_t *pgd_base)
{
int node;
for_each_online_node(node) {
unsigned long start_va, start_pfn, nr_pages, pfn;
start_va = (unsigned long)node_remap_start_vaddr[node];
start_pfn = node_remap_start_pfn[node];
nr_pages = (node_remap_end_vaddr[node] -
node_remap_start_vaddr[node]) >> PAGE_SHIFT;
printk(KERN_DEBUG "%s: node %d\n", __func__, node);
for (pfn = 0; pfn < nr_pages; pfn += PTRS_PER_PTE) {
unsigned long vaddr = start_va + (pfn << PAGE_SHIFT);
pgd_t *pgd = pgd_base + pgd_index(vaddr);
pud_t *pud = pud_offset(pgd, vaddr);
pmd_t *pmd = pmd_offset(pud, vaddr);
set_pmd(pmd, pfn_pmd(start_pfn + pfn,
PAGE_KERNEL_LARGE_EXEC));
printk(KERN_DEBUG "%s: %08lx -> pfn %08lx\n",
__func__, vaddr, start_pfn + pfn);
}
}
}
#endif
/**
* init_alloc_remap - Initialize remap allocator for a NUMA node
* @nid: NUMA node to initizlie remap allocator for
*
* NUMA nodes may end up without any lowmem. As allocating pgdat and
* memmap on a different node with lowmem is inefficient, a special
* remap allocator is implemented which can be used by alloc_remap().
*
* For each node, the amount of memory which will be necessary for
* pgdat and memmap is calculated and two memory areas of the size are
* allocated - one in the node and the other in lowmem; then, the area
* in the node is remapped to the lowmem area.
*
* As pgdat and memmap must be allocated in lowmem anyway, this
* doesn't waste lowmem address space; however, the actual lowmem
* which gets remapped over is wasted. The amount shouldn't be
* problematic on machines this feature will be used.
*
* Initialization failure isn't fatal. alloc_remap() is used
* opportunistically and the callers will fall back to other memory
* allocation mechanisms on failure.
*/
void __init init_alloc_remap(int nid, u64 start, u64 end)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long end_pfn = end >> PAGE_SHIFT;
unsigned long size, pfn;
u64 node_pa, remap_pa;
void *remap_va;
/*
* The acpi/srat node info can show hot-add memroy zones where
* memory could be added but not currently present.
*/
printk(KERN_DEBUG "node %d pfn: [%lx - %lx]\n",
nid, start_pfn, end_pfn);
/* calculate the necessary space aligned to large page size */
size = node_memmap_size_bytes(nid, start_pfn, end_pfn);
size += ALIGN(sizeof(pg_data_t), PAGE_SIZE);
size = ALIGN(size, LARGE_PAGE_BYTES);
/* allocate node memory and the lowmem remap area */
node_pa = memblock_find_in_range(start, end, size, LARGE_PAGE_BYTES);
if (!node_pa) {
pr_warning("remap_alloc: failed to allocate %lu bytes for node %d\n",
size, nid);
return;
}
memblock_reserve(node_pa, size);
remap_pa = memblock_find_in_range(min_low_pfn << PAGE_SHIFT,
max_low_pfn << PAGE_SHIFT,
size, LARGE_PAGE_BYTES);
if (!remap_pa) {
pr_warning("remap_alloc: failed to allocate %lu bytes remap area for node %d\n",
size, nid);
memblock_free(node_pa, size);
return;
}
memblock_reserve(remap_pa, size);
remap_va = phys_to_virt(remap_pa);
/* perform actual remap */
for (pfn = 0; pfn < size >> PAGE_SHIFT; pfn += PTRS_PER_PTE)
set_pmd_pfn((unsigned long)remap_va + (pfn << PAGE_SHIFT),
(node_pa >> PAGE_SHIFT) + pfn,
PAGE_KERNEL_LARGE);
/* initialize remap allocator parameters */
node_remap_start_pfn[nid] = node_pa >> PAGE_SHIFT;
node_remap_start_vaddr[nid] = remap_va;
node_remap_end_vaddr[nid] = remap_va + size;
node_remap_alloc_vaddr[nid] = remap_va;
printk(KERN_DEBUG "remap_alloc: node %d [%08llx-%08llx) -> [%p-%p)\n",
nid, node_pa, node_pa + size, remap_va, remap_va + size);
}
void __init initmem_init(void) void __init initmem_init(void)
{ {
x86_numa_init(); x86_numa_init();
......
...@@ -10,16 +10,3 @@ void __init initmem_init(void) ...@@ -10,16 +10,3 @@ void __init initmem_init(void)
{ {
x86_numa_init(); x86_numa_init();
} }
unsigned long __init numa_free_all_bootmem(void)
{
unsigned long pages = 0;
int i;
for_each_online_node(i)
pages += free_all_bootmem_node(NODE_DATA(i));
pages += free_low_memory_core_early(MAX_NUMNODES);
return pages;
}
...@@ -21,12 +21,6 @@ void __init numa_reset_distance(void); ...@@ -21,12 +21,6 @@ void __init numa_reset_distance(void);
void __init x86_numa_init(void); void __init x86_numa_init(void);
#ifdef CONFIG_X86_64
static inline void init_alloc_remap(int nid, u64 start, u64 end) { }
#else
void __init init_alloc_remap(int nid, u64 start, u64 end);
#endif
#ifdef CONFIG_NUMA_EMU #ifdef CONFIG_NUMA_EMU
void __init numa_emulation(struct numa_meminfo *numa_meminfo, void __init numa_emulation(struct numa_meminfo *numa_meminfo,
int numa_dist_cnt); int numa_dist_cnt);
......
...@@ -94,12 +94,12 @@ static inline void split_page_count(int level) { } ...@@ -94,12 +94,12 @@ static inline void split_page_count(int level) { }
static inline unsigned long highmap_start_pfn(void) static inline unsigned long highmap_start_pfn(void)
{ {
return __pa(_text) >> PAGE_SHIFT; return __pa_symbol(_text) >> PAGE_SHIFT;
} }
static inline unsigned long highmap_end_pfn(void) static inline unsigned long highmap_end_pfn(void)
{ {
return __pa(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT; return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
} }
#endif #endif
...@@ -276,8 +276,8 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address, ...@@ -276,8 +276,8 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
* The .rodata section needs to be read-only. Using the pfn * The .rodata section needs to be read-only. Using the pfn
* catches all aliases. * catches all aliases.
*/ */
if (within(pfn, __pa((unsigned long)__start_rodata) >> PAGE_SHIFT, if (within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
__pa((unsigned long)__end_rodata) >> PAGE_SHIFT)) __pa_symbol(__end_rodata) >> PAGE_SHIFT))
pgprot_val(forbidden) |= _PAGE_RW; pgprot_val(forbidden) |= _PAGE_RW;
#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA) #if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
...@@ -363,6 +363,37 @@ pte_t *lookup_address(unsigned long address, unsigned int *level) ...@@ -363,6 +363,37 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)
} }
EXPORT_SYMBOL_GPL(lookup_address); EXPORT_SYMBOL_GPL(lookup_address);
/*
* This is necessary because __pa() does not work on some
* kinds of memory, like vmalloc() or the alloc_remap()
* areas on 32-bit NUMA systems. The percpu areas can
* end up in this kind of memory, for instance.
*
* This could be optimized, but it is only intended to be
* used at inititalization time, and keeping it
* unoptimized should increase the testing coverage for
* the more obscure platforms.
*/
phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
unsigned long virt_addr = (unsigned long)__virt_addr;
phys_addr_t phys_addr;
unsigned long offset;
enum pg_level level;
unsigned long psize;
unsigned long pmask;
pte_t *pte;
pte = lookup_address(virt_addr, &level);
BUG_ON(!pte);
psize = page_level_size(level);
pmask = page_level_mask(level);
offset = virt_addr & ~pmask;
phys_addr = pte_pfn(*pte) << PAGE_SHIFT;
return (phys_addr | offset);
}
EXPORT_SYMBOL_GPL(slow_virt_to_phys);
/* /*
* Set the new pmd in all the pgds we know about: * Set the new pmd in all the pgds we know about:
*/ */
...@@ -396,7 +427,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, ...@@ -396,7 +427,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
pte_t new_pte, old_pte, *tmp; pte_t new_pte, old_pte, *tmp;
pgprot_t old_prot, new_prot, req_prot; pgprot_t old_prot, new_prot, req_prot;
int i, do_split = 1; int i, do_split = 1;
unsigned int level; enum pg_level level;
if (cpa->force_split) if (cpa->force_split)
return 1; return 1;
...@@ -412,15 +443,12 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, ...@@ -412,15 +443,12 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
switch (level) { switch (level) {
case PG_LEVEL_2M: case PG_LEVEL_2M:
psize = PMD_PAGE_SIZE;
pmask = PMD_PAGE_MASK;
break;
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
case PG_LEVEL_1G: case PG_LEVEL_1G:
psize = PUD_PAGE_SIZE;
pmask = PUD_PAGE_MASK;
break;
#endif #endif
psize = page_level_size(level);
pmask = page_level_mask(level);
break;
default: default:
do_split = -EINVAL; do_split = -EINVAL;
goto out_unlock; goto out_unlock;
...@@ -551,16 +579,10 @@ static int split_large_page(pte_t *kpte, unsigned long address) ...@@ -551,16 +579,10 @@ static int split_large_page(pte_t *kpte, unsigned long address)
for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc) for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
set_pte(&pbase[i], pfn_pte(pfn, ref_prot)); set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
if (address >= (unsigned long)__va(0) && if (pfn_range_is_mapped(PFN_DOWN(__pa(address)),
address < (unsigned long)__va(max_low_pfn_mapped << PAGE_SHIFT)) PFN_DOWN(__pa(address)) + 1))
split_page_count(level); split_page_count(level);
#ifdef CONFIG_X86_64
if (address >= (unsigned long)__va(1UL<<32) &&
address < (unsigned long)__va(max_pfn_mapped << PAGE_SHIFT))
split_page_count(level);
#endif
/* /*
* Install the new, split up pagetable. * Install the new, split up pagetable.
* *
...@@ -729,13 +751,9 @@ static int cpa_process_alias(struct cpa_data *cpa) ...@@ -729,13 +751,9 @@ static int cpa_process_alias(struct cpa_data *cpa)
unsigned long vaddr; unsigned long vaddr;
int ret; int ret;
if (cpa->pfn >= max_pfn_mapped) if (!pfn_range_is_mapped(cpa->pfn, cpa->pfn + 1))
return 0; return 0;
#ifdef CONFIG_X86_64
if (cpa->pfn >= max_low_pfn_mapped && cpa->pfn < (1UL<<(32-PAGE_SHIFT)))
return 0;
#endif
/* /*
* No need to redo, when the primary call touched the direct * No need to redo, when the primary call touched the direct
* mapping already: * mapping already:
......
...@@ -560,10 +560,10 @@ int kernel_map_sync_memtype(u64 base, unsigned long size, unsigned long flags) ...@@ -560,10 +560,10 @@ int kernel_map_sync_memtype(u64 base, unsigned long size, unsigned long flags)
{ {
unsigned long id_sz; unsigned long id_sz;
if (base >= __pa(high_memory)) if (base > __pa(high_memory-1))
return 0; return 0;
id_sz = (__pa(high_memory) < base + size) ? id_sz = (__pa(high_memory-1) <= base + size) ?
__pa(high_memory) - base : __pa(high_memory) - base :
size; size;
......
...@@ -334,7 +334,12 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, ...@@ -334,7 +334,12 @@ int pmdp_set_access_flags(struct vm_area_struct *vma,
if (changed && dirty) { if (changed && dirty) {
*pmdp = entry; *pmdp = entry;
pmd_update_defer(vma->vm_mm, address, pmdp); pmd_update_defer(vma->vm_mm, address, pmdp);
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE); /*
* We had a write-protection fault here and changed the pmd
* to to more permissive. No need to flush the TLB for that,
* #PF is architecturally guaranteed to do that and in the
* worst-case we'll generate a spurious fault.
*/
} }
return changed; return changed;
......
#include <linux/bootmem.h>
#include <linux/mmdebug.h> #include <linux/mmdebug.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/mm.h> #include <linux/mm.h>
...@@ -8,33 +9,54 @@ ...@@ -8,33 +9,54 @@
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
#ifdef CONFIG_DEBUG_VIRTUAL
unsigned long __phys_addr(unsigned long x) unsigned long __phys_addr(unsigned long x)
{ {
if (x >= __START_KERNEL_map) { unsigned long y = x - __START_KERNEL_map;
x -= __START_KERNEL_map;
VIRTUAL_BUG_ON(x >= KERNEL_IMAGE_SIZE); /* use the carry flag to determine if x was < __START_KERNEL_map */
x += phys_base; if (unlikely(x > y)) {
x = y + phys_base;
VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE);
} else { } else {
VIRTUAL_BUG_ON(x < PAGE_OFFSET); x = y + (__START_KERNEL_map - PAGE_OFFSET);
x -= PAGE_OFFSET;
VIRTUAL_BUG_ON(!phys_addr_valid(x)); /* carry flag will be set if starting x was >= PAGE_OFFSET */
VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
} }
return x; return x;
} }
EXPORT_SYMBOL(__phys_addr); EXPORT_SYMBOL(__phys_addr);
unsigned long __phys_addr_symbol(unsigned long x)
{
unsigned long y = x - __START_KERNEL_map;
/* only check upper bounds since lower bounds will trigger carry */
VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE);
return y + phys_base;
}
EXPORT_SYMBOL(__phys_addr_symbol);
#endif
bool __virt_addr_valid(unsigned long x) bool __virt_addr_valid(unsigned long x)
{ {
if (x >= __START_KERNEL_map) { unsigned long y = x - __START_KERNEL_map;
x -= __START_KERNEL_map;
if (x >= KERNEL_IMAGE_SIZE) /* use the carry flag to determine if x was < __START_KERNEL_map */
if (unlikely(x > y)) {
x = y + phys_base;
if (y >= KERNEL_IMAGE_SIZE)
return false; return false;
x += phys_base;
} else { } else {
if (x < PAGE_OFFSET) x = y + (__START_KERNEL_map - PAGE_OFFSET);
return false;
x -= PAGE_OFFSET; /* carry flag will be set if starting x was >= PAGE_OFFSET */
if (!phys_addr_valid(x)) if ((x > y) || !phys_addr_valid(x))
return false; return false;
} }
...@@ -47,10 +69,16 @@ EXPORT_SYMBOL(__virt_addr_valid); ...@@ -47,10 +69,16 @@ EXPORT_SYMBOL(__virt_addr_valid);
#ifdef CONFIG_DEBUG_VIRTUAL #ifdef CONFIG_DEBUG_VIRTUAL
unsigned long __phys_addr(unsigned long x) unsigned long __phys_addr(unsigned long x)
{ {
unsigned long phys_addr = x - PAGE_OFFSET;
/* VMALLOC_* aren't constants */ /* VMALLOC_* aren't constants */
VIRTUAL_BUG_ON(x < PAGE_OFFSET); VIRTUAL_BUG_ON(x < PAGE_OFFSET);
VIRTUAL_BUG_ON(__vmalloc_start_set && is_vmalloc_addr((void *) x)); VIRTUAL_BUG_ON(__vmalloc_start_set && is_vmalloc_addr((void *) x));
return x - PAGE_OFFSET; /* max_low_pfn is set early, but not _that_ early */
if (max_low_pfn) {
VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);
BUG_ON(slow_virt_to_phys((void *)x) != phys_addr);
}
return phys_addr;
} }
EXPORT_SYMBOL(__phys_addr); EXPORT_SYMBOL(__phys_addr);
#endif #endif
......
...@@ -416,8 +416,8 @@ void __init efi_reserve_boot_services(void) ...@@ -416,8 +416,8 @@ void __init efi_reserve_boot_services(void)
* - Not within any part of the kernel * - Not within any part of the kernel
* - Not the bios reserved area * - Not the bios reserved area
*/ */
if ((start+size >= virt_to_phys(_text) if ((start+size >= __pa_symbol(_text)
&& start <= virt_to_phys(_end)) || && start <= __pa_symbol(_end)) ||
!e820_all_mapped(start, start+size, E820_RAM) || !e820_all_mapped(start, start+size, E820_RAM) ||
memblock_is_region_reserved(start, size)) { memblock_is_region_reserved(start, size)) {
/* Could not reserve, skip it */ /* Could not reserve, skip it */
...@@ -843,7 +843,7 @@ void __init efi_enter_virtual_mode(void) ...@@ -843,7 +843,7 @@ void __init efi_enter_virtual_mode(void)
efi_memory_desc_t *md, *prev_md = NULL; efi_memory_desc_t *md, *prev_md = NULL;
efi_status_t status; efi_status_t status;
unsigned long size; unsigned long size;
u64 end, systab, end_pfn; u64 end, systab, start_pfn, end_pfn;
void *p, *va, *new_memmap = NULL; void *p, *va, *new_memmap = NULL;
int count = 0; int count = 0;
...@@ -896,10 +896,9 @@ void __init efi_enter_virtual_mode(void) ...@@ -896,10 +896,9 @@ void __init efi_enter_virtual_mode(void)
size = md->num_pages << EFI_PAGE_SHIFT; size = md->num_pages << EFI_PAGE_SHIFT;
end = md->phys_addr + size; end = md->phys_addr + size;
start_pfn = PFN_DOWN(md->phys_addr);
end_pfn = PFN_UP(end); end_pfn = PFN_UP(end);
if (end_pfn <= max_low_pfn_mapped if (pfn_range_is_mapped(start_pfn, end_pfn)) {
|| (end_pfn > (1UL << (32 - PAGE_SHIFT))
&& end_pfn <= max_pfn_mapped)) {
va = __va(md->phys_addr); va = __va(md->phys_addr);
if (!(md->attribute & EFI_MEMORY_WB)) if (!(md->attribute & EFI_MEMORY_WB))
......
...@@ -129,8 +129,6 @@ static int resume_physical_mapping_init(pgd_t *pgd_base) ...@@ -129,8 +129,6 @@ static int resume_physical_mapping_init(pgd_t *pgd_base)
} }
} }
resume_map_numa_kva(pgd_base);
return 0; return 0;
} }
......
...@@ -11,6 +11,8 @@ ...@@ -11,6 +11,8 @@
#include <linux/gfp.h> #include <linux/gfp.h>
#include <linux/smp.h> #include <linux/smp.h>
#include <linux/suspend.h> #include <linux/suspend.h>
#include <asm/init.h>
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/pgtable.h> #include <asm/pgtable.h>
...@@ -39,41 +41,21 @@ pgd_t *temp_level4_pgt; ...@@ -39,41 +41,21 @@ pgd_t *temp_level4_pgt;
void *relocated_restore_code; void *relocated_restore_code;
static int res_phys_pud_init(pud_t *pud, unsigned long address, unsigned long end) static void *alloc_pgt_page(void *context)
{ {
long i, j; return (void *)get_safe_page(GFP_ATOMIC);
i = pud_index(address);
pud = pud + i;
for (; i < PTRS_PER_PUD; pud++, i++) {
unsigned long paddr;
pmd_t *pmd;
paddr = address + i*PUD_SIZE;
if (paddr >= end)
break;
pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
if (!pmd)
return -ENOMEM;
set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
unsigned long pe;
if (paddr >= end)
break;
pe = __PAGE_KERNEL_LARGE_EXEC | paddr;
pe &= __supported_pte_mask;
set_pmd(pmd, __pmd(pe));
}
}
return 0;
} }
static int set_up_temporary_mappings(void) static int set_up_temporary_mappings(void)
{ {
unsigned long start, end, next; struct x86_mapping_info info = {
int error; .alloc_pgt_page = alloc_pgt_page,
.pmd_flag = __PAGE_KERNEL_LARGE_EXEC,
.kernel_mapping = true,
};
unsigned long mstart, mend;
int result;
int i;
temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC); temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC);
if (!temp_level4_pgt) if (!temp_level4_pgt)
...@@ -84,21 +66,17 @@ static int set_up_temporary_mappings(void) ...@@ -84,21 +66,17 @@ static int set_up_temporary_mappings(void)
init_level4_pgt[pgd_index(__START_KERNEL_map)]); init_level4_pgt[pgd_index(__START_KERNEL_map)]);
/* Set up the direct mapping from scratch */ /* Set up the direct mapping from scratch */
start = (unsigned long)pfn_to_kaddr(0); for (i = 0; i < nr_pfn_mapped; i++) {
end = (unsigned long)pfn_to_kaddr(max_pfn); mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
for (; start < end; start = next) {
pud_t *pud = (pud_t *)get_safe_page(GFP_ATOMIC); result = kernel_ident_mapping_init(&info, temp_level4_pgt,
if (!pud) mstart, mend);
return -ENOMEM;
next = start + PGDIR_SIZE; if (result)
if (next > end) return result;
next = end;
if ((error = res_phys_pud_init(pud, __pa(start), __pa(next))))
return error;
set_pgd(temp_level4_pgt + pgd_index(start),
mk_kernel_pgd(__pa(pud)));
} }
return 0; return 0;
} }
......
...@@ -8,9 +8,26 @@ ...@@ -8,9 +8,26 @@
struct real_mode_header *real_mode_header; struct real_mode_header *real_mode_header;
u32 *trampoline_cr4_features; u32 *trampoline_cr4_features;
void __init setup_real_mode(void) void __init reserve_real_mode(void)
{ {
phys_addr_t mem; phys_addr_t mem;
unsigned char *base;
size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
/* Has to be under 1M so we can execute real-mode AP code. */
mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
if (!mem)
panic("Cannot allocate trampoline\n");
base = __va(mem);
memblock_reserve(mem, size);
real_mode_header = (struct real_mode_header *) base;
printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
base, (unsigned long long)mem, size);
}
void __init setup_real_mode(void)
{
u16 real_mode_seg; u16 real_mode_seg;
u32 *rel; u32 *rel;
u32 count; u32 count;
...@@ -25,16 +42,7 @@ void __init setup_real_mode(void) ...@@ -25,16 +42,7 @@ void __init setup_real_mode(void)
u64 efer; u64 efer;
#endif #endif
/* Has to be in very low memory so we can execute real-mode AP code. */ base = (unsigned char *)real_mode_header;
mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
if (!mem)
panic("Cannot allocate trampoline\n");
base = __va(mem);
memblock_reserve(mem, size);
real_mode_header = (struct real_mode_header *) base;
printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
base, (unsigned long long)mem, size);
memcpy(base, real_mode_blob, size); memcpy(base, real_mode_blob, size);
...@@ -62,9 +70,9 @@ void __init setup_real_mode(void) ...@@ -62,9 +70,9 @@ void __init setup_real_mode(void)
__va(real_mode_header->trampoline_header); __va(real_mode_header->trampoline_header);
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
trampoline_header->start = __pa(startup_32_smp); trampoline_header->start = __pa_symbol(startup_32_smp);
trampoline_header->gdt_limit = __BOOT_DS + 7; trampoline_header->gdt_limit = __BOOT_DS + 7;
trampoline_header->gdt_base = __pa(boot_gdt); trampoline_header->gdt_base = __pa_symbol(boot_gdt);
#else #else
/* /*
* Some AMD processors will #GP(0) if EFER.LMA is set in WRMSR * Some AMD processors will #GP(0) if EFER.LMA is set in WRMSR
...@@ -78,16 +86,18 @@ void __init setup_real_mode(void) ...@@ -78,16 +86,18 @@ void __init setup_real_mode(void)
*trampoline_cr4_features = read_cr4(); *trampoline_cr4_features = read_cr4();
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = __pa(level3_ident_pgt) + _KERNPG_TABLE; trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
trampoline_pgd[511] = __pa(level3_kernel_pgt) + _KERNPG_TABLE; trampoline_pgd[511] = init_level4_pgt[511].pgd;
#endif #endif
} }
/* /*
* set_real_mode_permissions() gets called very early, to guarantee the * reserve_real_mode() gets called very early, to guarantee the
* availability of low memory. This is before the proper kernel page * availability of low memory. This is before the proper kernel page
* tables are set up, so we cannot set page permissions in that * tables are set up, so we cannot set page permissions in that
* function. Thus, we use an arch_initcall instead. * function. Also trampoline code will be executed by APs so we
* need to mark it executable at do_pre_smp_initcalls() at least,
* thus run it as a early_initcall().
*/ */
static int __init set_real_mode_permissions(void) static int __init set_real_mode_permissions(void)
{ {
...@@ -111,5 +121,4 @@ static int __init set_real_mode_permissions(void) ...@@ -111,5 +121,4 @@ static int __init set_real_mode_permissions(void)
return 0; return 0;
} }
early_initcall(set_real_mode_permissions);
arch_initcall(set_real_mode_permissions);
...@@ -1178,20 +1178,6 @@ static void xen_exit_mmap(struct mm_struct *mm) ...@@ -1178,20 +1178,6 @@ static void xen_exit_mmap(struct mm_struct *mm)
static void xen_post_allocator_init(void); static void xen_post_allocator_init(void);
static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
{
/* reserve the range used */
native_pagetable_reserve(start, end);
/* set as RW the rest */
printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
PFN_PHYS(pgt_buf_top));
while (end < PFN_PHYS(pgt_buf_top)) {
make_lowmem_page_readwrite(__va(end));
end += PAGE_SIZE;
}
}
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
static void __init xen_cleanhighmap(unsigned long vaddr, static void __init xen_cleanhighmap(unsigned long vaddr,
unsigned long vaddr_end) unsigned long vaddr_end)
...@@ -1503,19 +1489,6 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte) ...@@ -1503,19 +1489,6 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
#else /* CONFIG_X86_64 */ #else /* CONFIG_X86_64 */
static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte) static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
{ {
unsigned long pfn = pte_pfn(pte);
/*
* If the new pfn is within the range of the newly allocated
* kernel pagetable, and it isn't being mapped into an
* early_ioremap fixmap slot as a freshly allocated page, make sure
* it is RO.
*/
if (((!is_early_ioremap_ptep(ptep) &&
pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
pte = pte_wrprotect(pte);
return pte; return pte;
} }
#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_64 */
...@@ -2197,7 +2170,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = { ...@@ -2197,7 +2170,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
void __init xen_init_mmu_ops(void) void __init xen_init_mmu_ops(void)
{ {
x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_init = xen_pagetable_init; x86_init.paging.pagetable_init = xen_pagetable_init;
pv_mmu_ops = xen_mmu_ops; pv_mmu_ops = xen_mmu_ops;
......
...@@ -231,7 +231,9 @@ int __ref xen_swiotlb_init(int verbose, bool early) ...@@ -231,7 +231,9 @@ int __ref xen_swiotlb_init(int verbose, bool early)
} }
start_dma_addr = xen_virt_to_bus(xen_io_tlb_start); start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
if (early) { if (early) {
swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs, verbose); if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
verbose))
panic("Cannot allocate SWIOTLB buffer");
rc = 0; rc = 0;
} else } else
rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs); rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
......
...@@ -99,6 +99,9 @@ void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat, ...@@ -99,6 +99,9 @@ void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
extern void *__alloc_bootmem_low(unsigned long size, extern void *__alloc_bootmem_low(unsigned long size,
unsigned long align, unsigned long align,
unsigned long goal); unsigned long goal);
void *__alloc_bootmem_low_nopanic(unsigned long size,
unsigned long align,
unsigned long goal);
extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
unsigned long size, unsigned long size,
unsigned long align, unsigned long align,
...@@ -132,6 +135,8 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, ...@@ -132,6 +135,8 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
#define alloc_bootmem_low(x) \ #define alloc_bootmem_low(x) \
__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0) __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
#define alloc_bootmem_low_pages_nopanic(x) \
__alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
#define alloc_bootmem_low_pages(x) \ #define alloc_bootmem_low_pages(x) \
__alloc_bootmem_low(x, PAGE_SIZE, 0) __alloc_bootmem_low(x, PAGE_SIZE, 0)
#define alloc_bootmem_low_pages_node(pgdat, x) \ #define alloc_bootmem_low_pages_node(pgdat, x) \
......
...@@ -191,6 +191,7 @@ extern struct kimage *kexec_crash_image; ...@@ -191,6 +191,7 @@ extern struct kimage *kexec_crash_image;
/* Location of a reserved region to hold the crash kernel. /* Location of a reserved region to hold the crash kernel.
*/ */
extern struct resource crashk_res; extern struct resource crashk_res;
extern struct resource crashk_low_res;
typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4]; typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4];
extern note_buf_t __percpu *crash_notes; extern note_buf_t __percpu *crash_notes;
extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
...@@ -199,6 +200,8 @@ extern size_t vmcoreinfo_max_size; ...@@ -199,6 +200,8 @@ extern size_t vmcoreinfo_max_size;
int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base); unsigned long long *crash_size, unsigned long long *crash_base);
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
int crash_shrink_memory(unsigned long new_size); int crash_shrink_memory(unsigned long new_size);
size_t crash_get_memory_size(void); size_t crash_get_memory_size(void);
void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
......
...@@ -155,6 +155,7 @@ phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align, ...@@ -155,6 +155,7 @@ phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t max_addr); phys_addr_t max_addr);
phys_addr_t memblock_phys_mem_size(void); phys_addr_t memblock_phys_mem_size(void);
phys_addr_t memblock_mem_size(unsigned long limit_pfn);
phys_addr_t memblock_start_of_DRAM(void); phys_addr_t memblock_start_of_DRAM(void);
phys_addr_t memblock_end_of_DRAM(void); phys_addr_t memblock_end_of_DRAM(void);
void memblock_enforce_memory_limit(phys_addr_t memory_limit); void memblock_enforce_memory_limit(phys_addr_t memory_limit);
......
...@@ -1386,7 +1386,6 @@ extern void __init mmap_init(void); ...@@ -1386,7 +1386,6 @@ extern void __init mmap_init(void);
extern void show_mem(unsigned int flags); extern void show_mem(unsigned int flags);
extern void si_meminfo(struct sysinfo * val); extern void si_meminfo(struct sysinfo * val);
extern void si_meminfo_node(struct sysinfo *val, int nid); extern void si_meminfo_node(struct sysinfo *val, int nid);
extern int after_bootmem;
extern __printf(3, 4) extern __printf(3, 4)
void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...); void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...);
......
...@@ -23,7 +23,7 @@ extern int swiotlb_force; ...@@ -23,7 +23,7 @@ extern int swiotlb_force;
#define IO_TLB_SHIFT 11 #define IO_TLB_SHIFT 11
extern void swiotlb_init(int verbose); extern void swiotlb_init(int verbose);
extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void); extern unsigned long swiotlb_nr_tbl(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
......
...@@ -54,6 +54,12 @@ struct resource crashk_res = { ...@@ -54,6 +54,12 @@ struct resource crashk_res = {
.end = 0, .end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM .flags = IORESOURCE_BUSY | IORESOURCE_MEM
}; };
struct resource crashk_low_res = {
.name = "Crash kernel low",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
int kexec_should_crash(struct task_struct *p) int kexec_should_crash(struct task_struct *p)
{ {
...@@ -1369,10 +1375,11 @@ static int __init parse_crashkernel_simple(char *cmdline, ...@@ -1369,10 +1375,11 @@ static int __init parse_crashkernel_simple(char *cmdline,
* That function is the entry point for command line parsing and should be * That function is the entry point for command line parsing and should be
* called from the arch-specific code. * called from the arch-specific code.
*/ */
int __init parse_crashkernel(char *cmdline, static int __init __parse_crashkernel(char *cmdline,
unsigned long long system_ram, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_size,
unsigned long long *crash_base) unsigned long long *crash_base,
const char *name)
{ {
char *p = cmdline, *ck_cmdline = NULL; char *p = cmdline, *ck_cmdline = NULL;
char *first_colon, *first_space; char *first_colon, *first_space;
...@@ -1382,16 +1389,16 @@ int __init parse_crashkernel(char *cmdline, ...@@ -1382,16 +1389,16 @@ int __init parse_crashkernel(char *cmdline,
*crash_base = 0; *crash_base = 0;
/* find crashkernel and use the last one if there are more */ /* find crashkernel and use the last one if there are more */
p = strstr(p, "crashkernel="); p = strstr(p, name);
while (p) { while (p) {
ck_cmdline = p; ck_cmdline = p;
p = strstr(p+1, "crashkernel="); p = strstr(p+1, name);
} }
if (!ck_cmdline) if (!ck_cmdline)
return -EINVAL; return -EINVAL;
ck_cmdline += 12; /* strlen("crashkernel=") */ ck_cmdline += strlen(name);
/* /*
* if the commandline contains a ':', then that's the extended * if the commandline contains a ':', then that's the extended
...@@ -1409,6 +1416,23 @@ int __init parse_crashkernel(char *cmdline, ...@@ -1409,6 +1416,23 @@ int __init parse_crashkernel(char *cmdline,
return 0; return 0;
} }
int __init parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
"crashkernel=");
}
int __init parse_crashkernel_low(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
"crashkernel_low=");
}
static void update_vmcoreinfo_note(void) static void update_vmcoreinfo_note(void)
{ {
......
...@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev, ...@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
return phys_to_dma(hwdev, virt_to_phys(address)); return phys_to_dma(hwdev, virt_to_phys(address));
} }
static bool no_iotlb_memory;
void swiotlb_print_info(void) void swiotlb_print_info(void)
{ {
unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT; unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
unsigned char *vstart, *vend; unsigned char *vstart, *vend;
if (no_iotlb_memory) {
pr_warn("software IO TLB: No low mem\n");
return;
}
vstart = phys_to_virt(io_tlb_start); vstart = phys_to_virt(io_tlb_start);
vend = phys_to_virt(io_tlb_end); vend = phys_to_virt(io_tlb_end);
...@@ -136,7 +143,7 @@ void swiotlb_print_info(void) ...@@ -136,7 +143,7 @@ void swiotlb_print_info(void)
bytes >> 20, vstart, vend - 1); bytes >> 20, vstart, vend - 1);
} }
void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{ {
void *v_overflow_buffer; void *v_overflow_buffer;
unsigned long i, bytes; unsigned long i, bytes;
...@@ -150,9 +157,10 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) ...@@ -150,9 +157,10 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
/* /*
* Get the overflow emergency buffer * Get the overflow emergency buffer
*/ */
v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow)); v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
PAGE_ALIGN(io_tlb_overflow));
if (!v_overflow_buffer) if (!v_overflow_buffer)
panic("Cannot allocate SWIOTLB overflow buffer!\n"); return -ENOMEM;
io_tlb_overflow_buffer = __pa(v_overflow_buffer); io_tlb_overflow_buffer = __pa(v_overflow_buffer);
...@@ -169,15 +177,19 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) ...@@ -169,15 +177,19 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (verbose) if (verbose)
swiotlb_print_info(); swiotlb_print_info();
return 0;
} }
/* /*
* Statically reserve bounce buffer space and initialize bounce buffer data * Statically reserve bounce buffer space and initialize bounce buffer data
* structures for the software IO TLB used to implement the DMA API. * structures for the software IO TLB used to implement the DMA API.
*/ */
static void __init void __init
swiotlb_init_with_default_size(size_t default_size, int verbose) swiotlb_init(int verbose)
{ {
/* default to 64MB */
size_t default_size = 64UL<<20;
unsigned char *vstart; unsigned char *vstart;
unsigned long bytes; unsigned long bytes;
...@@ -188,20 +200,16 @@ swiotlb_init_with_default_size(size_t default_size, int verbose) ...@@ -188,20 +200,16 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
bytes = io_tlb_nslabs << IO_TLB_SHIFT; bytes = io_tlb_nslabs << IO_TLB_SHIFT;
/* /* Get IO TLB memory from the low pages */
* Get IO TLB memory from the low pages vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
*/ if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes)); return;
if (!vstart)
panic("Cannot allocate SWIOTLB buffer");
swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
}
void __init if (io_tlb_start)
swiotlb_init(int verbose) free_bootmem(io_tlb_start,
{ PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
swiotlb_init_with_default_size(64 * (1<<20), verbose); /* default to 64MB */ pr_warn("Cannot allocate SWIOTLB buffer");
no_iotlb_memory = true;
} }
/* /*
...@@ -405,6 +413,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, ...@@ -405,6 +413,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
unsigned long offset_slots; unsigned long offset_slots;
unsigned long max_slots; unsigned long max_slots;
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
mask = dma_get_seg_boundary(hwdev); mask = dma_get_seg_boundary(hwdev);
tbl_dma_addr &= mask; tbl_dma_addr &= mask;
......
...@@ -833,6 +833,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align, ...@@ -833,6 +833,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT); return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
} }
void * __init __alloc_bootmem_low_nopanic(unsigned long size,
unsigned long align,
unsigned long goal)
{
return ___alloc_bootmem_nopanic(size, align, goal,
ARCH_LOW_ADDRESS_LIMIT);
}
/** /**
* __alloc_bootmem_low_node - allocate low boot memory from a specific node * __alloc_bootmem_low_node - allocate low boot memory from a specific node
* @pgdat: node to allocate from * @pgdat: node to allocate from
......
...@@ -828,6 +828,23 @@ phys_addr_t __init memblock_phys_mem_size(void) ...@@ -828,6 +828,23 @@ phys_addr_t __init memblock_phys_mem_size(void)
return memblock.memory.total_size; return memblock.memory.total_size;
} }
phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
{
unsigned long pages = 0;
struct memblock_region *r;
unsigned long start_pfn, end_pfn;
for_each_memblock(memory, r) {
start_pfn = memblock_region_memory_base_pfn(r);
end_pfn = memblock_region_memory_end_pfn(r);
start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
pages += end_pfn - start_pfn;
}
return (phys_addr_t)pages << PAGE_SHIFT;
}
/* lowest address */ /* lowest address */
phys_addr_t __init_memblock memblock_start_of_DRAM(void) phys_addr_t __init_memblock memblock_start_of_DRAM(void)
{ {
......
...@@ -153,21 +153,6 @@ static void reset_node_lowmem_managed_pages(pg_data_t *pgdat) ...@@ -153,21 +153,6 @@ static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
z->managed_pages = 0; z->managed_pages = 0;
} }
/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
* @pgdat: node to be released
*
* Returns the number of pages actually released.
*/
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
reset_node_lowmem_managed_pages(pgdat);
/* free_low_memory_core_early(MAX_NUMNODES) will be called later */
return 0;
}
/** /**
* free_all_bootmem - release free pages to the buddy allocator * free_all_bootmem - release free pages to the buddy allocator
* *
...@@ -406,6 +391,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align, ...@@ -406,6 +391,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT); return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
} }
void * __init __alloc_bootmem_low_nopanic(unsigned long size,
unsigned long align,
unsigned long goal)
{
return ___alloc_bootmem_nopanic(size, align, goal,
ARCH_LOW_ADDRESS_LIMIT);
}
/** /**
* __alloc_bootmem_low_node - allocate low boot memory from a specific node * __alloc_bootmem_low_node - allocate low boot memory from a specific node
* @pgdat: node to allocate from * @pgdat: node to allocate from
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment