- 06 Jan, 2007 40 commits
-
-
Avi Kivity authored
Since we write protect shadowed guest page tables, there is no need to trap page invalidations (the guest will always change the mapping before issuing the invlpg instruction). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
When beginning to process a page fault, make sure we have enough shadow pages available to service the fault. If not, free some pages. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
... and so must not free it unconditionally. Move the freeing to kvm_mmu_zap_page(). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
When removing a page table, we must maintain the parent_pte field all child shadow page tables. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
A page table may have been recycled into a regular page, and so any instruction can be executed on it. Unprotect the page and let the cpu do its thing. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Iterate over all shadow pages which correspond to a the given guest page table and remove the mappings. A subsequent page fault will reestablish the new mapping. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
As the mmu write protects guest page table, we emulate those writes. Since they are not mmio, there is no need to go to userspace to perform them. So, perform the writes in the kernel if possible, and notify the mmu about them so it can take the approriate action. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
This fixes a problem where set_pte_common() looked for shadowed pages based on the page directory gfn (a huge page) instead of the actual gfn being mapped. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
When we cache a guest page table into a shadow page table, we need to prevent further access to that page by the guest, as that would render the cache incoherent. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Define a hashtable for caching shadow page tables. Look up the cache on context switch (cr3 change) or during page faults. The key to the cache is a combination of - the guest page table frame number - the number of paging levels in the guest * we can cache real mode, 32-bit mode, pae, and long mode page tables simultaneously. this is useful for smp bootup. - the guest page table table * some kernels use a page as both a page table and a page directory. this allows multiple shadow pages to exist for that page, one per level - the "quadrant" * 32-bit mode page tables span 4MB, whereas a shadow page table spans 2MB. similarly, a 32-bit page directory spans 4GB, while a shadow page directory spans 1GB. the quadrant allows caching up to 4 shadow page tables for one guest page in one level. - a "metaphysical" bit * for real mode, and for pse pages, there is no guest page table, so set the bit to avoid write protecting the page. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
This allows further manipulation on the shadow page table. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
This lets us not write protect a partial page, and is anyway what a real processor does. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Since we're not going to cache the pae-mode shadow root pages, allocate a single pae shadow that will hold the four lower-level pages, which will act as roots. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
It is never necessary to fetch a guest entry from an intermediate page table level (except for large pages), so avoid some confusion by always descending into the lowest possible level. Rename init_walker() to walk_addr() as it is no longer restricted to initialization. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
In pae mode, a load of cr3 loads the four third-level page table entries in addition to cr3 itself. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Saving the table gfns removes the need to walk the guest and host page tables in lockstep. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Keep in each host page frame's page->private a pointer to the shadow pte which maps it. If there are multiple shadow ptes mapping the page, set bit 0 of page->private, and use the rest as a pointer to a linked list of all such mappings. Reverse mappings are needed because we when we cache shadow page tables, we must protect the guest page tables from being modified by the guest, as that would invalidate the cached ptes. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Avi Kivity authored
Hardware virtualization implementations allow the guests to freely change some of the bits in cr0 and cr4, but trap when changing the other bits. This is useful to avoid excessive exits due to changing, for example, the ts flag. It also means the kvm's copy of cr0 and cr4 may be stale with respect to these bits. most of the time this doesn't matter as these bits are not very interesting. Other times, however (for example when returning cr0 to userspace), they are, so get the fresh contents of these bits from the guest by means of a new arch operation. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
David Brownell authored
Bugfixes: - Handle RTCs which are configured to use 12-hour mode. - Never report bogus/un-initialized times. - Displaying "raw trim" requires not masking it first! - Fix the sysfs and procfs display of crystal and trim data. Features: - Handle other RTCs in this family, notably rv5c386/rv5c387. - Declare the other registers. - Provide alarm get/set functionality. - Handle AIE and UIE; but no IRQ handling yet. Cleanup: - Shrink object by not including needless sysfs or procfs support - We don't need no steenkin' forward declarations. (Except one.) Until the I2C framework merges "new style" driver support, matching the driver model better, using rv5c chips or alarm IRQs requires a separate board-specific patch. (And an IRQ handler, handing off labor through a work_struct...) This uses the "method 3" register reads, but notes that it's done to work around an evident i2c adapter driver bug. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
pdflush hit the BUG_ON(!PageSlab(page)) in kmem_freepages called from fallback_alloc: cache_grow already freed those pages when alloc_slabmgmt failed. But it wouldn't have freed them if __GFP_NO_GROW, so make sure fallback_alloc doesn't waste its time on that case. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Eric Sandeen authored
CVE-2006-5753 is for a case where an inode can be marked bad, switching the ops to bad_inode_ops, which are all connected as: static int return_EIO(void) { return -EIO; } #define EIO_ERROR ((void *) (return_EIO)) static struct inode_operations bad_inode_ops = { .create = bad_inode_create ...etc... The problem here is that the void cast causes return types to not be promoted, and for ops such as listxattr which expect more than 32 bits of return value, the 32-bit -EIO is interpreted as a large positive 64-bit number, i.e. 0x00000000fffffffa instead of 0xfffffffa. This goes particularly badly when the return value is taken as a number of bytes to copy into, say, a user's buffer for example... I originally had coded up the fix by creating a return_EIO_<TYPE> macro for each return type, like this: static int return_EIO_int(void) { return -EIO; } #define EIO_ERROR_INT ((void *) (return_EIO_int)) static struct inode_operations bad_inode_ops = { .create = EIO_ERROR_INT, ...etc... but Al felt that it was probably better to create an EIO-returner for each actual op signature. Since so few ops share a signature, I just went ahead & created an EIO function for each individual file & inode op that returns a value. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Make this: drivers/char/ip2/ip2main.c: In function 'ip2_loadmain': drivers/char/ip2/ip2main.c:654: warning: control may reach end of non-void function 'iiSetAddress' being inlined drivers/char/ip2/ip2main.c:808: warning: control may reach end of non-void function 'iiInitialize' being inlined go away. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Vivek Goyal authored
o Currently synchronize_tsc_ap() is of type __init. It is called by smp_callin() which is of type __cpuinit. So synchronize_tsc_ap() should be of type __cpuinit. o Modpost generates warnings for i386 if CONFIG_RELOCATABLE=y and CONFIG_HOTPLUG_CPU=y WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'start_secondary' (at offset 0xc01164dc) and 'initialize_secondary' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'start_secondary' (at offset 0xc01164e8) and 'initialize_secondary' o tsc is of type __initdata. It should be of type __cpuinitdata. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Vivek Goyal authored
o MODPOST generates warning for i386 if kernel is compiled with CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.data: from .data between 'this_cpu' (at offset 0xc05194d0) and 'cpuinfo_op' o this_cpu pointer should be of type __cpuinitdata. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Vivek Goyal authored
o MODPOST generates warning for i386 if kernel is compiled with CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:startup_32_smp from .data between 'trampoline_data' (at offset 0xc0519cf8) and 'boot_gdt' o trampoline code/data can go into init section is CPU hotplug is not enabled. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Paul Mundt authored
At the moment the inode/dentry cache hash tables (common by way of alloc_large_system_hash()) are incorrectly sized by their respective detection logic when we attempt to use large base pages on systems with little memory. This results in odd behaviour when using a 64kB PAGE_SIZE, such as: Dentry cache hash table entries: 8192 (order: -1, 32768 bytes) Inode-cache hash table entries: 4096 (order: -2, 16384 bytes) The mount cache hash table is seemingly the only one that gets this right by directly taking PAGE_SIZE in to account. The following patch attempts to catch the bogus values and round it up to at least 0-order. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Vivek Goyal authored
o Relocatable bzImage support had got rid of CONFIG_PHYSICAL_START option thinking that now this option is not required as people can build a second kernel as relocatable and load it anywhere. So need of compiling the kernel for a custom address was gone. But Magnus uses vmlinux images for second kernel in Xen environment and he wants to continue to use it. o Restoring the CONFIG_PHYSICAL_START option for the time being. I think down the line we can get rid of it. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Fix sched profiling typo, introduced by the sleep profiling patch. This bug caused profile=sched to not work. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rafael J. Wysocki authored
In the kernels later than 2.6.19 there is a regression that makes swsusp fail if the resume device is not explicitly specified. It can be fixed by adding an additional parameter to mm/swapfile.c:swap_type_of() allowing us to pass the (struct block_device *) corresponding to the first available swap back to the caller. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
James Bursa authored
Fix filenames on adfs discs being terminated at the first character greater than 128 (adfs filenames are Latin 1). I saw this problem when using a loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it there. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Alan authored
When the old IDE layer calls into methods in the driver during error handling it is essentially random whether ide_lock is already held. This causes a deadlock in the atiixp driver which also uses ide_lock internally for locking. Switch to a private lock instead. [akpm@osl.org: cleanup] Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Christoph Hellwig authored
Fix http://bugzilla.kernel.org/show_bug.cgi?id=7667 This is because the packet driver tries to send down read/write BLOCK_PC commands that don't use a bio and do not use sg lists. The right fix is to replace all the packet_command stuff in the packet driver by scsi_execute() which needs to be lifted from scsi code to the block code for that. Fix the bug for now. It's not the full way to a generic execute block pc infrastcuture but fixes the bug for the time being. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
David Brownell authored
The at91rm9200 RTC driver needs some assistance to build, because of recent header file rearrangement. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <alessandro.zummo@towertech.it> Cc: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Dor Laor authored
The current interrupt injection mechanism might delay an interrupt under the following circumstances: - if injection fails because the guest is not interruptible (rflags.IF clear, or after a 'mov ss' or 'sti' instruction). Userspace can check rflags, but the other cases or not testable under the current API. - if injection fails because of a fault during delivery. This probably never happens under normal guests. - if injection fails due to a physical interrupt causing a vmexit so that it can be handled by the host. In all cases the guest proceeds without processing the interrupt, reducing the interactive feel and interrupt throughput of the guest. This patch fixes the situation by allowing userspace to request an exit when the 'interrupt window' opens, so that it can re-inject the interrupt at the right time. Guest interactivity is very visibly improved. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Yoshimi Ichiyanagi authored
If we load the wrong arch module, it leaves behind kvm_arch_ops set, which prevents loading of the correct arch module later. Fix be not setting kvm_arch_ops until we're sure it's good. Signed-off-by: Yoshimi Ichiyanagi <ichiyanagi.yoshimi@lab.ntt.co.jp> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
KVM does kmalloc() in an atomic section while having preemption disabled via vcpu_load(). Fix this by moving the ->*_msr setup from the vcpu_setup method to the vcpu_create method. (This is also a small speedup for setting up a vcpu, which can in theory be more frequent than the vcpu_create method). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Bartlomiej Zolnierkiewicz authored
This patch fixes 2.6.15 regression, is straightforward and tested. Cable detection got broken probably while converting the driver to support multiple controllers. Cable detection is done by examining how BIOS configured the attached devices. The current code is broken in that it examines the status *after* modifying Clk66 configuration ending up detecting 40c cables as 80c. This patch fixes it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ard van Breemen authored
The pci_find_subsys gets called very early by obsolete ide setup parameters. This is a bogus call since pci is not initialized yet, so the list is empty. But in the mean time, interrupts get enabled by down_read. This can result in a kernel panic when the irq controller gets initialized. This patch checks if the device list is empty before taking the semaphore, and hence will not enable irq's. Furthermore it will inform that it is called while pci_devices is empty as a reminder that the ide code needs to be fixed. The pci_get_subsys can get called in the same manner, and as such is patched in the same manner. [akpm@osdl.org: cleanups] Signed-off-by: Ard van Breemen <ard@telegraafnet.nl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-