- 22 Oct, 2010 28 commits
-
-
Ian Campbell authored
When running as initial domain, get the real physical memory map from Xen using the XENMEM_machine_memory_map hypercall and use it to set up the e820 regions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
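The shape of the call is roughly as follows; this is a sketch rather than the patch itself, the function and buffer names are illustrative, and the xen_memory_map descriptor and XENMEM_machine_memory_map constant are assumed to come from the Xen interface headers.

    /* Sketch: fetch the host (machine) E820 map when running as dom0.
     * Buffer size and error handling are simplified for illustration. */
    #include <linux/init.h>
    #include <xen/interface/memory.h>
    #include <asm/xen/hypercall.h>
    #include <asm/e820.h>

    static struct e820entry xen_machine_e820[E820MAX];

    static void __init fetch_machine_e820(void)
    {
            struct xen_memory_map memmap;

            memmap.nr_entries = E820MAX;
            set_xen_guest_handle(memmap.buffer, xen_machine_e820);

            /* Ask Xen for the real memory map; 0 means success. */
            BUG_ON(HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap));

            /* memmap.nr_entries now holds the number of valid entries,
             * ready to be fed into the kernel's e820 setup. */
    }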
-
Ian Campbell authored
Otherwise the second migration attempt fails because the mfn_list_list still refers to all the old mfns. We need to update the entries in both p2m_top_mfn and the mid_mfn pages which p2m_top_mfn refers to. In order to do this we need to keep track of the virtual addresses mapping the p2m_mid_mfn pages, because we cannot rely on mfn_to_virt(p2m_top_mfn[idx]): p2m_top_mfn[idx] will still contain the old MFN after a migration, which may now belong to another domain and hence have a different mapping in the m2p. Therefore add and maintain a third top-level page, p2m_top_mfn_p[], which tracks the virtual addresses of the mfns contained in p2m_top_mfn[]. We also need to update the content of the p2m_mid_missing_mfn page on resume to refer to the page's new mfn. p2m_missing does not need updating since the migration process takes care of the leaf p2m pages for us. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
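A schematic sketch of the bookkeeping described above: p2m_top, p2m_top_mfn and p2m_top_mfn_p follow the naming in the message, but the loop bounds and index helpers here are illustrative only.

    /* Schematic sketch: refresh the mfn tree after resume.  Because the
     * pre-migration mfns may now belong to another domain, walk the saved
     * virtual addresses in p2m_top_mfn_p[] rather than use mfn_to_virt(). */
    static void rebuild_mfn_list_list(void)
    {
            unsigned long pfn;

            for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE) {
                    unsigned topidx = p2m_top_index(pfn);
                    unsigned long **mid = p2m_top[topidx];            /* leaf pages (VAs)    */
                    unsigned long *mid_mfn_p = p2m_top_mfn_p[topidx]; /* VA of mid-mfn page  */
                    unsigned mididx;

                    /* Recompute the mfn of every leaf p2m page... */
                    for (mididx = 0; mididx < P2M_MID_PER_PAGE; mididx++)
                            mid_mfn_p[mididx] = virt_to_mfn(mid[mididx]);

                    /* ...and of the mid-mfn page itself. */
                    p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
            }
    }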
-
Ian Campbell authored
* Fix bitmask formatting on 64 bit by specifying correct field widths. * Output both global and local masked and pending information. * Indicate in list of pending interrupts whether they are pending in the L2, masked globally and/or masked locally. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Make sure the irq is set up before binding a virq event channel to it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Ian Campbell authored
All event channels start bound to VCPU 0, so ensure that cpu_evtchn_mask is initialised to reflect this. Otherwise there is a race, after registering an event channel but before its affinity is explicitly set, during which the event channel can be delivered. If this happens, the event channel remains pending in the L1 (evtchn_pending) array but is cleared in the L2 (evtchn_pending_sel); this means the event channel cannot be re-raised until another event channel happens to trigger the same L2 entry on that VCPU. sizeof(cpu_evtchn_mask(0)) == sizeof(unsigned long *), which is not correct and causes only the first 32 or 64 event channels (depending on architecture) to be initially bound to VCPU 0. Use sizeof(struct cpu_evtchn_s) instead. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: stable@kernel.org
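The sizeof pitfall is easy to reproduce outside the kernel; the standalone snippet below (hypothetical names, not kernel code) shows why a macro that yields a pointer defeats the intended "initialise the whole bitmap" memset.

    #include <stdio.h>
    #include <string.h>

    #define NR_EVENT_CHANNELS 4096

    struct cpu_evtchn_s {
            unsigned long bits[NR_EVENT_CHANNELS / (8 * sizeof(unsigned long))];
    };

    static struct cpu_evtchn_s masks[2];

    /* Yields an unsigned long *, just like the kernel's cpu_evtchn_mask(cpu). */
    #define cpu_evtchn_mask(cpu) ((unsigned long *)&masks[cpu])

    int main(void)
    {
            /* Buggy size: pointer-sized (4 or 8 bytes), i.e. only the first
             * 32 or 64 event channels get initialised. */
            printf("sizeof(cpu_evtchn_mask(0))  = %zu\n", sizeof(cpu_evtchn_mask(0)));

            /* Correct size: the whole per-cpu bitmap structure. */
            printf("sizeof(struct cpu_evtchn_s) = %zu\n", sizeof(struct cpu_evtchn_s));

            memset(cpu_evtchn_mask(0), ~0, sizeof(struct cpu_evtchn_s));
            return 0;
    }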
-
Jeremy Fitzhardinge authored
Don't spam dom0/xenconsoled with events unless we've actually added something to the ring. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
If an E820 region is entirely beyond mem_end, don't attempt to truncate it and add the truncated pages to extra_pages, as they will be negative. Also, make sure the extra memory region starts after all BIOS provided E820 regions (and in the case of RAM regions, post-clipping). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
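The corrected clipping can be summarised as below; the names (entry, mem_end, extra_pages) follow the message, but this is a simplified sketch rather than the patch itself.

    /* Simplified sketch of clipping one E820 RAM entry at mem_end. */
    static unsigned long clip_ram_entry(struct e820entry *entry, u64 mem_end)
    {
            unsigned long extra_pages = 0;

            if (entry->addr >= mem_end) {
                    /* Entirely beyond the limit: leave it alone; "truncating"
                     * here would produce a negative (huge unsigned) size. */
            } else if (entry->addr + entry->size > mem_end) {
                    /* Straddles the limit: clip it, and account the clipped
                     * portion as extra (ballooned-out) pages. */
                    extra_pages += PFN_DOWN(entry->addr + entry->size - mem_end);
                    entry->size = mem_end - entry->addr;
            }
            return extra_pages;
    }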
-
Jeremy Fitzhardinge authored
Convert Linux PAT entries into Xen ones when constructing ptes. Linux doesn't use _PAGE_PAT for ptes, so the only difference in the first 4 entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and the default) use it for WT. xen_pte_val does the inverse conversion. We hard-code assumptions about Linux's current PAT layout, but a warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems. If necessary we could go to a more general table-based conversion between Linux and Xen PAT entries. hugetlbfs poses a problem at the moment: the x86 architecture uses the same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending on which pagetable level we're using. For now this should be OK so long as nobody tries to do a pte_val on a hugetlbfs pte. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Keep xen_max_p2m_pfn up to date with the end of the extra memory we're adding. It is possible that it will be too high since memory may be truncated by a "mem=" option on the kernel command line, but that won't matter. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
If extra memory is very much larger than the base memory size then all of the base memory can be filled with structures reserved to describe the extra memory, leaving no space for anything else. Even at the maximum ratio there will be little space for anything else, but this change is intended to at least allow the system to boot rather than crash mysteriously. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
If an entire E820 RAM region is beyond mem_end, still add its pages to the extra area so that space can be used by the kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
If Xen gives us non-RAM E820 entries (dom0 only, typically), then make sure the extra RAM region is beyond them. It's OK for the extra space to grow into E820 regions, however. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
When using the e820 map to get the initial pseudo-physical address space, look for either Xen-provided memory which doesn't lie within an E820 region, or an E820 RAM region which extends beyond the Xen-provided memory range. Count these pages, and add them to a new "extra memory" range. This range has an E820 RAM range to describe it - so the kernel will allocate page structures for it - but it is also marked reserved so that the kernel will not attempt to use it. The balloon driver can then add this range as a set of currently ballooned-out pages, which can be used to extend the domain beyond its original size. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
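A rough sketch of how such an "extra" region might be described; e820_add_region() and E820_RAM are the existing x86 helpers, while the function name and the extra_start/extra_pages variables are placeholders.

    /* Rough sketch: describe the extra space as RAM so the kernel allocates
     * struct pages for it, but keep it reserved so nothing uses it until the
     * balloon driver hands the pages out. */
    static void __init register_xen_extra_mem(u64 extra_start, unsigned long extra_pages)
    {
            u64 size = (u64)extra_pages << PAGE_SHIFT;

            if (!extra_pages)
                    return;

            /* RAM type => page structures get allocated for the range. */
            e820_add_region(extra_start, size, E820_RAM);

            /* Also reserve [extra_start, extra_start + size) with the early
             * reservation mechanism of this kernel generation, so the
             * allocator never hands it out; the balloon driver later exposes
             * these pages as currently ballooned out. */
    }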
-
Ian Campbell authored
Rather than simply using a flat memory map from Xen, use its provided E820 map. This allows the domain builder to tell the domain to reserve space for more pages than those initially provided at domain-build time. It also allows the host to specify holes in the address space (for PCI-passthrough, for example). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
When setting up a pte for a missing pfn (no matching mfn), just create an empty pte rather than a junk mapping. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
When building the mfn parts of the p2m structure, we rely on being able to use mfn_to_virt, which in turn requires the kernel to be mapped into the linear area (which is distinct from the kernel image mapping on 64-bit). Defer calling xen_build_mfn_list_list() until after xen_setup_kernel_pagetable(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
set_phys_to_machine() can return false on failure, which means a memory allocation failure for the p2m structure. It can only fail if setting the mfn for a pfn in previously unused address space. It is guaranteed to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating the mfn for an existing pfn. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
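An illustration of the resulting calling convention (not new kernel code); only the "growing into new address space" case needs its return value checked.

    /* Growing into previously unused space may need a new p2m page, so it
     * can fail (returns false) on memory allocation failure: */
    if (!set_phys_to_machine(pfn, mfn))
            BUG();  /* could not allocate the missing p2m level */

    /* Clearing an entry, or updating an already-populated pfn, always succeeds: */
    set_phys_to_machine(pfn, INVALID_P2M_ENTRY);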
-
Jeremy Fitzhardinge authored
Make the p2m structure a 3 level tree which covers the full possible physical space. The p2m structure contains mappings from the domain's pfns to system-wide mfns. The structure has 3 levels and two roots. The first root is for the domain's own use, and is linked with virtual addresses. The second is all mfn references, and is used by Xen on save/restore to allow it to update the p2m mapping for the domain. At boot, the domain builder provides a simple flat p2m array for all the initially present pages. We construct the two levels above that using the early_brk allocator. After early boot time, set_phys_to_machine() will allocate any missing levels using the normal kernel allocator (at GFP_KERNEL, so it must be called in a normal blocking context). Because the early_brk() API requires us to pre-reserve the maximum amount of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY config option, but its only negative side-effect is to increase the kernel's apparent bss size. However, since all unused brk memory is returned to the heap, there's no real downside to making it large. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
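The lookup path through the new tree looks roughly like this; the array names and index helpers follow the description above, so treat the details as a sketch rather than the exact code.

    /* Sketch of a pfn -> mfn lookup through the three-level tree. */
    static inline unsigned long pfn_to_mfn_3level(unsigned long pfn)
    {
            unsigned topidx, mididx, idx;

            if (pfn >= MAX_P2M_PFN)
                    return INVALID_P2M_ENTRY;

            topidx = p2m_top_index(pfn);   /* which top-level slot         */
            mididx = p2m_mid_index(pfn);   /* which mid-level page         */
            idx    = p2m_index(pfn);       /* which entry in the leaf page */

            /* Missing levels point at shared "missing" pages whose entries
             * are all INVALID_P2M_ENTRY, so no NULL checks are needed here. */
            return p2m_top[topidx][mididx][idx];
    }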
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Change event delivery to: - mask+clear event in the upcall function - use handle_fasteoi_irq as the handler - unmask in the eoi function (and handle migration) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
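In irq-chip terms the change amounts to registering the event-channel chip with the fasteoi flow and doing the unmask in the chip's eoi hook; a hedged sketch follows, in which the Xen-side helper names (xen_dynamic_chip, evtchn_from_irq, unmask_evtchn and friends) are illustrative.

    static void xen_evtchn_eoi(unsigned int irq)
    {
            int evtchn = evtchn_from_irq(irq);

            /* Complete any set_affinity that was deferred while masked. */
            move_masked_irq(irq);

            if (VALID_EVTCHN(evtchn))
                    unmask_evtchn(evtchn);  /* re-enable delivery after the handler */
    }

    static void xen_irq_setup(unsigned int irq)
    {
            /* The upcall masks and clears the event; the fasteoi flow runs
             * the handler once, then the eoi hook above unmasks. */
            set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
                                          handle_fasteoi_irq, "fasteoi");
    }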
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Allocate p2m tables based on the actual runtime maximum pfn rather than the static config-time limit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Jeremy Fitzhardinge authored
Use early brk mechanism to allocate p2m tables, to save memory when booting non-Xen. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
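extend_brk() hands out pieces of the pre-reserved brk area during early boot; a sketch of allocating one p2m page with it (the function name is hypothetical).

    /* Sketch: carve one page-sized, page-aligned p2m page out of the brk
     * area during early boot.  On non-Xen boots this never runs, so the
     * reserved brk space is simply handed back to the heap. */
    static unsigned long * __init alloc_p2m_page_brk(void)
    {
            unsigned long *p = extend_brk(PAGE_SIZE, PAGE_SIZE);
            unsigned int i;

            for (i = 0; i < PAGE_SIZE / sizeof(*p); i++)
                    p[i] = INVALID_P2M_ENTRY;

            return p;
    }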
-
Jeremy Fitzhardinge authored
Useful when converting static arrays into boottime brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
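A plausible shape for the helper, shown as an assumption rather than the exact definition: it declares the pointer and reserves matching brk space in one step, wrapping the existing RESERVE_BRK() macro.

    #define RESERVE_BRK_ARRAY(type, name, entries)          \
            type *name;                                     \
            RESERVE_BRK(name, sizeof(type) * entries)

    /* Usage sketch: replaces a static array such as
     *     static unsigned long *p2m_top[TOP_ENTRIES];
     * The space is only reserved in the brk area at link time (it inflates
     * the apparent bss); the pointer is later populated with extend_brk(),
     * and any unused brk is returned to the heap after boot. */
    RESERVE_BRK_ARRAY(unsigned long *, p2m_top, TOP_ENTRIES);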
-
- 14 Oct, 2010 4 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
Tony Luck reports that the addition of the access_ok() check in commit 0eead9ab ("Don't dump task struct in a.out core-dumps") broke the ia64 compile due to missing the necessary header file includes. Rather than add yet another include (<asm/unistd.h>) to make everything happy, just uninline the silly core dump helper functions and move the bodies to fs/exec.c where they make a lot more sense. dump_seek() in particular was too big to be an inline function anyway, and none of them are in any way performance-critical. And we really don't need to mess up our include file headers more than they already are. Reported-and-tested-by: Tony Luck <tony.luck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ehea: Fix a checksum issue on the receive path
  net: allow FEC driver to use fixed PHY support
  tg3: restore rx_dropped accounting
  b44: fix carrier detection on bind
  net: clear heap allocations for privileged ethtool actions
  NET: wimax, fix use after free
  ATM: iphase, remove sleep-inside-atomic
  ATM: mpc, fix use after free
  ATM: solos-pci, remove use after free
  net/fec: carrier off initially to avoid root mount failure
  r8169: use device model DMA API
  r8169: allocate with GFP_KERNEL flag when able to sleep
-
Linus Torvalds authored
akiphie points out that a.out core-dumps have that odd task struct dumping that was never used and was never really a good idea (it goes back into the mists of history, probably the original core-dumping code). Just remove it. Also do the access_ok() check on dump_write(). It probably doesn't matter (since normal filesystems all seem to do it anyway), but he points out that it's normally done by the VFS layer, so ... [ I suspect that we should possibly do "vfs_write()" instead of calling ->write directly. That also does the whole fsnotify and write statistics thing, which may or may not be a good idea. ] And just to be anal, do this all for the x86-64 32-bit a.out emulation code too, even though it's not enabled (and won't currently even compile) Reported-by: akiphie <akiphie@lavabit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
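An illustrative form of the guarded helper (close in spirit to what the message describes, not necessarily verbatim): refuse to dump memory the process could not legitimately read before passing it to ->write.

    static int dump_write(struct file *file, const void *addr, int nr)
    {
            return access_ok(VERIFY_READ, addr, nr) &&
                   file->f_op->write(file, addr, nr, &file->f_pos) == nr;
    }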
-
- 13 Oct, 2010 8 commits
-
-
Linus Torvalds authored
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  ioat2: fix performance regression
-
Linus Torvalds authored
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
-
Linus Torvalds authored
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ring-buffer: Fix typo of time extends per page
  perf, MIPS: Support cross compiling of tools/perf for MIPS
  perf: Fix incorrect copy_from_user() usage
-
Linus Torvalds authored
* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: relax ioremap prohibition (309caa9c) for -final and -stable
  ARM: 6440/1: ep93xx: DMA: fix channel_disable
  cpuimx27: fix i2c bus selection
  cpuimx27: fix compile when ULPI is selected
  ARM: 6435/1: Fix HWCAP_TLS flag for ARM11MPCore/Cortex-A9
  ARM: 6436/1: AT91: Fix power-saving in idle-mode on 926T processors
  ARM: fix section mismatch warnings in Versatile Express
  ARM: 6412/1: kprobes-decode: add support for MOVW instruction
  ARM: 6419/1: mmu: Fix MT_MEMORY and MT_MEMORY_NONCACHED pte flags
  ARM: 6416/1: errata: faulty hazard checking in the Store Buffer may lead to data corruption
-
Linus Torvalds authored
Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omap: iommu-load cam register before flushing the entry
-
Linus Torvalds authored
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: Silent spurious error message
  drm/radeon/kms: fix bad cast/shift in evergreen.c
  drm/radeon/kms: make TV/DFP table info less verbose
  drm/radeon/kms: leave certain CP int bits enabled
  drm/radeon/kms: avoid corner case issue with unmappable vram V2
-
Linus Torvalds authored
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: For each node, register the memory blocks actually used
  x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
  x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
-
Dan Williams authored
Commit 07934481 "DMAENGINE: generic channel status v2" changed the interface for how dma channel progress is retrieved. It inadvertently exported an internal helper function ioat_tx_status() instead of ioat_dma_tx_status(). The latter polls the hardware to get the latest completion state, while the helper just evaluates the current state without touching hardware. The effect is that we end up waiting for completion timeouts or descriptor allocation errors before the completion state is updated.
iperf (before fix): [SUM] 0.0-41.3 sec 364 MBytes 73.9 Mbits/sec
iperf (after fix): [SUM] 0.0- 4.5 sec 499 MBytes 940 Mbits/sec
This is a regression starting with 2.6.35. Cc: <stable@kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Linus Walleij <linus.walleij@stericsson.com> Cc: Maciej Sosnowski <maciej.sosnowski@intel.com> Reported-by: Richard Scobie <richard@sauce.co.nz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
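In dmaengine terms the fix boils down to which callback gets wired into the device's device_tx_status hook; a hedged before/after sketch, with the surrounding ioat2 setup context simplified:

    /* Before (regression): callers only see cached completion state. */
    dma->device_tx_status = ioat_tx_status;

    /* After (fix): poll the hardware so the reported state is current. */
    dma->device_tx_status = ioat_dma_tx_status;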
-