Commits · 9f155a9b3d5a5444bcc5e049ec2547bb5107150e · Kirill Smelkov / linux

12 Jun, 2009 40 commits

lguest: allow any process to send interrupts · 9f155a9b

Rusty Russell authored Jun 12, 2009

We currently only allow the Launcher process to send interrupts, but it
as we already send interrupts from the hrtimer, it's a simple matter of
extracting that code into a common set_interrupt routine.

As we switch to a thread per virtqueue, this avoids a bottleneck through the
main Launcher process.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

9f155a9b

lguest: PAE fixes · 92b4d8df

Rusty Russell authored Jun 12, 2009

1) j wasn't initialized in setup_pagetables, so they weren't set up for me
   causing immediate guest crashes.

2) gpte_addr should not re-read the pmd from the Guest.  Especially
   not BUG_ON() based on the value.  If we ever supported SMP guests,
   they could trigger that.  And the Launcher could also trigger it
   (tho currently root-only).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

92b4d8df

lguest: PAE support · acdd0b62

Matias Zabaljauregui authored Jun 12, 2009

This version requires that host and guest have the same PAE status.
NX cap is not offered to the guest, yet.
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

acdd0b62

lguest: Add support for kvm_hypercall4() · cefcad17

Matias Zabaljauregui authored Jun 12, 2009

Add support for kvm_hypercall4(); PAE wants it.

Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cefcad17

lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD · ebe0ba84

Matias Zabaljauregui authored May 30, 2009

replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
(That's really what it is, and the confusion gets worse with PAE support)
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>

ebe0ba84

lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated · 90603d15

Matias Zabaljauregui authored Jun 12, 2009

Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

90603d15

lguest: map switcher with executable page table entries · ed1dc778

Matias Zabaljauregui authored May 30, 2009

Map switcher with executable page table entries.
(This bug didn't matter before PAE and hence NX support -- RR)
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ed1dc778

lguest: fix writev returning short on console output · 7b5c806c

Rusty Russell authored Jun 12, 2009

I've never seen it here, but I can't find anywhere that says writev
will write everything.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

7b5c806c

lguest: clean up length-used value in example launcher · e606490c

Rusty Russell authored Jun 12, 2009

The "len" field in the used ring for virtio indicates the number of
bytes *written* to the buffer.  This means the guest doesn't have to
zero the buffers in advance as it always knows the used length.

Erroneously, the console and network example code puts the length
*read* into that field.  The guest ignores it, but it's wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

e606490c

lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. · f086122b

Matias Zabaljauregui authored Jun 12, 2009

If GDT_ENTRIES were every > 256, this could become a problem.

Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

f086122b

lguest: beyond ARRAY_SIZE of cpu->arch.gdt · 81b79b01

Roel Kluin authored May 20, 2009

Do not go beyond ARRAY_SIZE of cpu->arch.gdt
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

81b79b01

lguest: clean up example launcher compile flags. · 2644f17d

Rusty Russell authored Jun 12, 2009

18 months ago 5bbf89fc changed to loading
bzImages directly, and no longer manually ungzipping them, so we no longer
need libz.

Also, -m32 is useful for those on 64-bit platforms (and harmless on
32-bit).
Reported-by: Ron Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2644f17d

lguest: optimize by coding restore_flags and irq_enable in assembler. · 61f4bc83

Rusty Russell authored Jun 12, 2009

The downside of the last patch which made restore_flags and irq_enable
check interrupts is that they are now too big to be patched directly
into the callsites, so the C versions are always used.

But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
the registers.  In fact, we don't need any registers in the fast path,
so we can do better than this if we actually code them in assembler.

The results are in the noise, but since it's about the same amount of
code, it's worth applying.

1GB Guest->Host: input(suppressed),output(suppressed)
Before:
	Seconds: 0:16.53
	Packets: 377268,753673
	Interrupts: 22461,24297
	Notifications: 1(5245),21303(732370)
	Net IRQs triggered: 377023(245),42578(711095)

After:
	Seconds: 0:16.48
	Packets: 377289,753673
	Interrupts: 22281,24465
	Notifications: 1(5245),21296(732377)
	Net IRQs triggered: 377060(229),42564(711109)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

61f4bc83

lguest: improve interrupt handling, speed up stream networking · a32a8813

Rusty Russell authored Jun 12, 2009

lguest never checked for pending interrupts when enabling interrupts, and
things still worked.  However, it makes a significant difference to TCP
performance, so it's time we fixed it by introducing a pending_irq flag
and checking it on irq_restore and irq_enable.

These two routines are now too big to patch into the 8/10 bytes
patch space, so we drop that code.

Note: The high latency on interrupt delivery had a very curious
effect: once everything else was optimized, networking without GSO was
faster than networking with GSO, since more interrupts were sent and
hence a greater chance of one getting through to the Guest!

Note2: (Almost) Closing the same loophole for iret doesn't have any
measurable effect, so I'm leaving that patch for the moment.

Before:
	1GB tcpblast Guest->Host:		30.7 seconds
	1GB tcpblast Guest->Host (no GSO):	76.0 seconds

After:
	1GB tcpblast Guest->Host:		6.8 seconds
	1GB tcpblast Guest->Host (no GSO):	27.8 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a32a8813

lguest: fix race in halt code · abd41f03

Rusty Russell authored Jun 12, 2009

When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
that a timer or the Waker will wake_up_process() us.

But we do it in a stupid way, leaving a classic missing wakeup race.

So split maybe_do_interrupt() into interrupt_pending() and
try_deliver_interrupt(), and check maybe_do_interrupt() and the
"break_out" flag before calling schedule.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

abd41f03

lguest: remove invalid interrupt forcing logic. · ebf9a5a9

Rusty Russell authored Jun 12, 2009

20887611 (lguest: notify on empty) introduced
lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
interrupts all the time.

Because we always process one buffer at a time, the inflight count is always 0
when call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
the Guest.

It should be looking to see if there are more buffers in the Guest's queue:
if it's empty, then we force an interrupt.

This makes little difference, since we usually have an empty queue; but
that's the subject of another patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ebf9a5a9

lguest: fix lguest wake on guest clock tick, or fd activity · a6c372de

Rusty Russell authored Jun 12, 2009

The Launcher could be inside the Guest on another CPU; wake_up_process
will do nothing because it is "running".  kick_process will knock it
back into our kernel in this case, otherwise we'll miss it until the
next guest exit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a6c372de

sched: export kick_process · b43e3521

Rusty Russell authored Jun 12, 2009

lguest needs kick_process: wake_up_process() does nothing if a process
is running, which isn't sufficient (we need it in the kernel).

And lguest support is usually modular.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>

b43e3521

lguest: get more serious about wmb() in example Launcher code · f7027c63

Rusty Russell authored Jun 12, 2009

Since the Launcher process runs the Guest, it doesn't have to be very
serious about its barriers: the Guest isn't running while we are (Guest
is UP).

Before we change to use threads to service devices, we need to fix this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

f7027c63

lguest: clean up lguest_init_IRQ · 1028375e

Rusty Russell authored Jun 12, 2009

Copy from arch/x86/kernel/irqinit_32.c: we don't use the vectors beyond
LGUEST_IRQS (if any), but we might as well set them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1028375e

lguest: cleanup passing of /dev/lguest fd around example launcher. · 56739c80

Rusty Russell authored Jun 12, 2009

We hand the /dev/lguest fd everywhere; it's far neater to just make it
a global (it already is, in fact, hidden in the waker_fds struct).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

56739c80

lguest: be paranoid about guest playing with device descriptors. · 713b15b3

Rusty Russell authored Jun 12, 2009

We can't trust the values in the device descriptor table once the
guest has booted, so keep local copies.  They could set them to
strange values then cause us to segv (they're 8 bit values, so they
can't make our pointers go too wild).

This becomes more important with the following patches which read them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

713b15b3

block: fix kernel-doc in recent block/ changes · 8ebf9756

Randy Dunlap authored Jun 11, 2009

Fix kernel-doc warnings in recently changed block/ source code.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ebf9756

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 4b4f1d01

Linus Torvalds authored Jun 11, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (87 commits)
  nilfs2: get rid of bd_mount_sem use from nilfs
  nilfs2: correct exclusion control in nilfs_remount function
  nilfs2: simplify remaining sget() use
  nilfs2: get rid of sget use for checking if current mount is present
  nilfs2: get rid of sget use for acquiring nilfs object
  nilfs2: remove meaningless EBUSY case from nilfs_get_sb function
  remove the call to ->write_super in __sync_filesystem
  nilfs2: call nilfs2_write_super from nilfs2_sync_fs
  jffs2: call jffs2_write_super from jffs2_sync_fs
  ufs: add ->sync_fs
  sysv: add ->sync_fs
  hfsplus: add ->sync_fs
  hfs: add ->sync_fs
  fat: add ->sync_fs
  ext2: add ->sync_fs
  exofs: add ->sync_fs
  bfs: add ->sync_fs
  affs: add ->sync_fs
  sanitize ->fsync() for affs
  repair bfs_write_inode(), switch bfs to simple_fsync()
  ...

4b4f1d01

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 875287ca

Linus Torvalds authored Jun 11, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: remove unecessary include of thread_info.h in entry.S
  m68knommu: enumerate INIT_THREAD fields properly
  headers_check fix: m68k, swab.h
  arch/m68knommu: Convert #ifdef DEBUG printk(KERN_DEBUG to pr_debug(
  m68knommu: remove obsolete reset code
  m68knommu: move CPU reset code for the 5272 ColdFire into its platform code
  m68knommu: move CPU reset code for the 528x ColdFire into its platform code
  m68knommu: move CPU reset code for the 527x ColdFire into its platform code
  m68knommu: move CPU reset code for the 523x ColdFire into its platform code
  m68knommu: move CPU reset code for the 520x ColdFire into its platform code
  m68knommu: add CPU reset code for the 532x ColdFire
  m68knommu: add CPU reset code for the 5249 ColdFire
  m68knommu: add CPU reset code for the 5206e ColdFire
  m68knommu: add CPU reset code for the 5206 ColdFire
  m68knommu: add CPU reset code for the 5407 ColdFire
  m68knommu: add CPU reset code for the 5307 ColdFire
  m68knommu: merge system reset for code ColdFire 523x family
  m68knommu: fix system reset for ColdFire 527x family

875287ca

kvm: remove the duplicated cpumask_clear · aee74f3b

Yinghai Lu authored Jun 11, 2009

zalloc_cpumask_var already cleared it.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aee74f3b

x86: use zalloc_cpumask_var in arch_early_irq_init · 12274e96

Yinghai Lu authored Jun 11, 2009

So we make sure MAXSMP gets a cleared cpumask
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12274e96

perfcounters: remove powerpc definitions of perf_counter_do_pending · e14112d1

Stephen Rothwell authored Jun 12, 2009

Commit 925d519a ("perf_counter:
unify and fix delayed counter wakeup") added global definitions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e14112d1

nilfs2: get rid of bd_mount_sem use from nilfs · aa7dfb89

Ryusuke Konishi authored Jun 08, 2009

This will remove every bd_mount_sem use in nilfs.

The intended exclusion control was replaced by the previous patch
("nilfs2: correct exclusion control in nilfs_remount function") for
nilfs_remount(), and this patch will replace remains with a new mutex
that this inserts in nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aa7dfb89

nilfs2: correct exclusion control in nilfs_remount function · e59399d0

Ryusuke Konishi authored Jun 08, 2009

nilfs_remount() changes mount state of a superblock instance.  Even
though nilfs accesses other superblock instances during mount or
remount, the mount state was not properly protected in
nilfs_remount().

Moreover, nilfs_remount() has a lock order reversal problem;
nilfs_get_sb() holds:

  1. bdev->bd_mount_sem
  2. sb->s_umount  (sget acquires)

and nilfs_remount() holds:

  1. sb->s_umount  (locked by the caller in vfs)
  2. bdev->bd_mount_sem

To avoid these problems, this patch divides a semaphore protecting
super block instances from nilfs->ns_sem, and applies it to the mount
state protection in nilfs_remount().

With this change, bd_mount_sem use is removed from nilfs_remount() and
the lock order reversal will be resolved.  And the new rw-semaphore,
nilfs->ns_super_sem will properly protect the mount state except the
modification from nilfs_error function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e59399d0

nilfs2: simplify remaining sget() use · 6dd47406

Ryusuke Konishi authored Jun 08, 2009

This simplifies the test function passed on the remaining sget()
callsite in nilfs.

Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount)
in the test function passed to sget(), this patch first looks up the
nilfs_sb_info struct which the given mount type matches, and then
acquires the super block instance holding the nilfs_sb_info.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6dd47406

nilfs2: get rid of sget use for checking if current mount is present · 3f82ff55

Ryusuke Konishi authored Jun 08, 2009

This stops using sget() for checking if an r/w-mount or an r/o-mount
exists on the device.  This elimination uses a back pointer to the
current mount added to nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3f82ff55

nilfs2: get rid of sget use for acquiring nilfs object · 33c8e57c

Ryusuke Konishi authored Jun 08, 2009

This will change the way to obtain nilfs object in nilfs_get_sb()
function.

Previously, a preliminary sget() call was performed, and the nilfs
object was acquired from a super block instance found by the sget()
call.

This patch, instead, instroduces a new dedicated function
find_or_create_nilfs(); as the name implies, the function finds an
existent nilfs object from a global list or creates a new one if no
object is found on the device.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

33c8e57c

nilfs2: remove meaningless EBUSY case from nilfs_get_sb function · 81fc20bd

Ryusuke Konishi authored Jun 08, 2009

The following EBUSY case in nilfs_get_sb() is meaningless.  Indeed,
this error code is never returned to the caller.

    if (!s->s_root) {
          ...
    } else if (!(s->s_flags & MS_RDONLY)) {
        err = -EBUSY;
    }

This simply removes the else case.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

81fc20bd

remove the call to ->write_super in __sync_filesystem · 0c95ee19

Christoph Hellwig authored Jun 08, 2009

Now that all filesystems provide ->sync_fs methods we can change
__sync_filesystem to only call ->sync_fs.

This gives us a clear separation between periodic writeouts which
are driven by ->write_super and data integrity syncs that go
through ->sync_fs. (modulo file_fsync which is also going away)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0c95ee19

nilfs2: call nilfs2_write_super from nilfs2_sync_fs · d731e063

Christoph Hellwig authored Jun 08, 2009

The call to ->write_super from __sync_filesystem will go away, so make
sure nilfs2 performs the same actions from inside ->sync_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d731e063

jffs2: call jffs2_write_super from jffs2_sync_fs · d579ed00

Christoph Hellwig authored Jun 08, 2009

The call to ->write_super from __sync_filesystem will go away, so make
sure jffs2 performs the same actions from inside ->sync_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d579ed00

ufs: add ->sync_fs · 8c800656

Christoph Hellwig authored Jun 08, 2009

Add a ->sync_fs method for data integrity syncs, and reimplement
->write_super ontop of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8c800656

sysv: add ->sync_fs · ad43ffde

Christoph Hellwig authored Jun 08, 2009

Add a ->sync_fs method for data integrity syncs, and reimplement
->write_super ontop of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad43ffde

hfsplus: add ->sync_fs · 7fbc6df0

Christoph Hellwig authored Jun 08, 2009

Add a ->sync_fs method for data integrity syncs, and reimplement
->write_super ontop of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7fbc6df0