1. 12 Jun, 2009 40 commits
    • [SCSI] zfcp: Update FC pass-through support · dc577d55
      Christof Schmitt authored
      Don't access the block layer request, get the payload length instead
      from the FC job. Simplify access to the zfcp_port, only the d_id is
      required, if the port is no longer accessed later. This is possible
      when the els_handler does not access the port pointer from the ELS
      request.
      Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
      Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] zfcp: Add FC pass-through support · 9d544f2b
      Sven Schuetz authored
      Provide the ability to do fibre channel requests from the userspace to
      our zfcp driver.  Patch builds upon extension to the fibre channel
      transport class by James Smart and Seokmann Ju.  See
      http://marc.info/?l=linux-scsi&m=123808882309133&w=2
      Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com>
      Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] FC Pass Thru support · 9e4f5e29
      James Smart authored
      Attached is the ELS/CT pass-thru patch for the FC Transport. The patch
      creates a generic framework that sits on top of bsg and the SGIO v4 ioctl
      in order to pass transaction requests to LLDDs.
      
      The interface supports the following operations:
        On an fc_host basis:
          Request login to the specified N_Port_ID, creating an fc_rport.
          Request logout of the specified N_Port_ID, deleting an fc_rport
          Send ELS request to specified N_Port_ID w/o requiring a login, and
            wait for ELS response.
          Send CT request to specified N_Port_ID and wait for CT response.
            Login is required, but LLDD is allowed to manage login and decide
            whether it stays in place after the request is satisfied.
          Vendor-Unique request. Allows a LLDD-specific request to be passed
            to the LLDD, and the passing of a response back to the application.
        On an fc_rport basis:
          Send ELS request to nport and wait for ELS response.
          Send CT request to nport and wait for CT response.
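
The operations listed above fall into two scopes, fc_host and fc_rport. As a rough model only, they can be written as a small lookup table. The names below (`Scope`, `OPERATIONS`, `operations_for`) are hypothetical and are not the message codes or structures defined in include/scsi/scsi_bsg_fc.h.

```python
from enum import Enum


class Scope(Enum):
    FC_HOST = "fc_host"
    FC_RPORT = "fc_rport"


# (scope, operation) pairs mirroring the list in the commit message.
# The operation names are illustrative labels, not kernel identifiers.
OPERATIONS = [
    (Scope.FC_HOST, "login"),        # create an fc_rport for an N_Port_ID
    (Scope.FC_HOST, "logout"),       # delete the fc_rport
    (Scope.FC_HOST, "els_nologin"),  # ELS request without requiring a login
    (Scope.FC_HOST, "ct"),           # CT request; the LLDD manages the login
    (Scope.FC_HOST, "vendor"),       # LLDD-specific request and response
    (Scope.FC_RPORT, "els"),         # ELS request to an existing rport
    (Scope.FC_RPORT, "ct"),          # CT request to an existing rport
]


def operations_for(scope):
    """Return the operation labels available for the given scope."""
    return [name for s, name in OPERATIONS if s is scope]
```

For example, `operations_for(Scope.FC_RPORT)` lists only the two rport-level requests.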
      
      The patch also exports several headers from include/scsi such that
      they can be available to user-space applications:
        include/scsi/scsi.h
        include/scsi/scsi_netlink.h
        include/scsi/scsi_netlink_fc.h
        include/scsi/scsi_bsg_fc.h
      
      For further information, refer to the last RFC:
      http://marc.info/?l=linux-scsi&m=123436574018579&w=2
      
      Note: Documentation is still spotty and will be added later.
      
      [bharrosh@panasas.com: update for new block API]
      Signed-off-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · e349792a
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (290 commits)
        ALSA: pcm - Update document about xrun_debug proc file
        ALSA: lx6464es - support standard alsa module parameters
        ALSA: snd_usb_caiaq: set mixername
        ALSA: hda - add quirk for STAC92xx (SigmaTel STAC9205)
        ALSA: use card device as parent for jack input-devices
        ALSA: sound/ps3: Correct existing and add missing annotations
        ALSA: sound/ps3: Restructure driver source
        ALSA: sound/ps3: Fix checkpatch issues
        ASoC: Fix lm4857 control
        ALSA: ctxfi - Clear PCM resources at hw_params and hw_free
        ALSA: ctxfi - Check the presence of SRC instance in PCM pointer callbacks
        ALSA: ctxfi - Add missing start check in atc_pcm_playback_start()
        ALSA: ctxfi - Add use_system_timer module option
        ALSA: usb - Add boot quirk for C-Media 6206 USB Audio
        ALSA: ctxfi - Fix wrong model id for UAA
        ALSA: ctxfi - Clean up probe routines
        ALSA: hda - Fix the previous tagra-8ch patch
        ALSA: hda - Add 7.1 support for MSI GX620
        ALSA: pcm - A helper function to compose PCM stream name for debug prints
        ALSA: emu10k1 - Fix minimum periods for efx playback
        ...
    • Merge branch 'topic/ps3' into for-linus · e3f86d3d
      Takashi Iwai authored
      * topic/ps3:
        ALSA: sound/ps3: Correct existing and add missing annotations
        ALSA: sound/ps3: Restructure driver source
        ALSA: sound/ps3: Fix checkpatch issues
    • Merge branch 'topic/pcm-jiffies-check' into for-linus · 056c1ebf
      Takashi Iwai authored
      * topic/pcm-jiffies-check:
        ALSA: pcm - Update document about xrun_debug proc file
    • Merge branch 'topic/misc' into for-linus · be914cf9
      Takashi Iwai authored
      * topic/misc:
        ALSA: use card device as parent for jack input-devices
    • Merge branch 'topic/lx6464es' into for-linus · 31d496aa
      Takashi Iwai authored
      * topic/lx6464es:
        ALSA: lx6464es - support standard alsa module parameters
    • Merge branch 'topic/hda' into for-linus · f8be792d
      Takashi Iwai authored
      * topic/hda:
        ALSA: hda - add quirk for STAC92xx (SigmaTel STAC9205)
    • Merge branch 'topic/caiaq' into for-linus · 80986be4
      Takashi Iwai authored
      * topic/caiaq:
        ALSA: snd_usb_caiaq: set mixername
    • Merge branch 'topic/asoc' into for-linus · a6093a24
      Takashi Iwai authored
      * topic/asoc:
        ASoC: Fix lm4857 control
    • Merge branch 'topic/slab/earlyboot-v2' of... · 6d214918
      Linus Torvalds authored
      Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
      
      * 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
        slab: setup cpu caches later on when interrupts are enabled
        slab,slub: don't enable interrupts during early boot
        slab: fix gfp flag in setup_cpu_cache()
        x86: make zap_low_mapping could be used early
        irq: slab alloc for default irq_affinity
        memcg: fix page_cgroup fatal error in FLATMEM
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 · c9b8af00
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (154 commits)
        [SCSI] osd: Remove out-of-tree left overs
        [SCSI] libosd: Use REQ_QUIET requests.
        [SCSI] osduld: use filp_open() when looking up an osd-device
        [SCSI] libosd: Define an osd_dev wrapper to retrieve the request_queue
        [SCSI] libosd: osd_req_{read,write} takes a length parameter
        [SCSI] libosd: Let _osd_req_finalize_data_integrity receive number of out_bytes
        [SCSI] libosd: osd_req_{read,write}_kern new API
        [SCSI] libosd: Better printout of OSD target system information
        [SCSI] libosd: OSD2r05: Attribute definitions
        [SCSI] libosd: OSD2r05: Additional command enums
        [SCSI] mpt fusion: fix up doc book comments
        [SCSI] mpt fusion: Added support for Broadcast primitives Event handling
        [SCSI] mpt fusion: Queue full event handling
        [SCSI] mpt fusion: RAID device handling and Dual port Raid support is added
        [SCSI] mpt fusion: Put IOC into ready state if it not already in ready state
        [SCSI] mpt fusion: Code Cleanup patch
        [SCSI] mpt fusion: Rescan SAS topology added
        [SCSI] mpt fusion: SAS topology scan changes, expander events
        [SCSI] mpt fusion: Firmware event implementation using seperate WorkQueue
        [SCSI] mpt fusion: rewrite of ioctl_cmds internal generated function
        ...
    • Merge git://git.infradead.org/~dwmw2/firmware-2.6 · c59a264c
      Linus Torvalds authored
      * git://git.infradead.org/~dwmw2/firmware-2.6:
        firmware: speed up request_firmware(), v3
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw · 6cb8a911
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
        GFS2: Remove lock_kernel from gfs2_put_super()
        GFS2: Add tracepoints
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest · 7f3591cf
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
        lguest: add support for indirect ring entries
        lguest: suppress notifications in example Launcher
        lguest: try to batch interrupts on network receive
        lguest: avoid sending interrupts to Guest when no activity occurs.
        lguest: implement deferred interrupts in example Launcher
        lguest: remove obsolete LHREQ_BREAK call
        lguest: have example Launcher service all devices in separate threads
        lguest: use eventfds for device notification
        eventfd: export eventfd_signal and eventfd_fget for lguest
        lguest: allow any process to send interrupts
        lguest: PAE fixes
        lguest: PAE support
        lguest: Add support for kvm_hypercall4()
        lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
        lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
        lguest: map switcher with executable page table entries
        lguest: fix writev returning short on console output
        lguest: clean up length-used value in example launcher
        lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
        lguest: beyond ARRAY_SIZE of cpu->arch.gdt
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio · 16ffc3ee
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio:
        virtio: enhance id_matching for virtio drivers
        virtio: fix id_matching for virtio drivers
        virtio: handle short buffers in virtio_rng.
        virtio_blk: add missing __dev{init,exit} markings
        virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
        virtio: teach virtio_has_feature() about transport features
        virtio: expose features in sysfs
        virtio_pci: optional MSI-X support
        virtio_pci: split up vp_interrupt
        virtio: find_vqs/del_vqs virtio operations
        virtio: add names to virtqueue struct, mapping from devices to queues.
        virtio: meet virtio spec by finalizing features before using device
        virtio: fix obsolete documentation on probe function
    • Merge branch 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · c34752bc
      Linus Torvalds authored
      * 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        CUSE: implement CUSE - Character device in Userspace
        fuse: export symbols to be used by CUSE
        fuse: update fuse_conn_init() and separate out fuse_conn_kill()
        fuse: don't use inode in fuse_file_poll
        fuse: don't use inode in fuse_do_ioctl() helper
        fuse: don't use inode in fuse_sync_release()
        fuse: create fuse_do_open() helper for CUSE
        fuse: clean up args in fuse_finish_open() and fuse_release_fill()
        fuse: don't use inode in helpers called by fuse_direct_io()
        fuse: add members to struct fuse_file
        fuse: prepare fuse_direct_io() for CUSE
        fuse: clean up fuse_write_fill()
        fuse: use struct path in release structure
        fuse: misc cleanups
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param · 65d52cc9
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param:
        module: cleanup FIXME comments about trimming exception table entries.
        module: trim exception table on init free.
        module: merge module_alloc() finally
        uml module: fix uml build process due to this merge
        x86 module: merge the rest functions with macros
        x86 module: merge the same functions in module_32.c and module_64.c
        uvesafb: improve parameter handling.
        module_param: allow 'bool' module_params to be bool, not just int.
        module_param: add __same_type convenience wrapper for __builtin_types_compatible_p
        module_param: split perm field into flags and perm
        module_param: invbool should take a 'bool', not an 'int'
        cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
    • Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 · d614aec4
      Linus Torvalds authored
      * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits)
        ide: re-implement ide_pci_init_one() on top of ide_pci_init_two()
        ide: unexport ide_find_dma_mode()
        ide: fix PowerMac bootup oops
        ide: skip probe if there are no devices on the port (v2)
        sl82c105: add printk() logging facility
        ide-tape: fix proc warning
        ide: add IDE_DFLAG_NIEN_QUIRK device flag
        ide: respect quirk_drives[] list on all controllers
        hpt366: enable all quirks for devices on quirk_drives[] list
        hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c
        ide: remove superfluous SELECT_MASK() call from do_rw_taskfile()
        ide: remove superfluous SELECT_MASK() call from ide_driveid_update()
        icside: remove superfluous ->maskproc method
        ide-tape: fix IDE_AFLAG_* atomic accesses
        ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically
        pdc202xx_old: kill resetproc() method
        pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout
        pdc202xx_old: use ide_dma_test_irq()
        ide: preserve Host Protected Area by default (v2)
        ide-gd: implement block device ->set_capacity method (v2)
        ...
    • Merge branch 'x86-fixes-for-linus' of... · db8e7f10
      Linus Torvalds authored
      Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        x86: Provide _sdata in the vmlinux.lds.S file
        x86: handle initrd that extends into unusable memory
    • slab: setup cpu caches later on when interrupts are enabled · 8429db5c
      Pekka Enberg authored
      Fixes the following boot-time warning:
      
        [    0.000000] ------------[ cut here ]------------
        [    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc()
        [    0.000000] Hardware name:
        [    0.000000] Modules linked in:
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492
        [    0.000000] Call Trace:
        [    0.000000]  [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c
        [    0.000000]  [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc
        [    0.000000]  [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9
        [    0.000000]  [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16
        [    0.000000]  [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc
        [    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
        [    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
        [    0.000000]  [<ffffffff8108f2be>] smp_call_function+0x3d/0x68
        [    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
        [    0.000000]  [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c
        [    0.000000]  [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454
        [    0.000000]  [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b
        [    0.000000]  [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593
        [    0.000000]  [<ffffffff810f69cf>] enable_cpucache+0x68/0xad
        [    0.000000]  [<ffffffff818133c3>] kmem_cache_init+0x434/0x593
        [    0.000000]  [<ffffffff8180987c>] ? mem_init+0x156/0x161
        [    0.000000]  [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9
        [    0.000000]  [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae
        [    0.000000]  [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8
        [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg authored
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set from
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
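
The masking described above can be illustrated with plain bit arithmetic. This is a minimal sketch, not the kernel code: the flag values are illustrative stand-ins (the real definitions live in include/linux/gfp.h and vary between kernel versions), and `EARLY_BOOT_MASK` and `effective_flags` are hypothetical names.

```python
# Illustrative flag bits only; not guaranteed to match any kernel version.
GFP_WAIT = 0x10    # allocation may sleep, which can re-enable interrupts
GFP_IO = 0x40      # allocation may start physical I/O
GFP_FS = 0x80      # allocation may call into the filesystem
GFP_ZERO = 0x8000  # unrelated flag that must survive the masking

GFP_KERNEL = GFP_WAIT | GFP_IO | GFP_FS

# Hypothetical mask applied by the allocator while booting.
EARLY_BOOT_MASK = ~(GFP_WAIT | GFP_IO | GFP_FS)


def effective_flags(flags, early_boot):
    """Strip the flags that could enable interrupts during early boot."""
    return flags & EARLY_BOOT_MASK if early_boot else flags
```

With this approach a caller that passes `GFP_KERNEL` during early boot is silently downgraded inside the allocator, so the callers mentioned in the quoted mail do not each need to be found and patched.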
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • slab: fix gfp flag in setup_cpu_cache() · eb91f1d0
      Pekka Enberg authored
      Fixes the following warning during bootup when compiling with CONFIG_SLAB:
      
        [    0.000000] ------------[ cut here ]------------
        [    0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0x91/0xb9()
        [    0.000000] Hardware name:
        [    0.000000] Modules linked in:
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #491
        [    0.000000] Call Trace:
        [    0.000000]  [<ffffffff81087d84>] ? lockdep_trace_alloc+0x91/0xb9
        [    0.000000]  [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9
        [    0.000000]  [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16
        [    0.000000]  [<ffffffff81087d84>] lockdep_trace_alloc+0x91/0xb9
        [    0.000000]  [<ffffffff810f5b03>] kmem_cache_alloc_node_notrace+0x26/0xdf
        [    0.000000]  [<ffffffff81487f4e>] ? setup_cpu_cache+0x7e/0x210
        [    0.000000]  [<ffffffff81487fe3>] setup_cpu_cache+0x113/0x210
        [    0.000000]  [<ffffffff810f73ff>] kmem_cache_create+0x409/0x486
        [    0.000000]  [<ffffffff818131c1>] kmem_cache_init+0x232/0x593
        [    0.000000]  [<ffffffff8180987c>] ? mem_init+0x156/0x161
        [    0.000000]  [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9
        [    0.000000]  [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae
        [    0.000000]  [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8
        [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • [SCSI] Merge branch 'linus' · 82681a31
      James Bottomley authored
      Conflicts:
      	drivers/message/fusion/mptsas.c
      
      fixed up conflict between req->data_len accessors and mptsas driver updates.
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • lguest: add support for indirect ring entries · d1f0132e
      Mark McLoughlin authored
      Support the VIRTIO_RING_F_INDIRECT_DESC feature.
      
      This is a simple matter of changing the descriptor walking
      code to operate on a struct vring_desc* and supplying it
      with an indirect table if detected.
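
The idea of walking either the ring or an indirect table with the same loop can be sketched as follows. This is a simplified model, not the Launcher code: `Desc` and `walk` are hypothetical, and the `table` field stands in for reading the indirect table out of Guest memory at `addr`. The flag values follow the virtio ring descriptor flags (NEXT = 1, WRITE = 2, INDIRECT = 4).

```python
from dataclasses import dataclass
from typing import List, Optional

F_NEXT = 1      # another descriptor follows, at index `next`
F_WRITE = 2     # buffer is write-only for the device (unused in this sketch)
F_INDIRECT = 4  # buffer holds a table of descriptors


@dataclass
class Desc:
    addr: int
    length: int
    flags: int = 0
    next: int = 0
    # Stand-in for Guest memory: the table an indirect descriptor points at.
    table: Optional[List["Desc"]] = None


def walk(ring, head):
    """Return the (addr, length) pairs of the chain starting at ring[head]."""
    table, i = ring, head
    # If the head is indirect, keep the same walking code but switch tables.
    if table[i].flags & F_INDIRECT:
        table, i = table[i].table, 0
    chain = []
    while True:
        d = table[i]
        chain.append((d.addr, d.length))
        if not (d.flags & F_NEXT):
            break
        i = d.next
    return chain
```

The only difference between the two cases is which table the loop indexes into, which is why the commit describes the change as a simple matter.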
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: suppress notifications in example Launcher · b60da13f
      Rusty Russell authored
      The Guest only really needs to tell us about activity when we're going
      to listen to the eventfd: normally, we don't want to know.
      
      So if there are no available buffers, turn on notifications, re-check,
      then wait for the Guest to notify us via the eventfd, then turn
      notifications off again.
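
The enable, re-check, wait, disable sequence above can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: `Ring`, `host_get_buffer` and the `wait_for_kick` callback are hypothetical names, and the `kicks` counter stands in for the Guest signalling the eventfd.

```python
class Ring:
    def __init__(self):
        self.avail = []        # buffers the Guest has made available
        self.no_notify = True  # Host asks the Guest not to notify
        self.kicks = 0         # notifications the Guest actually sent

    def guest_add(self, buf):
        self.avail.append(buf)
        if not self.no_notify:
            self.kicks += 1    # stands in for signalling the eventfd


def host_get_buffer(ring, wait_for_kick):
    """Fetch the next buffer, enabling notifications only while waiting."""
    while not ring.avail:
        ring.no_notify = False   # turn notifications on
        if ring.avail:           # re-check: the Guest may have raced us
            ring.no_notify = True
            break
        wait_for_kick(ring)      # block until the Guest notifies us
        ring.no_notify = True    # turn notifications off again
    return ring.avail.pop(0)
```

The re-check matters in the real, concurrent case: without it, a buffer added between the first emptiness test and the flag change would never generate a notification.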
      
      There's enough else going on that the differences are in the noise.
      
      Before:				Secs	RxKicks	TxKicks
       1G TCP Guest->Host:		3.94	  4686	  32815
       1M normal pings:		104	142862	1000010
       1M 1k pings (-l 120):		57	142026	1000007
      
      After:
       1G TCP Guest->Host:		3.76	  4691	  32811
       1M normal pings:		111	142859	 997467
       1M 1k pings (-l 120):		55	 19648	 501549
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: try to batch interrupts on network receive · 4a8962e2
      Rusty Russell authored
      Rather than triggering an interrupt every time, we only trigger an
      interrupt when there are no more incoming packets (or the recv queue
      is full).
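
As a rough model of the rule above, the following sketch counts interrupts for packets that arrive in back-to-back bursts. `count_interrupts` is a hypothetical helper, and it assumes the Guest drains the receive queue whenever it is interrupted.

```python
def count_interrupts(bursts, queue_size):
    """Count interrupts when one is raised only at the end of a burst
    or when the receive queue fills up.

    bursts: list of packet counts, each arriving back to back.
    """
    irqs = 0
    for burst in bursts:
        queued = 0
        for n in range(burst):
            queued += 1
            more_pending = n < burst - 1
            if not more_pending or queued == queue_size:
                irqs += 1
                queued = 0  # assumption: the Guest empties the queue
    return irqs
```

Isolated packets still cost one interrupt each, so the saving only appears when packets really do arrive together, which is consistent with the Guest->Host column of the table below barely moving.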
      
      However, the overhead of doing the select to figure this out is
      measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host
      TCP goes from 3.69 to 3.94 seconds.  It's close to the noise though.
      
      I tested various timeouts, including reducing it as the number of
      pending packets increased, timing a 1 gigabyte TCP send from Guest ->
      Host and Host -> Guest (GSO disabled, to increase packet rate).
      
      // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null
      
      Timeout		Guest->Host	Pkts/irq	Host->Guest	Pkts/irq
      Before		11.3s		1.0		6.3s		1.0
      0		11.7s		1.0		6.6s		23.5
      1		17.1s		8.8		8.6s		26.0
      1/pending	13.4s		1.9		6.6s		23.8
      2/pending	13.6s		2.8		6.6s		24.1
      5/pending	14.1s		5.0		6.6s		24.4
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: avoid sending interrupts to Guest when no activity occurs. · 95c517c0
      Rusty Russell authored
      If we track how many buffers we've used, we can tell whether we really
      need to interrupt the Guest.  This happens as a side effect of
      spurious notifications.
      
      Spurious notifications happen because it can take a while before the
      Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and
      meanwhile the Guest can send more notifications.
      
      A real fix would be to use wake counts, rather than a suppression
      flag, but the practical difference is generally in the noise: the
      interrupt is usually coalesced into a pending one anyway so we just
      save a system call which isn't clearly measurable.
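
Tracking used buffers to skip needless interrupts could look roughly like this. The class and field names are hypothetical and only model the bookkeeping; they are not taken from the Launcher source.

```python
class VirtQueue:
    def __init__(self):
        self.pending_used = 0  # buffers returned since the last interrupt
        self.irqs = 0          # interrupts actually sent to the Guest

    def add_used(self, n=1):
        """Record that n buffers were handed back to the Guest."""
        self.pending_used += n

    def trigger_irq(self):
        """Interrupt the Guest only if something was used since last time."""
        if self.pending_used == 0:
            return False       # woken by a spurious notification: skip
        self.pending_used = 0
        self.irqs += 1
        return True
```

A thread woken by a spurious notification finds nothing to do, returns no buffers, and therefore sends no interrupt.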
      
      				Secs	Spurious IRQS
      1G TCP Guest->Host:		3.93	58
      1M normal pings:		100	72
      1M 1k pings (-l 120):		57	492904
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: implement deferred interrupts in example Launcher · 38bc2b8c
      Rusty Russell authored
      Rather than sending an interrupt on every buffer, we only send an interrupt
      when we're about to wait for the Guest to send us a new one.  The console
      input and network input still send interrupts manually, but the block device,
      network and console output queues can simply rely on this logic to send
      interrupts to the Guest at the right time.
      
      The patch is cluttered by moving trigger_irq() higher in the code.
      
      In practice, two factors make this optimization less interesting:
      (1) we often only get one input at a time, even for networking,
      (2) triggering an interrupt rapidly tends to get coalesced anyway.
      
      Before:				Secs	RxIRQS	TxIRQs
       1G TCP Guest->Host:		3.72	32784	32771
       1M normal pings:		99	1000004	995541
       100,000 1k pings (-l 120):	5	49510	49058
      
      After:
       1G TCP Guest->Host:		3.69	32809	32769
       1M normal pings:		99	1000004	996196
       100,000 1k pings (-l 120):	5	52435	52361
      
      (Note the interrupt count on 100k pings goes *up*: see next patch).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: remove obsolete LHREQ_BREAK call · 5dac051b
      Rusty Russell authored
      We no longer need an efficient mechanism to force the Guest back into
      host userspace, as each device is serviced without bothering the main
      Guest process (aka. the Launcher).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: have example Launcher service all devices in separate threads · 659a0e66
      Rusty Russell authored
      Currently lguest has three threads: the main Launcher thread, a Waker
      thread, and a thread for the block device (because synchronous block
      was simply too painful to bear).
      
      The Waker selects() on all the input file descriptors (eg. stdin, net
      devices, pipe to the block thread) and when one becomes readable it calls
      into the kernel to kick the Launcher thread out into userspace, which
      repeats the poll, services the device(s), and then tells the kernel to
      release the Waker before re-entering the kernel to run the Guest.
      
      Also, to make a slightly-decent network transmit routine, the Launcher
      would suppress further network interrupts while it set a timer: that
      signal handler would write to a pipe, which would rouse the Waker
      which would prod the Launcher out of the kernel to check the network
      device again.
      
      Now we can convert all our virtqueues to separate threads: each one has
      a separate eventfd for when the Guest pokes the device, and can trigger
      interrupts in the Guest directly.
      
      The linecount shows how much this simplifies, but to really bring it
      home, here's an strace analysis of single Guest->Host ping before:
      
      * Guest sends packet, notifies xmit vq, return control to Launcher
      * Launcher clears notification flag on xmit ring
      * Launcher writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Launcher sets up interrupt for Guest (xmit ring is empty)
      	write(10, "\2\0\0\0\3\0\0\0", 8) = 0
      * Launcher sets up timer for interrupt mitigation
      	setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
      * Launcher re-runs guest
      	pread64(10, 0xbfa5f4d4, 4, 0) ...
      * Waker notices reply packet in tun device (it was in select)
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
      * Waker kicks Launcher out of guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher returns from running guest:
      	... = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks at input fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
      * Launcher reads pong from tun device:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
      * Launcher injects guest notification:
      	write(10, "\2\0\0\0\2\0\0\0", 8) = 0
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher reruns Guest:
      	pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
      * Signal comes in, uses pipe to wake up Launcher:
      	--- SIGALRM (Alarm clock) @ 0 (0) ---
      	write(8, "\0", 1)       = 1
      	sigreturn()             = ? (mask now [])
      * Waker sees write on pipe:
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
      * Waker kicks Launcher out of Guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher exits from kernel:
      	pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks to see what fd woke it:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
      * Launcher reads timeout fd, sets notification flag on xmit ring
      	read(6, "\0", 32)       = 1
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher resumes Guest:
      	pread64(10, "\0p\0\4", 4, 0) ....
      
      strace analysis of single Guest->Host ping after:
      
      * Guest sends packet, notifies xmit vq, creates event on eventfd.
      * Network xmit thread wakes from read on eventfd:
      	read(7, "\1\0\0\0\0\0\0\0", 8)          = 8
      * Network xmit thread writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Network recv thread wakes up from read on tunfd:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
      * Network recv thread sets up interrupt for the Guest
      	write(6, "\2\0\0\0\2\0\0\0", 8) = 0
      * Network recv thread goes back to reading tunfd
      	13:39:42.460285 readv(4,  <unfinished ...>
      * Network xmit thread sets up interrupt for Guest (xmit ring is empty)
      	write(6, "\2\0\0\0\3\0\0\0", 8) = 0
      * Network xmit thread goes back to reading from eventfd
      	read(7, <unfinished ...>
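
The "after" trace corresponds to a simple structure: one thread per queue, each blocked in a read on its own notification descriptor. The sketch below models that shape only. It uses `os.pipe()` as a portable stand-in for an eventfd, and the zero-valued stop message is an artifact of the demo (a real eventfd read never returns a zero counter). All names are hypothetical.

```python
import os
import struct
import threading


def service_queue(notify_fd, handled, name):
    """Block on this queue's descriptor and service each notification."""
    while True:
        (count,) = struct.unpack("Q", os.read(notify_fd, 8))
        if count == 0:  # demo-only stop message
            break
        handled.append((name, count))


def run_demo():
    handled = []
    queues = {}
    for name in ("xmit", "recv"):
        r, w = os.pipe()  # stand-in for a per-queue eventfd
        t = threading.Thread(target=service_queue, args=(r, handled, name))
        t.start()
        queues[name] = (r, w, t)
    # The "Guest" pokes only the transmit queue; recv stays asleep.
    os.write(queues["xmit"][1], struct.pack("Q", 1))
    for r, w, t in queues.values():
        os.write(w, struct.pack("Q", 0))  # ask the thread to stop
        t.join()
        os.close(r)
        os.close(w)
    return handled
```

No central thread has to be kicked out of the kernel to find out which device needs attention, which is where most of the system calls in the "before" trace came from.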
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: use eventfds for device notification · df60aeef
      Rusty Russell authored
      Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
      an address: the main Launcher process returns with this address, and figures
      out what device to run.
      
      A far nicer model is to let processes bind an eventfd to an address: if we
      find one, we simply signal the eventfd.
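
The lookup described above amounts to a map from notification address to a signalling handle. In this sketch `NotifyTable` is a hypothetical name and a plain callable stands in for the eventfd; the fallback of returning the address models handing it back to the Launcher as before.

```python
class NotifyTable:
    def __init__(self):
        self._by_addr = {}

    def bind(self, addr, signal):
        """Attach a signalling callable (stand-in for an eventfd) to addr."""
        self._by_addr[addr] = signal

    def notify(self, addr):
        """Handle a notification from the Guest for the given address."""
        signal = self._by_addr.get(addr)
        if signal is not None:
            signal()     # wake whoever is waiting on this device
            return None  # handled without involving the Launcher
        return addr      # no binding: return the address to the Launcher
```

For example, after `table.bind(0x1000, wake_net_thread)`, a notification for `0x1000` calls `wake_net_thread` directly, while an unbound address still takes the old path.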
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Davide Libenzi <davidel@xmailserver.org>
    • eventfd: export eventfd_signal and eventfd_fget for lguest · 5718607b
      Rusty Russell authored
      lguest wants to attach eventfds to guest notifications, and lguest is
      usually a module.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      To: Davide Libenzi <davidel@xmailserver.org>
    • lguest: allow any process to send interrupts · 9f155a9b
      Rusty Russell authored
      We currently only allow the Launcher process to send interrupts, but
      as we already send interrupts from the hrtimer, it's a simple matter of
      extracting that code into a common set_interrupt routine.
      
      As we switch to a thread per virtqueue, this avoids a bottleneck through the
      main Launcher process.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: PAE fixes · 92b4d8df
      Rusty Russell authored
      1) j wasn't initialized in setup_pagetables, so they weren't set up for me
         causing immediate guest crashes.
      
      2) gpte_addr should not re-read the pmd from the Guest.  Especially
         not BUG_ON() based on the value.  If we ever supported SMP guests,
         they could trigger that.  And the Launcher could also trigger it
         (tho currently root-only).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: PAE support · acdd0b62
      Matias Zabaljauregui authored
      This version requires that host and guest have the same PAE status.
      NX cap is not offered to the guest, yet.
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: Add support for kvm_hypercall4() · cefcad17
      Matias Zabaljauregui authored
      Add support for kvm_hypercall4(); PAE wants it.
      
      Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD · ebe0ba84
      Matias Zabaljauregui authored
      replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
      (That's really what it is, and the confusion gets worse with PAE support)
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
    • lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated · 90603d15
      Matias Zabaljauregui authored
      Some cleanups, and replace direct assignment with the native_set_* macros,
      which properly handle 64-bit entries when PAE is activated.
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>