1. 08 May, 2015 6 commits
  2. 07 May, 2015 14 commits
    • KVM: x86: dump VMCS on invalid entry · 4eb64dce
      Paolo Bonzini authored
      Code and format roughly based on Xen's vmcs_dump_vcpu.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: kvmclock: drop rdtsc_barrier() · a3eb97bd
      Marcelo Tosatti authored
      Drop unnecessary rdtsc_barrier(), as has been determined empirically,
      see 057e6a8c for details.
      
      Noticed by Andy Lutomirski.
      
      Improves clock_gettime() by approximately 15% on
      Intel i7-3520M @ 2.90GHz.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: drop unneeded null test · d90e3a35
      Julia Lawall authored
      If the null test is needed, the call to cancel_delayed_work_sync would have
      already crashed.  Normally, the destroy function should only be called
      if the init function has succeeded, in which case ioapic is not null.
      
      Problem found using Coccinelle.
      Suggested-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix initial PAT value · 74545705
      Radim Krčmář authored
      PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT.
      VMX used a wrong value (host's PAT) and while SVM used the right one,
      it never got to arch.pat.
      
      This is not an issue with QEMU as it will force the correct value.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
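To make the reset value concrete, the following Python sketch splits 0007_0406_0007_0406h into its eight PAT entries. The memory-type encodings are the ones defined by the Intel SDM; the helper name `decode_pat` is invented for illustration and is not part of KVM.

```python
# PAT memory-type encodings (Intel SDM); encodings 2 and 3 are reserved.
PAT_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB", 0x07: "UC-"}

# RESET value quoted in the commit message.
PAT_RESET_VALUE = 0x0007040600070406


def decode_pat(msr_value):
    """Return the memory-type names for PA0..PA7, lowest byte first."""
    entries = []
    for index in range(8):
        # Each entry occupies one byte; only the low three bits carry the type.
        encoding = (msr_value >> (8 * index)) & 0x07
        entries.append(PAT_TYPES.get(encoding, "reserved"))
    return entries
```

Calling `decode_pat(PAT_RESET_VALUE)` lists the types for PA0 through PA7, which makes it easy to compare against whatever value a host happens to have programmed.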
    • kvm,x86: load guest FPU context more eagerly · 653f52c3
      Rik van Riel authored
      Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to
      re-load them every time the guest accesses the FPU after a switch back into
      the guest from the host.
      
      This patch copies the x86 task switch semantics for FPU loading, with the
      FPU loaded eagerly after first use if the system uses eager fpu mode,
      or if the guest uses the FPU frequently.
      
      In the latter case, after loading the FPU 255 times, the fpu_counter
      will roll over, and we will revert to loading the FPU on demand until
      it has been established that the guest is still actively using the FPU.
      
      This mirrors the x86 task switch policy, which seems to work.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
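The rollover behaviour described above can be modelled with an 8-bit counter. This is a toy sketch, not the kernel code: the function name `next_fpu_policy` and the threshold `EAGER_THRESHOLD` are assumptions, since the commit message does not say how many loads count as "frequent" use.

```python
FPU_COUNTER_MASK = 0xFF  # 8-bit counter: wraps to 0 after 255 loads, as the message describes
EAGER_THRESHOLD = 5      # hypothetical cutoff; not stated in the commit message


def next_fpu_policy(fpu_counter, eager_fpu_mode):
    """Account for one FPU load and decide what to do on the next guest entry.

    Returns (new_counter, keep_loaded). keep_loaded=False means the FPU is
    deactivated again, so the next guest FPU access traps and reloads it.
    """
    if eager_fpu_mode:
        # Eager mode keeps the FPU loaded after first use; the counter is unused.
        return fpu_counter, True
    new_counter = (fpu_counter + 1) & FPU_COUNTER_MASK
    # Frequent users stay loaded; after the wrap the guest must prove itself again.
    return new_counter, new_counter >= EAGER_THRESHOLD
```

With these assumed numbers, a guest that keeps using the FPU stays in the eager path until the counter wraps, then briefly returns to on-demand loading.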
    • kvm: x86: Deliver MSI IRQ to only lowest prio cpu if msi_redir_hint is true · d1ebdbf9
      James Sullivan authored
      An MSI interrupt should only be delivered to the lowest priority CPU
      when it has RH=1, regardless of the delivery mode. Modified
      kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWPRI
      or irq->msi_redir_hint.
      
      Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to
      kvm_lowest_prio_delivery().
      
      Changed a check in kvm_irq_delivery_to_apic_fast() from
      irq->delivery_mode == APIC_DM_LOWPRI to kvm_is_dm_lowest_prio().
      Signed-off-by: James Sullivan <sullivan.james.f@gmail.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
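The modified check boils down to a two-way OR. The Python sketch below mirrors the condition given in the message; `lowest_prio_delivery` is a stand-in name for the renamed helper, and the constants are the usual ICR delivery-mode encodings.

```python
APIC_DM_FIXED = 0x000   # fixed delivery mode (ICR bits 8-10 = 000)
APIC_DM_LOWPRI = 0x100  # lowest-priority delivery mode (ICR bits 8-10 = 001)


def lowest_prio_delivery(delivery_mode, msi_redir_hint):
    """True if the interrupt should go only to the lowest-priority CPU."""
    return delivery_mode == APIC_DM_LOWPRI or bool(msi_redir_hint)
```

So an MSI with fixed delivery mode but RH=1 is now treated the same way as a lowest-priority interrupt.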
    • kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery · 93bbf0b8
      James Sullivan authored
      Extended struct kvm_lapic_irq with bool msi_redir_hint, which will
      be used to determine if the delivery of the MSI should target only
      the lowest priority CPU in the logical group specified for delivery.
      (In physical dest mode, the RH bit is not relevant). Initialized the value
      of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized
      to false in all other cases.
      
      Added value of msi_redir_hint to a debug message dump of an IRQ in
      apic_send_ipi().
      Signed-off-by: James Sullivan <sullivan.james.f@gmail.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
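As an illustration of where the hint comes from, this sketch decodes the relevant fields of an x86 MSI address. The bit positions follow the architectural MSI address layout (RH is bit 3, destination mode is bit 2, destination ID is bits 12-19); the function and dictionary keys are hypothetical and do not reproduce kvm_set_msi_irq().

```python
MSI_ADDR_DEST_MODE_LOGICAL = 1 << 2  # DM bit: 1 = logical destination mode
MSI_ADDR_REDIR_HINT = 1 << 3         # RH bit


def decode_msi_address(address_lo):
    """Extract destination ID, destination mode and redirection hint."""
    return {
        "dest_id": (address_lo >> 12) & 0xFF,
        "logical_dest_mode": bool(address_lo & MSI_ADDR_DEST_MODE_LOGICAL),
        "msi_redir_hint": bool(address_lo & MSI_ADDR_REDIR_HINT),
    }
```

For example, `decode_msi_address(0xFEE0100C)` describes a logical-mode message for destination 1 with the hint set, which is the case where lowest-priority selection within the logical group applies.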
    • KVM: x86: tweak types of fields in kvm_lapic_irq · b7cb2231
      Paolo Bonzini authored
      Change to u16 if they only contain data in the low 16 bits.
      
      Change the level field to bool, since we assign 1 sometimes, but
      just mask icr_low with APIC_INT_ASSERT in apic_send_ipi.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: INIT and reset sequences are different · d28bc9dd
      Nadav Amit authored
      The x86 architecture defines differences between the reset and INIT sequences.
      INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
      MSRs (in general), MTRRs, machine-check, APIC ID, APIC arbitration ID and BSP.
      
      References (from Intel SDM):
      
      "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
      to a specific processor or system wide) do not cause the MP protocol to be
      repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]
      
      [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
      
      "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
      changed." [9.2: X87 FPU INITIALIZATION]
      
      "The state of the local APIC following an INIT reset is the same as it is after
      a power-up or hardware reset, except that the APIC ID and arbitration ID
      registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
      ("Wait-for-SIPI" State)]
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
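One way to picture the distinction is a single reset routine that takes an `init_event` flag and skips the state INIT must leave alone. The sketch below is a simplified model: the dictionary keys and the function name `vcpu_reset` are assumptions made for illustration, and the preserved list simply restates the items named in the message.

```python
# State the commit message says INIT does not initialize (key names are hypothetical).
PRESERVED_ACROSS_INIT = (
    "fpu", "tsc", "pmu", "msrs", "mtrrs",
    "machine_check", "apic_id", "apic_arb_id", "bsp",
)


def vcpu_reset(state, power_on_defaults, init_event):
    """Return the vCPU state after a reset (init_event=False) or an INIT."""
    new_state = dict(power_on_defaults)
    if init_event:
        # INIT keeps these components as they were before the event.
        for name in PRESERVED_ACROSS_INIT:
            if name in state:
                new_state[name] = state[name]
    return new_state
```

A full reset returns the defaults unchanged, while an INIT returns the defaults overlaid with the preserved components.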
    • KVM: x86: Support for disabling quirks · 90de4a18
      Nadav Amit authored
      Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previously
      created in order to overcome QEMU issues. Those issues were mostly the result of
      an invalid VM BIOS.  Currently there are two quirks that can be disabled:
      
      1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
      2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot
      
      These two issues are already resolved in recent releases of QEMU, and would
      therefore be disabled by QEMU.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il>
      [Report capability from KVM_CHECK_EXTENSION too. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
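The quirk list lends itself to a bitmask. The sketch below shows how a set of disabled quirks could be tracked and queried; the bit positions, the helper names, and the rejection of unknown bits are assumptions of this example, not details taken from the commit.

```python
KVM_QUIRK_LINT0_REENABLED = 1 << 0  # assumed bit position
KVM_QUIRK_CD_NW_CLEARED = 1 << 1    # assumed bit position
KNOWN_QUIRKS = KVM_QUIRK_LINT0_REENABLED | KVM_QUIRK_CD_NW_CLEARED


def disable_quirks(disabled_mask, requested):
    """Return the new mask of disabled quirks after a userspace request."""
    if requested & ~KNOWN_QUIRKS:
        # Design choice of this sketch: refuse bits it does not understand.
        raise ValueError("unknown quirk bit requested")
    return disabled_mask | requested


def quirk_active(disabled_mask, quirk):
    """A quirk stays in effect until userspace disables it."""
    return not (disabled_mask & quirk)
```

A userspace that has fixed its BIOS would disable both bits once at VM creation; older userspace that never makes the request keeps the legacy behaviour.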
    • KVM: booke: use __kvm_guest_exit · e233d54d
      Paolo Bonzini authored
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: arm/mips/x86/power use __kvm_guest_{enter|exit} · ccf73aaf
      Christian Borntraeger authored
      Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit}
      where interrupts are disabled.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: provide irq_unsafe kvm_guest_{enter|exit} · 0097d12e
      Christian Borntraeger authored
      Several kvm architectures disable interrupts before kvm_guest_enter.
      kvm_guest_enter then uses local_irq_save/restore to disable interrupts
      again or for the first time. Let's provide underscore versions of
      kvm_guest_{enter|exit} that assume being called locked.
      kvm_guest_enter now disables interrupts for the full function and
      thus we can remove the check for preemptible.

      This patch then adapts s390/kvm to use local_irq_disable/enable calls,
      which are slightly cheaper than local_irq_save/restore, and call these
      new functions.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
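The split between the locked and unlocked variants follows a common wrapper pattern, modelled here in Python. `Cpu`, `guest_enter_irqs_off` and `guest_enter` are stand-ins for the real per-CPU state, `__kvm_guest_enter` and `kvm_guest_enter`; the accounting itself is reduced to a single flag.

```python
class Cpu:
    """Toy CPU with an interrupt-enable flag and a guest-accounting flag."""

    def __init__(self, irqs_enabled=True):
        self.irqs_enabled = irqs_enabled
        self.in_guest = False


def guest_enter_irqs_off(cpu):
    """Underscore variant: the caller has already disabled interrupts."""
    assert not cpu.irqs_enabled, "caller must disable interrupts first"
    cpu.in_guest = True


def guest_enter(cpu):
    """Safe variant: disables interrupts around the locked helper."""
    saved = cpu.irqs_enabled   # plays the role of local_irq_save()
    cpu.irqs_enabled = False
    guest_enter_irqs_off(cpu)
    cpu.irqs_enabled = saved   # plays the role of local_irq_restore()
```

Callers that already run with interrupts off use the first function directly and skip the save/restore pair, which is the saving the message refers to.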
    • kvmclock: set scheduler clock stable · ff7bbb9c
      Luiz Capitulino authored
      If you try to enable NOHZ_FULL on a guest today, you'll get
      the following error when the guest tries to deactivate the
      scheduler tick:
      
       WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
       NO_HZ FULL will not work with unstable sched clock
       CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb7 #204
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: events flush_to_ldisc
        ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
        ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
        0000000000000000 0000000000000003 0000000000000001 0000000000000001
       Call Trace:
        <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
        [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
        [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
        [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
        [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
        [<ffffffff810511c5>] irq_exit+0xc5/0x130
        [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
        [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
        <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
        [<ffffffff8108bbc8>] __wake_up+0x48/0x60
        [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
        [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
        [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
        [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
        [<ffffffff81064d05>] process_one_work+0x1d5/0x540
        [<ffffffff81064c81>] ? process_one_work+0x151/0x540
        [<ffffffff81065191>] worker_thread+0x121/0x470
        [<ffffffff81065070>] ? process_one_work+0x540/0x540
        [<ffffffff8106b4df>] kthread+0xef/0x110
        [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
        [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
        [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
       ---[ end trace 06e3507544a38866 ]---
      
      However, it turns out that kvmclock does provide a stable
      sched_clock callback. So, let the scheduler know this which
      in turn makes NOHZ_FULL work in the guest.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 04 May, 2015 3 commits
    • Linux 4.1-rc2 · 5ebe6afa
      Linus Torvalds authored
    • Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 8663da2c
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Some miscellaneous bug fixes and some final on-disk and ABI changes
        for ext4 encryption which provide better security and performance"
      
      * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix growing of tiny filesystems
        ext4: move check under lock scope to close a race.
        ext4: fix data corruption caused by unwritten and delayed extents
        ext4 crypto: remove duplicated encryption mode definitions
        ext4 crypto: do not select from EXT4_FS_ENCRYPTION
        ext4 crypto: add padding to filenames before encrypting
        ext4 crypto: simplify and speed up filename encryption
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 101a6fd3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "One intel fix, one rockchip fix, and a bunch of radeon fixes for some
        regressions from audio rework and vm stability"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/i915/chv: Implement WaDisableShadowRegForCpd
        drm/radeon: fix userptr return value checking (v2)
        drm/radeon: check new address before removing old one
        drm/radeon: reset BOs address after clearing it.
        drm/radeon: fix lockup when BOs aren't part of the VM on release
        drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
        drm/radeon: adjust pll when audio is not enabled
        drm/radeon: only enable audio streams if the monitor supports it
        drm/radeon: only mark audio as connected if the monitor supports it (v3)
        drm/radeon/audio: don't enable packets until the end
        drm/radeon: drop dce6_dp_enable
        drm/radeon: fix ordering of AVI packet setup
        drm/radeon: Use drm_calloc_ab for CS relocs
        drm/rockchip: fix error check when getting irq
        MAINTAINERS: add entry for Rockchip drm drivers
  4. 03 May, 2015 8 commits
    • Merge tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 71aee819
      Dave Airlie authored
      Just a single intel fix
      * tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
        drm/i915/chv: Implement WaDisableShadowRegForCpd
    • Merge branch 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip into drm-fixes · df9ebeb2
      Dave Airlie authored
      one fix and maintainers update
      * 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
        drm/rockchip: fix error check when getting irq
        MAINTAINERS: add entry for Rockchip drm drivers
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 61f06db0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is three logical fixes (as 5 patches).
      
        The 3ware class of drivers were causing an oops with multiqueue by
        tearing down the command mappings after completing the command (where
        the variables in the command used to tear down the mapping were
        no-longer valid).  There's also a fix for the qnap iscsi target which
        was choking on us sending it commands that were too long and a fix for
        the reworked aha1542 allocating GFP_KERNEL under a lock"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        3w-9xxx: fix command completion race
        3w-xxxx: fix command completion race
        3w-sas: fix command completion race
        aha1542: Allocate memory before taking a lock
        SCSI: add 1024 max sectors black list flag
    • Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma · 33332224
      Linus Torvalds authored
      Pull slave dmaengine fixes from Vinod Koul:
       "Here are the fixes in dmaengine subsystem for rc2:
      
         - privatecnt fix for slave dma request API by Christopher
      
         - warn fix for PM ifdef in usb-dmac by Geert
      
         - fix hardware dependency for xgene by Jean"
      
      * 'next' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: increment privatecnt when using dma_get_any_slave_channel
        dmaengine: xgene: Set hardware dependency
        dmaengine: usb-dmac: Protect PM-only functions to kill warning
    • Merge tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 180d89f6
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - build fix for SMP=n in book3s_xics.c
       - fix for Daniel's pci_controller_ops on powernv.
       - revert the TM syscall abort patch for now.
       - CPU affinity fix from Nathan.
       - two EEH fixes from Gavin.
       - fix for CR corruption from Sam.
       - selftest build fix.
      
      * tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
        powerpc/powernv: Restore non-volatile CRs after nap
        powerpc/eeh: Delay probing EEH device during hotplug
        powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
        powerpc/pseries: Correct cpu affinity for dlpar added cpus
        selftests/powerpc: Fix the pmu install rule
        Revert "powerpc/tm: Abort syscalls in active transactions"
        powerpc/powernv: Fix early pci_controller_ops loading.
        powerpc/kvm: Fix SMP=n build error in book3s_xics.c
    • ext4: fix growing of tiny filesystems · 2c869b26
      Jan Kara authored
      The estimate of necessary transaction credits in ext4_flex_group_add()
      is too pessimistic. It reserves credit for sb, resize inode, and resize
      inode dindirect block for each group added in a flex group although they
      are always the same block and thus it is enough to account them only
      once. Also the number of modified GDT blocks is overestimated since we
      fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

      Make the estimation more precise. That reduces the number of requested
      credits enough that we can grow a 20 MB filesystem (which has a 1 MB
      journal, 79 reserved GDT blocks, and flex group size 16 by default).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
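The effect of counting shared blocks once can be shown with a little arithmetic. The two formulas below are one reading of the commit message, not a copy of ext4_flex_group_add(): the per-group charge of four blocks, the extra GDT block for an unaligned start, and treating the reserved GDT blocks as a flat charge in both versions are all assumptions of this sketch.

```python
def old_credit_estimate(groups, reserved_gdb):
    """Pessimistic estimate: sb, resize inode, dindirect and one GDT block per group."""
    return groups * 4 + reserved_gdb


def new_credit_estimate(groups, desc_per_block, reserved_gdb):
    """Tighter estimate: shared blocks once, GDT blocks by descriptors per block."""
    shared = 3  # sb, resize inode, resize inode dindirect block
    # Ceiling division, plus one block in case the new groups straddle a boundary.
    gdt_blocks = 1 + -(-groups // desc_per_block)
    return shared + gdt_blocks + reserved_gdb
```

With the figures from the message (flex group size 16, 79 reserved GDT blocks) and an assumed 32 descriptors per block, the second formula asks for noticeably fewer credits than the first, which is the kind of reduction a 1 MB journal needs.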
    • ext4: move check under lock scope to close a race. · 280227a7
      Davide Italiano authored
      fallocate() checks that the file is extent-based and returns
      EOPNOTSUPP if it is not. Other tasks can convert the file between
      indirect and extent format, so the check is only safe after grabbing
      the inode mutex.
      Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
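The fix is an instance of the general rule that a check and the action depending on it must sit under the same lock. A minimal Python model, with a hypothetical `Inode` class and `fallocate` function standing in for the ext4 code:

```python
import errno
import threading


class Inode:
    """Toy inode: a mutex plus a flag that other tasks may flip under the mutex."""

    def __init__(self, extent_based):
        self.mutex = threading.Lock()
        self.extent_based = extent_based


def fallocate(inode):
    """Check the format only after taking the mutex, so it cannot change underneath us."""
    with inode.mutex:
        if not inode.extent_based:
            return -errno.EOPNOTSUPP
        # ... the allocation itself would run here, still holding the mutex ...
        return 0
```

Had the check been placed before the `with` block, a concurrent format conversion could slip in between the check and the allocation.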
    • ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d
      Lukas Czerner authored
      Currently it is possible to lose a whole file system block's worth of
      data when we hit a specific interaction between unwritten and delayed
      extents in the extent status tree.

      The problem is that when we insert a delayed extent into the extent
      status tree, the only way to get rid of it is to write out the delayed
      buffer. However, there is a limitation in the extent status tree
      implementation: when inserting an unwritten extent, should there be even
      a single delayed block, the whole unwritten extent would be marked as
      delayed.

      At this point, there is no way to get rid of the delayed extents,
      because there are no delayed buffers to write out. So when we write
      into said unwritten extent we will convert it to written, but it still
      remains delayed.

      When we try to write into that block later, ext4_da_map_blocks() will set
      the buffer new and delayed and map it to an invalid block, which causes
      the rest of the block to be zeroed, losing already written data.

      For now we can fix this by simply not allowing the delayed status to be
      set on a written extent in the extent status tree. Also add WARN_ON() to
      make sure that we notice if this happens in the future.

      This problem can be easily reproduced by running the following xfs_io
      commands:
      
      xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                -c "falloc 0 131072" \
                -c "pwrite -S 0xbb 65536 2048" \
                -c "fsync" /mnt/test/fff
      
      echo 3 > /proc/sys/vm/drop_caches
      xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
      
      In theory this can also be reproduced at random by running fsx, but
      not very reliably; on machines with a bigger page size (like ppc) it
      is seen more often (especially with xfstest generic/127).
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
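It helps to translate the byte offsets in the reproducer into block numbers. The sketch below assumes a 4096-byte file system block, which the commit message does not state; `blocks_touched` is a helper written for this note.

```python
BLOCK_SIZE = 4096  # assumed file system block size


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Return the list of block numbers covered by a byte range."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))
```

Under that assumption the first `pwrite` lands in block 1, the `falloc` covers blocks 0 to 31, and the two later writes at 65536 and 67584 hit the first and second half of the same block 16. That is why the final write can zero out the data written just before it.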
  5. 02 May, 2015 7 commits
  6. 01 May, 2015 2 commits
    • rbd: end I/O the entire obj_request on error · 082a75da
      Ilya Dryomov authored
      When we end I/O struct request with error, we need to pass
      obj_request->length as @nr_bytes so that the entire obj_request worth
      of bytes is completed.  Otherwise block layer ends up confused and we
      trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      in rbd_img_obj_callback() due to more being true no matter what.  We
      already do it in most cases but we are missing some, in particular
      those where we don't even get a chance to submit any obj_requests, due
      to an early -ENOMEM for example.
      
      A number of obj_request->xferred assignments seem to be redundant but
      I haven't touched any of obj_request->xferred stuff to keep this small
      and isolated.
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+
      Reported-by: Shawn Edwards <lesser.evil@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
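The assertion says that "more data pending" and "this was the last obj_request" must never both be true or both be false. A toy model of that invariant, assuming the block layer reports `more` whenever bytes of the request remain uncompleted; the function name and the byte counts are illustrative only:

```python
def invariant_holds(lengths, completed_bytes):
    """Replay obj_request completions and check `more ^ (which == count)` each time.

    lengths: size of each obj_request; completed_bytes: the nr_bytes passed
    to the block layer when each one is ended.
    """
    remaining = sum(lengths)
    count = len(lengths)
    results = []
    for which, nr_bytes in enumerate(completed_bytes, start=1):
        remaining = max(remaining - nr_bytes, 0)
        more = remaining > 0  # block layer still has bytes outstanding
        results.append(more ^ (which == count))
    return all(results)
```

Ending each failed obj_request with its full length keeps the invariant; ending them with zero bytes leaves `more` true on the last one, which is the situation the message describes.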
    • ext4 crypto: add padding to filenames before encrypting · a44cd7a0
      Theodore Ts'o authored
      This obscures the length of the filenames, to decrease the amount of
      information leakage.  By default, we pad the filenames to the next
      4-byte boundary.  This costs nothing, since the directory entries are
      aligned to 4-byte boundaries anyway.  Filenames can also be padded to
      8, 16, or 32 bytes, which will consume more directory space.
      
      Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
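The padding rule is a round-up to the next multiple of the chosen boundary. A short sketch, where `padded_name_length` is a hypothetical helper and the allowed values are the four sizes listed in the message:

```python
ALLOWED_PADDING = (4, 8, 16, 32)


def padded_name_length(name_len, padding=4):
    """Round a filename length up to the next multiple of the padding size."""
    if padding not in ALLOWED_PADDING:
        raise ValueError("padding must be 4, 8, 16 or 32 bytes")
    return (name_len + padding - 1) // padding * padding
```

A 5-byte name and an 8-byte name both occupy 8 bytes with the default padding, so an observer can no longer tell them apart by length; larger padding values hide more at the cost of directory space.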