1. 20 Aug, 2013 3 commits
    • xen/events: mask events when changing their VCPU binding · 4704fe4f
      David Vrabel authored
      When an event is being bound to a VCPU there is a window between the
      EVTCHNOP_bind_vcpu call and the adjustment of the local per-cpu masks
      where an event may be lost.  The hypervisor upcalls the new VCPU but
      the kernel thinks that event is still bound to the old VCPU and
      ignores it.
      
      There is even a problem when the event is being bound to the same VCPU
      as there is a small window between the clear_bit() and set_bit() calls
      in bind_evtchn_to_cpu().  When scanning for pending events, the kernel
      may read the bit when it is momentarily clear and ignore the event.
      
      Avoid this by masking the event during the whole bind operation.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      CC: stable@vger.kernel.org
      4704fe4f
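The effect of masking across the whole bind operation can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the bitmaps, `rebind_begin()`, `rebind_end()` and the other names are hypothetical and are not the kernel's code, ports are limited to 0..63, and the model always unmasks at the end, whereas real code would have to restore the previous mask state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2

static uint64_t pending_bits;                 /* events raised by the hypervisor */
static uint64_t masked_bits;                  /* events currently masked */
static uint64_t bound_bits[SKETCH_NR_CPUS];   /* local per-cpu binding masks */

static uint64_t port_bit(unsigned port)
{
    assert(port < 64);
    return UINT64_C(1) << port;
}

/* An event can only trigger an upcall while it is pending and not masked. */
static bool upcall_possible(unsigned port)
{
    uint64_t b = port_bit(port);
    return (pending_bits & b) != 0 && (masked_bits & b) == 0;
}

/* What a scan on the given CPU would treat as its own work. */
static bool active_on_cpu(unsigned cpu, unsigned port)
{
    assert(cpu < SKETCH_NR_CPUS);
    return upcall_possible(port) && (bound_bits[cpu] & port_bit(port)) != 0;
}

/* First half of the bind: mask the event, then drop the old binding. */
static void rebind_begin(unsigned port, unsigned old_cpu)
{
    assert(old_cpu < SKETCH_NR_CPUS);
    masked_bits |= port_bit(port);
    bound_bits[old_cpu] &= ~port_bit(port);
}

/* Second half: set the new binding, then unmask so a pending event fires. */
static void rebind_end(unsigned port, unsigned new_cpu)
{
    assert(new_cpu < SKETCH_NR_CPUS);
    bound_bits[new_cpu] |= port_bit(port);
    masked_bits &= ~port_bit(port);
}
```

An event that becomes pending between `rebind_begin()` and `rebind_end()` stays pending but cannot be scanned and ignored; once the port is unmasked it is active on the new CPU only.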
    • xen/events: initialize local per-cpu mask for all possible events · 84ca7a8e
      David Vrabel authored
      The sizeof() argument in init_evtchn_cpu_bindings() is incorrect,
      resulting in only the first 64 (or 32 in 32-bit guests) ports having
      their bindings initialized to VCPU 0.
      
      In most cases this does not cause a problem as request_irq() will set
      the irq affinity which will set the correct local per-cpu mask.
      However, if the request_irq() is called on a VCPU other than 0, there
      is a window between the unmasking of the event and the affinity being
      set where an event may be lost because it is not locally unmasked on
      any VCPU. If request_irq() is called on VCPU 0 then local irqs are
      disabled during the window and the race does not occur.
      
      Fix this by initializing all NR_EVENT_CHANNELS bits in the local
      per-cpu masks.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      84ca7a8e
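The class of bug described above (a `sizeof()` that covers one word instead of the whole bitmap) can be reproduced in user space. The following is a sketch only: `NR_PORTS`, `struct cpu_mask` and the function names are hypothetical stand-ins and are not the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for this sketch; the kernel's values differ. */
#define NR_PORTS 1024
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS (NR_PORTS / BITS_PER_WORD)

struct cpu_mask {
    unsigned long bits[MASK_WORDS];
};

/* Buggy variant: the size passed to memset() is that of a single word,
 * so only the first 64 (or 32) ports end up bound. */
static void init_mask_buggy(struct cpu_mask *m)
{
    memset(m->bits, 0, sizeof(m->bits));
    memset(m->bits, 0xff, sizeof(m->bits[0]));
}

/* Fixed variant: the size covers every word of the bitmap. */
static void init_mask_fixed(struct cpu_mask *m)
{
    memset(m->bits, 0xff, sizeof(m->bits));
}

/* Count how many ports are marked as bound in the mask. */
static size_t count_bound_ports(const struct cpu_mask *m)
{
    size_t n = 0;

    assert(m != NULL);
    for (size_t port = 0; port < NR_PORTS; port++) {
        if (m->bits[port / BITS_PER_WORD] & (1UL << (port % BITS_PER_WORD)))
            n++;
    }
    return n;
}
```

With the buggy initializer the count equals the number of bits in one `unsigned long`; with the fixed one it equals `NR_PORTS`.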
    • x86/xen: do not identity map UNUSABLE regions in the machine E820 · 3bc38cbc
      David Vrabel authored
      If there are UNUSABLE regions in the machine memory map, dom0 will
      attempt to map them 1:1 which is not permitted by Xen and the kernel
      will crash.
      
      There isn't anything interesting in the UNUSABLE region that the dom0
      kernel needs access to so we can avoid making the 1:1 mapping and
      treat it as RAM.
      
      We only do this for dom0, as that is where the tboot case shows up.
      A PV domU could have an UNUSABLE region in its pseudo-physical map
      and would need to be handled in another patch.
      
      This fixes a boot failure on hosts with tboot.
      
      tboot marks a region in the e820 map as unusable. The dom0 kernel
      would attempt to map this region, but Xen does not permit unusable
      regions to be mapped by guests.
      
        (XEN)  0000000000000000 - 0000000000060000 (usable)
        (XEN)  0000000000060000 - 0000000000068000 (reserved)
        (XEN)  0000000000068000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 0000000000800000 (usable)
        (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
        (XEN)  0000000000972000 - 00000000cf200000 (usable)
        (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
        (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
        (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
        (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
        (XEN)  00000000fe000000 - 0000000100000000 (reserved)
        (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      [v1: Altered the patch and description with domU's with UNUSABLE regions]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3bc38cbc
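The idea of treating UNUSABLE regions as RAM for dom0 only can be sketched as a pass over a memory map. This is an illustration under assumptions, not the patch itself: `struct e820_entry_sketch` and `unusable_to_ram()` are hypothetical names, and only the E820 type codes are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard E820 type codes. */
#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_UNUSABLE 5

/* Hypothetical map entry for this sketch. */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Retype UNUSABLE entries as RAM, but only for the initial domain.
 * Returns the number of entries that were converted. */
static size_t unusable_to_ram(struct e820_entry_sketch *map, size_t n,
                              int is_dom0)
{
    size_t converted = 0;

    assert(map != NULL || n == 0);
    if (!is_dom0)
        return 0;   /* a PV domU would need separate handling */

    for (size_t i = 0; i < n; i++) {
        if (map[i].type == E820_UNUSABLE) {
            map[i].type = E820_RAM;
            converted++;
        }
    }
    return converted;
}
```

Applied to the map quoted above, the only entry affected is the 0x800000 - 0x972000 region that tboot marked as unusable.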
  2. 05 Aug, 2013 2 commits
  3. 30 Jul, 2013 2 commits
    • xen/tmem: do not allow XEN_TMEM on ARM64 · 741ddbcf
      Stefano Stabellini authored
      tmem is not supported on arm or arm64 yet. Will revert this
      once the Xen hypervisor supports it.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      741ddbcf
    • xen/evtchn: avoid a deadlock when unbinding an event channel · 179fbd5a
      David Vrabel authored
      Unbinding an event channel (either with the ioctl or when the evtchn
      device is closed) may deadlock because disable_irq() is called with
      port_user_lock held which is also locked by the interrupt handler.
      
      Suppose IOCTL_EVTCHN_UNBIND is being serviced: the routine has
      just taken the lock, and an interrupt happens. The evtchn_interrupt
      handler is invoked, tries to take the lock and spins forever.
      
      A quick glance at the code shows that the spinlock is a local IRQ
      variant. Unfortunately that does not help as "disable_irq() waits for
      the interrupt handler on all CPUs to stop running.  If the irq occurs
      on another VCPU, it tries to take port_user_lock and can't because
      the unbind ioctl is holding it." (from David). Hence we cannot
      depend on the said spinlock to protect us. We could make it a system
      wide IRQ disable spinlock but there is a better way.
      
      The spinlock exists only to keep the get_port_user() checks
      up-to-date. We can alter those checks so that they do not depend on
      the spinlock (the data is protected by u->bind_mutex in the ioctl)
      and remove the unnecessary locking on the IOCTL_EVTCHN_UNBIND
      path.
      
      In the interrupt handler we cannot use the mutex, but we do not
      need it.
      
      "The unbind disables the irq before making the port user stale, so when
      you clear it you are guaranteed that the interrupt handler that might
      use that port cannot be running." (from David).
      
      Hence this patch removes the spinlock usage on the teardown path
      and piggybacks on disable_irq happening before we muck with the
      get_port_user() data. This ensures that the interrupt handler will
      never run on stale data.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Expanded the commit description a bit]
      179fbd5a
  4. 29 Jul, 2013 3 commits
  5. 28 Jun, 2013 3 commits
  6. 14 Jun, 2013 1 commit
    • xen/pcifront: Deal with toolstack missing 'XenbusStateClosing' state. · 098b1aea
      Konrad Rzeszutek Wilk authored
      There are two tool-stacks that can instruct the Xen PCI frontend
      and backend to change states: 'xm' (Python code with a daemon),
      and 'xl' (C library - does not keep state changes).
      
      With the 'xm', the path to disconnect a single PCI device (xm pci-detach
      <guest> <BDF>) is:
      
      4(Connected)->7(Reconfiguring*)-> 8(Reconfigured)-> 4(Connected)->5(Closing*).
      
      The * is for states that the tool-stack sets. For 'xl', it is similar:
      
      4(Connected)->7(Reconfiguring*)-> 8(Reconfigured)-> 4(Connected)
      
      Both of them also tear down the XenBus structure, so the backend
      state ends up going into 3(Initialised) and calls pcifront_xenbus_remove.
      
      When a PCI device is plugged back in (xm pci-attach <guest> <BDF>)
      both of them follow the same pattern:
      
      2(InitWait*), 3(Initialized*), 4(Connected*)->4(Connected).
      
      [xen-pcifront ignores the 2,3 state changes and only acts when
      4 (Connected) has been reached]
      
      Note that this is for a _single_ PCI device. If there were two
      PCI devices and only one was disconnected 'xm' would show the same
      state changes.
      
      The problem is that git commit 3d925320
      ("xen/pcifront: Use Xen-SWIOTLB when initting if required") introduced
      a mechanism to initialize the SWIOTLB when the Xen PCI front moves to
      Connected state. It also had an aggressive seatbelt check that
      would warn the user if one tried to change to Connected state without
      first hitting the Closing state:
      
       pcifront pci-0: PCI frontend already installed!
      
      However, that code can be relaxed and we can continue working
      even if the frontend is instructed to be in the 'Connected' state with
      no devices and then gets tickled to be in 'Connected' state again.
      
      In other words, this 4(Connected)->5(Closing)->4(Connected) state
      was expected, while 4(Connected)->.... anything but 5(Closing)->4(Connected)
      was not. This patch removes that aggressive check and allows
      Xen pcifront to work with the 'xl' toolstack (for one or more
      PCI devices) and with 'xm' toolstack (for more than two PCI
      devices).
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Cc: stable@vger.kernel.org
      [v2: Added in the description about two PCI devices]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      098b1aea
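The difference between the strict and the relaxed check can be modelled by replaying the state sequences listed above. This is a simplified sketch: the state numbers are the XenBus values quoted in the message, but `struct frontend_sketch`, `backend_changed()` and `replay()` are hypothetical, and the handling of the Reconfiguring and Reconfigured states is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* XenBus state numbers as quoted in the commit message. */
enum xb_state {
    XB_INIT_WAIT = 2,
    XB_INITIALISED = 3,
    XB_CONNECTED = 4,
    XB_CLOSING = 5,
    XB_RECONFIGURING = 7,
    XB_RECONFIGURED = 8
};

/* Hypothetical frontend bookkeeping for this sketch. */
struct frontend_sketch {
    bool installed;   /* frontend already set up */
    int connects;     /* Connected transitions that were acted on */
    int rejected;     /* Connected transitions refused by the strict check */
};

static void backend_changed(struct frontend_sketch *fe, enum xb_state s,
                            bool strict)
{
    assert(fe != NULL);
    switch (s) {
    case XB_CONNECTED:
        if (strict && fe->installed) {
            fe->rejected++;   /* "PCI frontend already installed!" */
            break;
        }
        fe->installed = true;
        fe->connects++;
        break;
    case XB_CLOSING:
        fe->installed = false;
        break;
    default:
        break;                /* other states are not modelled here */
    }
}

/* Feed a whole sequence of backend states to a fresh frontend. */
static struct frontend_sketch replay(const enum xb_state *seq, size_t n,
                                     bool strict)
{
    struct frontend_sketch fe = { false, 0, 0 };

    for (size_t i = 0; i < n; i++)
        backend_changed(&fe, seq[i], strict);
    return fe;
}
```

Replaying the 'xl' detach sequence (4, 7, 8, 4) trips the strict check on the second Connected, while the relaxed variant simply acts on it again.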
  7. 10 Jun, 2013 10 commits
    • xen/time: Free onlined per-cpu data structure if we want to online it again. · 09e99da7
      Konrad Rzeszutek Wilk authored
      If the per-cpu time data structure has been onlined already and
      we are trying to online it again, then free the previous copy
      before blindly overwriting it.
      
      A developer should not call this function multiple times,
      but guard against it just in case.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      09e99da7
    • xen/time: Check that the per_cpu data structure has data before freeing. · a05e2c37
      Konrad Rzeszutek Wilk authored
      We don't check whether the per_cpu data structure has already
      been freed. This patch adds that check and, if the structure has
      already been freed, continues without double-freeing.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a05e2c37
    • xen/time: Don't leak interrupt name when offlining. · c9d76a24
      Konrad Rzeszutek Wilk authored
      When the user does:
          echo 0 > /sys/devices/system/cpu/cpu1/online
          echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      One of the leaks is from xen/time:
      
      unreferenced object 0xffff88003fa51280 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
          [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes it by stashing away the 'name' in the per-cpu
      data structure and freeing it when offlining the CPU.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c9d76a24
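The ownership pattern behind the xen/time fixes (keep the allocated name next to the per-cpu state, free it on offline, tolerate repeated calls) can be sketched in user space. Everything here is hypothetical: `struct timer_slot`, `timer_setup()` and `timer_teardown()` are illustrative names, and `malloc()`/`snprintf()` stand in for the kernel's `kasprintf()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical per-cpu slot: the name is owned by the slot. */
struct timer_slot {
    int irq;      /* -1 while the CPU is offline */
    char *name;   /* heap-allocated, NULL while offline */
};

static struct timer_slot slots[SKETCH_NR_CPUS] = {
    { -1, NULL }, { -1, NULL }, { -1, NULL }, { -1, NULL }
};

/* Offline path: free the name, and do nothing if already torn down. */
static void timer_teardown(unsigned cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    if (slots[cpu].irq == -1)
        return;               /* no data, so nothing to free */
    free(slots[cpu].name);
    slots[cpu].name = NULL;
    slots[cpu].irq = -1;
}

/* Online path: free any previous copy before overwriting the slot. */
static int timer_setup(unsigned cpu, int irq)
{
    assert(cpu < SKETCH_NR_CPUS);
    assert(irq >= 0);
    if (slots[cpu].irq != -1)
        timer_teardown(cpu);

    size_t len = (size_t)snprintf(NULL, 0, "timer%u", cpu) + 1;
    char *name = malloc(len);
    if (name == NULL)
        return -1;
    snprintf(name, len, "timer%u", cpu);

    slots[cpu].name = name;
    slots[cpu].irq = irq;
    return 0;
}
```

Calling `timer_setup()` twice for the same CPU releases the first name instead of leaking it, and calling `timer_teardown()` twice is harmless.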
    • xen/time: Encapsulate the struct clock_event_device in another structure. · 31620a19
      Konrad Rzeszutek Wilk authored
      We don't do any code movement. We just encapsulate the struct clock_event_device
      in a new structure which contains said structure and a char *name
      pointer. The 'name' will be used in 'xen/time: Don't leak interrupt
      name when offlining'.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      31620a19
    • xen/spinlock: Don't leak interrupt name when offlining. · 354e7b76
      Konrad Rzeszutek Wilk authored
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51260 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00  spinlock1.......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
          [<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining"
      patch did (which has a per-cpu structure which contains both the
      IRQ number and char*) we use per-cpu pointers to char.
      
      The reason is that the "__this_cpu_read(lock_kicker_irq);" macro
      blows up with "__bad_size_call_parameter()" as the size of the
      returned structure is not within the parameters of what it expects
      and optimizes for.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      354e7b76
    • xen/smp: Don't leak interrupt name when offlining. · b85fffec
      Konrad Rzeszutek Wilk authored
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51240 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
          [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes some of it by using the 'struct xen_common_irq->name'
      field to stash away the char * so that it can be freed when
      the interrupt line is destroyed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b85fffec
    • xen/smp: Set the per-cpu IRQ number to a valid default. · ee336e10
      Konrad Rzeszutek Wilk authored
      When we free it we want to make sure to set it to a default
      value of -1 so that we don't double-free it (in case somebody
      calls us twice).
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ee336e10
    • xen/smp: Introduce a common structure to contain the IRQ name and interrupt line. · 9547689f
      Konrad Rzeszutek Wilk authored
      This patch adds a new structure to hold the two things that each
      of the per-cpu interrupts needs:
       - an interrupt number,
       - and the name of the interrupt (to be added in 'xen/smp: Don't leak
         interrupt name when offlining').
      
      This allows us to carry the tuple of the per-cpu interrupt data structure
      and expand it as we need in the future.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9547689f
    • xen/smp: Coalesce the free_irq calls in one function. · 53b94fdc
      Konrad Rzeszutek Wilk authored
      There are two functions that do a bunch of 'free_irq' on
      the per_cpu IRQ. Instead of having duplicate code just move
      it to one function.
      
      This is just code movement.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      53b94fdc
    • xen-pciback: fix error return code in pcistub_irq_handler_switch() · 405010df
      Wei Yongjun authored
      Fix to return -ENOENT in the pcistub_device_find() and pci_get_drvdata()
      error handling case instead of 0 (the return value is overwritten to 0 by
      str_to_slot()), as done elsewhere in this function.
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      405010df
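The stale-return-code pattern described above can be shown with a small stand-alone function. This is a sketch with hypothetical helpers: `parse_slot()`, `find_device()` and `handler_switch()` are not the xen-pciback functions, they only mimic the shape of a parse step that leaves the error variable at 0 followed by a lookup that can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical slot description for this sketch. */
struct slot_sketch {
    unsigned domain, bus, slot, func;
};

/* Hypothetical parser: returns 0 on success, -EINVAL on malformed input. */
static int parse_slot(const char *buf, struct slot_sketch *out)
{
    assert(buf != NULL && out != NULL);
    if (sscanf(buf, "%x:%x:%x.%x",
               &out->domain, &out->bus, &out->slot, &out->func) != 4)
        return -EINVAL;
    return 0;
}

/* Hypothetical lookup: only device 0000:00:01.0 exists. */
static const struct slot_sketch *find_device(const struct slot_sketch *s)
{
    static const struct slot_sketch known = { 0, 0, 1, 0 };

    if (s->domain == known.domain && s->bus == known.bus &&
        s->slot == known.slot && s->func == known.func)
        return &known;
    return NULL;
}

static int handler_switch(const char *buf)
{
    struct slot_sketch s;
    int err = parse_slot(buf, &s);   /* err is now 0 on success */

    if (err)
        goto out;

    if (find_device(&s) == NULL) {
        err = -ENOENT;   /* without this, the stale 0 would be returned */
        goto out;
    }

    /* ... the real work would happen here ... */
out:
    return err;
}
```

A lookup failure now reports `-ENOENT` to the caller rather than looking like success.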
  8. 29 May, 2013 5 commits
  9. 28 May, 2013 1 commit
  10. 20 May, 2013 10 commits