1. 09 Sep, 2009 3 commits
    • Yang Xiaowei's avatar
      xen: use stronger barrier after unlocking lock · 2496afbf
      Yang Xiaowei authored
      We need to have a stronger barrier between releasing the lock and
      checking for any waiting spinners.  A compiler barrier is not sufficient
      because the CPU's ordering rules do not prevent the read xl->spinners
      from happening before the unlock assignment, as they are different
      memory locations.
      
      We need to have an explicit barrier to enforce the write-read ordering
      to different memory locations.
      
      Because of it, I can't bring up > 4 HVM guests on one SMP machine.
      
      [ Code and commit comments expanded -J ]
      
      [ Impact: avoid deadlock when using Xen PV spinlocks ]
      Signed-off-by: default avatarYang Xiaowei <xiaowei.yang@intel.com>
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      2496afbf
    • Jeremy Fitzhardinge's avatar
      xen: only enable interrupts while actually blocking for spinlock · 4d576b57
      Jeremy Fitzhardinge authored
      Where possible we enable interrupts while waiting for a spinlock to
      become free, in order to reduce big latency spikes in interrupt handling.
      
      However, at present if we manage to pick up the spinlock just before
      blocking, we'll end up holding the lock with interrupts enabled for a
      while.  This will cause a deadlock if we recieve an interrupt in that
      window, and the interrupt handler tries to take the lock too.
      
      Solve this by shrinking the interrupt-enabled region to just around the
      blocking call.
      
      [ Impact: avoid race/deadlock when using Xen PV spinlocks ]
      Reported-by: default avatar"Yang, Xiaowei" <xiaowei.yang@intel.com>
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      4d576b57
    • Jeremy Fitzhardinge's avatar
      xen: make -fstack-protector work under Xen · 577eebea
      Jeremy Fitzhardinge authored
      -fstack-protector uses a special per-cpu "stack canary" value.
      gcc generates special code in each function to test the canary to make
      sure that the function's stack hasn't been overrun.
      
      On x86-64, this is simply an offset of %gs, which is the usual per-cpu
      base segment register, so setting it up simply requires loading %gs's
      base as normal.
      
      On i386, the stack protector segment is %gs (rather than the usual kernel
      percpu %fs segment register).  This requires setting up the full kernel
      GDT and then loading %gs accordingly.  We also need to make sure %gs is
      initialized when bringing up secondary cpus too.
      
      To keep things consistent, we do the full GDT/segment register setup on
      both architectures.
      
      Because we need to avoid -fstack-protected code before setting up the GDT
      and because there's no way to disable it on a per-function basis, several
      files need to have stack-protector inhibited.
      
      [ Impact: allow Xen booting with stack-protector enabled ]
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      577eebea
  2. 05 Sep, 2009 31 commits
  3. 04 Sep, 2009 6 commits
    • Sunil Mushran's avatar
      ocfs2: ocfs2_write_begin_nolock() should handle len=0 · 8379e7c4
      Sunil Mushran authored
      Bug introduced by mainline commit e7432675
      The bug causes ocfs2_write_begin_nolock() to oops when len=0.
      Signed-off-by: default avatarSunil Mushran <sunil.mushran@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJoel Becker <joel.becker@oracle.com>
      8379e7c4
    • Mikulas Patocka's avatar
      dm snapshot: fix on disk chunk size validation · ae0b7448
      Mikulas Patocka authored
      Fix some problems seen in the chunk size processing when activating a
      pre-existing snapshot.
      
      For a new snapshot, the chunk size can either be supplied by the creator
      or a default value can be used.  For an existing snapshot, the
      chunk size in the snapshot header on disk should always be used.
      
      If someone attempts to load an existing snapshot and has the 'default
      chunk size' option set, the kernel uses its default value even when it
      is incorrect for the snapshot being loaded.  This patch ensures the
      correct on-disk value is always used.
      
      Secondly, when the code does use the chunk size stored on the disk it is
      prudent to revalidate it, so the code can exit cleanly if it got
      corrupted as happened in
      https://bugzilla.redhat.com/show_bug.cgi?id=461506 .
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      ae0b7448
    • Mikulas Patocka's avatar
      dm exception store: split set_chunk_size · 2defcc3f
      Mikulas Patocka authored
      Break the function set_chunk_size to two functions in preparation for
      the fix in the following patch.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      2defcc3f
    • Mikulas Patocka's avatar
      dm snapshot: fix header corruption race on invalidation · 61578dcd
      Mikulas Patocka authored
      If a persistent snapshot fills up, a race can corrupt the on-disk header
      which causes a crash on any future attempt to activate the snapshot
      (typically while booting).  This patch fixes the race.
      
      When the snapshot overflows, __invalidate_snapshot is called, which calls
      snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
      calls write_header. write_header constructs the new header in the "area"
      location.
      
      Concurrently, an existing kcopyd job may finish, call copy_callback
      and commit_exception method, that goes to persistent_commit_exception.
      persistent_commit_exception doesn't do locking, relying on the fact that
      callbacks are single-threaded, but it can race with snapshot invalidation and
      overwrite the header that is just being written while the snapshot is being
      invalidated.
      
      The result of this race is a corrupted header being written that can
      lead to a crash on further reactivation (if chunk_size is zero in the
      corrupted header).
      
      The fix is to use separate memory areas for each.
      
      See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      61578dcd
    • Mikulas Patocka's avatar
      dm snapshot: refactor zero_disk_area to use chunk_io · 02d2fd31
      Mikulas Patocka authored
      Refactor chunk_io to prepare for the fix in the following patch.
      
      Pass an area pointer to chunk_io and simplify zero_disk_area to use
      chunk_io.  No functional change.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      02d2fd31
    • Jonathan Brassow's avatar
      dm log: userspace add luid to distinguish between concurrent log instances · 7ec23d50
      Jonathan Brassow authored
      Device-mapper userspace logs (like the clustered log) are
      identified by a universally unique identifier (UUID).  This
      identifier is used to associate requests from the kernel to
      a specific log in userspace.  The UUID must be unique everywhere,
      since multiple machines may use this identifier when communicating
      about a particular log, as is the case for cluster logs.
      
      Sometimes, device-mapper/LVM may re-use a UUID.  This is the
      case during pvmoves, when moving from one segment of an LV
      to another, or when resizing a mirror, etc.  In these cases,
      a new log is created with the same UUID and loaded in the
      "inactive" slot.  When a device-mapper "resume" is issued,
      the "live" table is deactivated and the new "inactive" table
      becomes "live".  (The "inactive" table can also be removed
      via a device-mapper 'clear' command.)
      
      The above two issues were colliding.  More than one log was being
      created with the same UUID, and there was no way to distinguish
      between them.  So, sometimes the wrong log would be swapped
      out during the exchange.
      
      The solution is to create a locally unique identifier,
      'luid', to go along with the UUID.  This new identifier is used
      to determine exactly which log is being referenced by the kernel
      when the log exchange is made.  The identifier is not
      universally safe, but it does not need to be, since
      create/destroy/suspend/resume operations are bound to a specific
      machine; and these are the operations that make up the exchange.
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      7ec23d50