1. 31 Jul, 2012 2 commits
  2. 26 Jul, 2012 1 commit
  3. 25 Jul, 2012 3 commits
  4. 24 Jul, 2012 1 commit
  5. 21 Jul, 2012 7 commits
  6. 20 Jul, 2012 15 commits
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · d75e2c9a
      Linus Torvalds authored
      Pull late MIPS fixes from Ralf Baechle:
       "This fixes a number of loose ends in the MIPS code and various bug
        fixes.
      
        Aside from dropping a patch that should not be in this pull request,
        everything has sat in -next for quite a while and there are no known
        issues.
      
        The biggest patch in this patch set moves the allocation of an array
        that is aliased to a function (for runtime generated code) to
        assembler code.  This avoids an issue with certain toolchains when
        building for microMIPS."
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
        MIPS: PCI: Move fixups from __init to __devinit.
        MIPS: Fix bug.h MIPS build regression
        MIPS: sync-r4k: remove redundant irq operation
        MIPS: smp: Warn on too early irq enable
        MIPS: call set_cpu_online() on cpu being brought up with irq disabled
        MIPS: call ->smp_finish() a little late
        MIPS: Yosemite: delay irq enable to ->smp_finish()
        MIPS: SMTC: delay irq enable to ->smp_finish()
        MIPS: BMIPS: delay irq enable to ->smp_finish()
        MIPS: Octeon: delay enable irq to ->smp_finish()
        MIPS: Oprofile: Fix build as a module.
        MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
        MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
        MIPS: Fix Magic SysRq L kernel crash.
        MIPS: BMIPS: Fix duplicate header inclusion.
        mips: mark const init data with __initconst instead of __initdata
        MIPS: cmpxchg.h: Add missing include
        MIPS: Malta may also be equipped with MIPS64 R2 processors.
        MIPS: Fix typo multipy -> multiply
        MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
        ...
    • Merge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · 93517374
      Linus Torvalds authored
      Pull device-mapper discard fixes from Alasdair G Kergon:
        - avoid a crash in dm-raid1 when discards coincide with mirror
          recovery;
        - avoid discarding shared data that's still needed in dm-thin;
        - don't guarantee that discarded blocks will be wiped in dm-raid1.
      
      * tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm raid1: set discard_zeroes_data_unsupported
        dm thin: do not send discards to shared blocks
        dm raid1: fix crash with mirror recovery and discard
    • Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · ce9f8d6b
      Linus Torvalds authored
      Pull pnfs/ore fixes from Boaz Harrosh:
       "These are catastrophic fixes to the pnfs objects-layout that were just
        discovered.  They are also destined for @stable.
      
        I found these and worked on them at around RC1 time, but
        unfortunately went to the hospital for kidney stones and had a very
        slow recovery.  I refrained from sending them as-is, before proper
        testing, and surely enough I found a bug just yesterday.

        So now they are all well tested, and have my sign-off.  Other than
        fixing the problem at hand, and assuming there are no bugs in the
        new code, there is low risk to any surrounding code.  In any case
        they affect only the paths that are now broken, that is, RAID5 in
        the pnfs objects-layout code.  It does also affect exofs (which was
        not broken), but I have tested exofs and it is lower priority than
        objects-layout, because no one is using exofs while objects-layout
        has lots of users."
      
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
        pnfs-obj: don't leak objio_state if ore_write/read fails
        ore: Unlock r4w pages in exact reverse order of locking
        ore: Remove support of partial IO request (NFS crash)
        ore: Fix NFS crash by supporting any unaligned RAID IO
    • Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs · 17934162
      Linus Torvalds authored
      Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
       "It's been reported already twice recently:
      
          http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
          http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
      
        and we finally have the fix.  I am quite confident the fix is correct
        because I could reproduce the problem with nandsim and verify the fix.
        It was also verified by Iwo (the reporter).
      
        I am also confident that it is OK to merge the fix this late,
        because this patch affects only the fixup functionality, which is
        not used by most users."
      
      * tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
        UBIFS: fix a bug in empty space fix-up
    • dm raid1: set discard_zeroes_data_unsupported · 7c8d3a42
      Mikulas Patocka authored
      We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
      the underlying disks support zero on discard.  So this patch sets
      ti->discard_zeroes_data_unsupported.
      
      For example, if the mirror is in the process of resynchronizing, it may
      happen that kcopyd reads a piece of data, then discard is sent on the
      same area and then kcopyd writes the piece of data to another leg.
      Consequently, the data is not zeroed.
      
      The flag was made available by commit 983c7db3
      (dm crypt: always disable discard_zeroes_data).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm thin: do not send discards to shared blocks · 650d2a06
      Mikulas Patocka authored
      When process_discard receives a partial discard that doesn't cover a
      full block, it sends this discard down to that block. Unfortunately, the
      block can be shared and the discard would corrupt the other snapshots
      sharing this block.
      
      This patch detects block sharing and ends the discard with success when
      sending it to the shared block.
      
      The above change means that if the device supports discard it can't be
      guaranteed that a discard request zeroes data. Therefore, we set
      ti->discard_zeroes_data_unsupported.
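      The decision described above can be sketched in a few lines of
      self-contained userspace C; the struct and helper names below
      (block_is_shared, process_discard) are illustrative stand-ins, not
      the actual dm-thin symbols:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the fix: a discard that maps to a block shared
 * with snapshots must not reach the data device; it is completed
 * successfully without discarding anything. */
struct block {
    int ref_count;  /* > 1 means the block is shared with snapshots */
};

enum action { PASS_DISCARD_DOWN, END_WITH_SUCCESS };

static bool block_is_shared(const struct block *b)
{
    return b->ref_count > 1;
}

static enum action process_discard(const struct block *b)
{
    if (block_is_shared(b))
        return END_WITH_SUCCESS;  /* data still needed by snapshots */
    return PASS_DISCARD_DOWN;     /* exclusively owned: safe to discard */
}
```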
      
      Thin target discard support with this bug arrived in commit
      104655fd (dm thin: support discards).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: fix crash with mirror recovery and discard · 751f188d
      Mikulas Patocka authored
      This patch fixes a crash when a discard request is sent during mirror
      recovery.
      
      Firstly, some background.  Generally, the following sequence happens during
      mirror synchronization:
      - function do_recovery is called
      - do_recovery calls dm_rh_recovery_prepare
      - dm_rh_recovery_prepare uses a semaphore to limit the number of
        simultaneously recovered regions (by default the semaphore value is 1,
        so only one region at a time is recovered)
      - dm_rh_recovery_prepare calls __rh_recovery_prepare,
        __rh_recovery_prepare asks the log driver for the next region to
        recover. Then, it sets the region state to DM_RH_RECOVERING. If there
        are no pending I/Os on this region, the region is added to
        quiesced_regions list. If there are pending I/Os, the region is not
        added to any list. It is added to the quiesced_regions list later (by
        dm_rh_dec function) when all I/Os finish.
      - when the region is on quiesced_regions list, there are no I/Os in
        flight on this region. The region is popped from the list in
        dm_rh_recovery_start function. Then, a kcopyd job is started in the
        recover function.
      - when the kcopyd job finishes, recovery_complete is called. It calls
        dm_rh_recovery_end. dm_rh_recovery_end adds the region to
        recovered_regions or failed_recovered_regions list (depending on
        whether the copy operation was successful or not).
      
      The above mechanism assumes that if the region is in DM_RH_RECOVERING
      state, no new I/Os are started on this region. When I/O is started,
      dm_rh_inc_pending is called, which increases reg->pending count. When
      I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
      If the count is zero and the region was in DM_RH_RECOVERING state,
      dm_rh_dec adds it to the quiesced_regions list.
      
      Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
      in DM_RH_RECOVERING state, it could be added to quiesced_regions list
      multiple times or it could be added to this list when kcopyd is copying
      data (it is assumed that the region is not on any list while kcopyd does
      its jobs). This results in memory corruption and crash.
      
      There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
      do not belong to any region, so they are always added to the sync list
      in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
      requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
      requests. These bypasses avoid the crash possibility described above.
      
      These bypasses were improperly implemented for REQ_DISCARD when
      the mirror target gained discard support in commit
      5fc2ffea (dm raid1: support discard).
      
      In do_writes, REQ_DISCARD requests are always added to the sync queue and
      immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
      dm_rh_inc and dm_rh_dec are called for REQ_DISCARD requests.  This violates
      the rule that no I/Os are started on DM_RH_RECOVERING regions, and causes
      the list corruption described above.
      
      This patch changes it so that REQ_DISCARD requests follow the same path
      as REQ_FLUSH. This avoids the crash.
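      The rule the patch enforces can be sketched as self-contained C;
      the flag values and function bodies below are a simplified
      userspace model, not the real dm-raid1 code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: FLUSH and (after the fix) DISCARD requests belong
 * to no region, so the per-region pending count is never touched for
 * them, and a DM_RH_RECOVERING region can never be re-queued by them. */
#define REQ_FLUSH   (1u << 0)
#define REQ_DISCARD (1u << 1)

struct region {
    int pending;  /* in-flight I/Os accounted to this region */
};

static bool bypasses_region_accounting(unsigned bio_flags)
{
    /* before the fix, only REQ_FLUSH took this bypass */
    return (bio_flags & (REQ_FLUSH | REQ_DISCARD)) != 0;
}

static void rh_inc_pending(struct region *reg, unsigned bio_flags)
{
    if (!bypasses_region_accounting(bio_flags))
        reg->pending++;
}

static void rh_dec(struct region *reg, unsigned bio_flags)
{
    if (!bypasses_region_accounting(bio_flags))
        reg->pending--;
}
```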
      
      Reference: https://bugzilla.redhat.com/837607
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • pnfs-obj: Fix __r4w_get_page when offset is beyond i_size · c999ff68
      Boaz Harrosh authored
      It is very common for the end of the file to be unaligned on
      stripe size. But since we know it is beyond the file's end, the
      XOR should be performed with all zeros.

      The old code used to just read zeros out of the OSD devices, which
      is a great waste. But what scares me more about this situation is
      that we now have pages attached to the file's mapping that are
      beyond i_size. I don't like the kind of bugs this calls for.

      Fix both problems by returning a global zero_page if the offset is
      beyond i_size.
      
      TODO:
      	Change the API to ->__r4w_get_page() so a NULL can be
      	returned without being considered as error, since XOR API
      	treats NULL entries as zero_pages.
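      A minimal userspace sketch of the fixed behavior, assuming a
      page-sized shared zero buffer stands in for the kernel's ZERO_PAGE
      (the in-mapping page lookup path is elided):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Shared all-zero page: one copy serves every out-of-range request, so
 * no pages beyond i_size are ever attached to the file's mapping. */
static const unsigned char zero_page[PAGE_SIZE];

static const unsigned char *r4w_get_page(uint64_t offset, uint64_t i_size)
{
    if (offset >= i_size)
        return zero_page;  /* XOR with zeros, no OSD read needed */
    /* the real code would find and lock the page in the file's
     * mapping; elided in this sketch */
    return NULL;
}
```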
      
      [Bug since 3.2. Should apply the same way to all Kernels since]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • pnfs-obj: don't leak objio_state if ore_write/read fails · 9909d45a
      Boaz Harrosh authored
      [Bug since 3.2 Kernel]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • ore: Unlock r4w pages in exact reverse order of locking · 537632e0
      Boaz Harrosh authored
      The read-4-write pages are locked in ascending address order, but
      were unlocked in whatever order was easiest to code. Fix that:
      locks should be released in the opposite order of locking, i.e. in
      descending address order.

      I have not hit this deadlock. It was found by inspecting the
      debug print-outs. I suspect there is a higher-level lock at the
      caller that protects us, but fix it regardless.
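      The locking rule reads naturally as code; here is a toy model
      (lock bits in an array, illustrative names) of ascending-order
      locking and the fixed descending-order unlock:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: pages locked in ascending index order must be unlocked in
 * descending order, so lock acquisition and release nest properly. */
#define NPAGES 4
static bool locked[NPAGES];
static int unlock_order[NPAGES];
static int n_unlocked;

static void lock_pages(void)
{
    for (int i = 0; i < NPAGES; i++)  /* ascending, as before */
        locked[i] = true;
}

static void unlock_pages(void)
{
    for (int i = NPAGES - 1; i >= 0; i--) {  /* the fix: descending */
        locked[i] = false;
        unlock_order[n_unlocked++] = i;
    }
}
```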
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • ore: Remove support of partial IO request (NFS crash) · 62b62ad8
      Boaz Harrosh authored
      Due to OOM situations the ore might fail to allocate all resources
      needed for IO of the full request. If some progress was possible
      it would proceed with a partial/short request, for the sake of
      forward progress.

      Since this crashes NFS-core, and exofs is just fine without it,
      simply remove this contraption and fail.
      
      TODO:
      	Support real forward progress with some reserved allocations
      	of resources, such as mem pools and/or bio_sets
      
      [Bug since 3.2 Kernel]
      CC: Stable Tree <stable@kernel.org>
      CC: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • ore: Fix NFS crash by supporting any unaligned RAID IO · 9ff19309
      Boaz Harrosh authored
      In RAID_5/6 we used to not permit an IO whose end byte is not
      stripe_size aligned and which spans more than one stripe, i.e. the
      caller had to check whether the actual transferred byte count
      after submission was shorter, and resubmit a new IO with the
      remainder.

      Exofs supports this, and NFS was supposed to support it as well
      with its short-write mechanism. But late testing has exposed a
      CRASH when this is used with non-RPC layout-drivers.

      The change at NFS would be deep and risky; in its place the fix at
      ORE to lift the limitation is actually clean and simple. So here
      it is below.

      The principle here is that in the case of an IO unaligned at both
      ends, beginning and end, we send two read requests: one like the
      old code, before the calculation of the first stripe, and also a
      new one, before the calculation of the last stripe. If either
      boundary is aligned, or the complete IO is within a single stripe,
      we do a single read like before.
      
      The code is kept clean and simple by splitting the old _read_4_write
      into 3 even parts:
      1. _read_4_write_first_stripe
      2. _read_4_write_last_stripe
      3. _read_4_write_execute

      1+3 are called at the same place as before, 2+3 before the last
      stripe, and in the case of everything within a single stripe
      1+2+3 are performed additively.

      Why did I not think of it before? Well, call it a stroke of
      genius: I have stared at this code for 2 years and did not find
      this simple solution, until today. Not that I did not try.

      This solution is much better for NFS than the previously proposed
      solution, because there the short write was dealt with out-of-band
      after IO_done, which would cause a seeky IO pattern, whereas here
      we execute in order. In both solutions we do 2 separate reads,
      only here we do it within a single IO request. (And actually
      combine two writes into a single submission.)

      NFS/exofs code need not change, since the ORE API communicates the
      new shorter length on return; this case will simply not occur
      anymore.

      hurray!!
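      The dispatch rule above - two reads only when both ends are
      unaligned and fall in different stripes - can be sketched as
      follows, with an illustrative stripe size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STRIPE_SIZE 65536u  /* illustrative, not a real layout value */

/* How many read-4-write submissions the split described above issues
 * for a write covering [start, end). */
static int nr_read_submissions(uint64_t start, uint64_t end)
{
    bool first_unaligned = (start % STRIPE_SIZE) != 0;
    bool last_unaligned = (end % STRIPE_SIZE) != 0;
    bool single_stripe =
        (start / STRIPE_SIZE) == ((end - 1) / STRIPE_SIZE);

    if (!first_unaligned && !last_unaligned)
        return 0;  /* fully aligned: no read-modify-write needed */
    if (single_stripe || !first_unaligned || !last_unaligned)
        return 1;  /* a single read, like the old code */
    return 2;      /* unaligned at both ends, across stripes */
}
```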
      
      [Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • UBIFS: fix a bug in empty space fix-up · c6727932
      Artem Bityutskiy authored
      UBIFS has a feature called "empty space fix-up", a quirk to work around
      limitations of dumb flasher programs - namely, those flashers that are
      unable to skip NAND pages full of 0xFFs while flashing, which leaves the
      empty space at the end of half-filled eraseblocks unusable for UBIFS.
      This feature is relatively new (introduced in v3.0).
      
      The fix-up routine (fixup_free_space()) is executed only once at the very first
      mount if the superblock has the 'space_fixup' flag set (can be done with -F
      option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
      writes it back to the same LEB. The routine assumes the image is pristine and
      does not have anything in the journal.
      
      There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
      All but one LEB of the log of a pristine file-system are empty. And one
      contains just a commit start node. And 'fixup_free_space()' just unmapped this
      LEB, which resulted in wiping the commit start node. As a result, some users
      were unable to mount the file-system next time with the following symptom:
      
      UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
      UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
      
      The root cause of this bug was that 'fixup_free_space()' wrongly assumed
      that the beginning of empty space in the log head (c->lhead_offs) was
      known on mount. However, that is not the case - it was always 0. UBIFS
      does not store it in the master node, and instead finds it out by
      scanning the log on every mount.
      
      The fix is simple - just pass commit start node size instead of 0 to
      'fixup_leb()'.
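      A toy model of the fix, assuming fixup_leb(leb, used) re-writes the
      first 'used' bytes of a LEB and leaves the rest as erased flash;
      the sizes below are illustrative, far smaller than the real ones:

```c
#include <assert.h>
#include <string.h>

#define LEB_SIZE 128      /* illustrative sizes, not UBIFS constants */
#define CS_NODE_SIZE 24

/* Model of fixup_leb(): preserve the first 'used' bytes of the LEB and
 * leave everything after them erased (0xFF).  Passing used == 0 for the
 * log head wipes the commit start node - the bug; passing the CS node
 * size preserves it - the fix. */
static void fixup_leb(unsigned char *leb, int used)
{
    memset(leb + used, 0xFF, LEB_SIZE - used);
}
```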
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
      Cc: stable@vger.kernel.org [v3.0+]
      Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
      Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
      Reported-by: James Nute <newten82@gmail.com>
    • tools/power: turbostat: fix large c1% issue · c3ae331d
      Len Brown authored
      Under some conditions, c1% was displayed as a very large number,
      much higher than 100%.

      c1% is not measured; it is derived as "that which is left over"
      from the other counters.  However, the other counters are not
      collected atomically, and so it is possible for c1% to be calculated
      as a small negative number -- displayed as a very large positive one.

      There was already a check of mperf vs tsc for this,
      but it needed to also include the other counters
      that are used to calculate c1.
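      The derivation, and the kind of check this patch extends, can be
      sketched as below (counter names as in turbostat; clamping to zero
      here is an illustrative simplification of the consistency check):

```c
#include <assert.h>
#include <stdint.h>

/* c1 is whatever part of the TSC interval is not accounted to C0
 * (mperf) or the deeper C-states.  Because the counters are sampled
 * non-atomically, the sum can exceed tsc; without a guard the
 * subtraction wraps and c1 shows up as a huge positive number. */
static uint64_t derive_c1(uint64_t tsc, uint64_t mperf,
                          uint64_t c3, uint64_t c6, uint64_t c7)
{
    uint64_t accounted = mperf + c3 + c6 + c7;
    if (accounted > tsc)
        return 0;  /* sampling skew: report 0, not ~2^64 */
    return tsc - accounted;
}
```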
      Signed-off-by: Len Brown <len.brown@intel.com>
    • tools/power: turbostat v2 - re-write for efficiency · c98d5d94
      Len Brown authored
      Measuring large profoundly-idle configurations
      requires turbostat to be more lightweight.
      Otherwise, the operation of turbostat itself
      can interfere with the measurements.
      
      This re-write makes turbostat topology aware.
      Hardware is accessed in "topology order".
      Redundant hardware accesses are deleted.
      Redundant output is deleted.
      Also, output is buffered and
      local RDTSC use replaces remote MSR access for TSC.
      
      From a feature point of view, the output
      looks different since redundant figures are absent.
      Also, there are now -c and -p options -- to restrict
      output to the 1st thread in each core, and the 1st
      thread in each package, respectively.  This is helpful
      to reduce output on big systems, where more detail
      than the "-s" system summary is desired.
      Finally, periodic mode output is now on stdout, not stderr.
      
      Turbostat v2 is also slightly more robust in
      handling run-time CPU online/offline events,
      as it now checks the actual map of on-line cpus rather
      than just the total number of on-line cpus.
      Signed-off-by: Len Brown <len.brown@intel.com>
  7. 19 Jul, 2012 11 commits