  20 Jun, 2013 14 commits
    • Linux 3.4.50 · 8861fd33
      Greg Kroah-Hartman authored
    • powerpc: Fix missing/delayed calls to irq_work · 05266fa3
      Benjamin Herrenschmidt authored
      commit 230b3034 upstream.
      
      When replaying interrupts (as a result of the interrupt occurring
      while soft-disabled), in the case of the decrementer, we are exclusively
      testing for a pending timer target. However we also use decrementer
      interrupts to trigger the new "irq_work", which in this case would
      be missed.
      
      This changes the logic to force a replay in both cases: when a timer
      boundary has been reached and when a decrementer interrupt actually occurred
      while disabled. The former test is still useful to catch cases where
      a CPU having been hard-disabled for a long time completely misses the
      interrupt due to a decrementer rollover.
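
      A standalone model of the replay decision described above (the
      structure and names below are invented for this sketch and are not
      the arch/powerpc code):

        #include <stdbool.h>

        struct soft_irq_state {
                bool dec_happened;             /* decrementer fired while soft-disabled */
                unsigned long long dec_target; /* next programmed timer target */
        };

        /* Replay if the timer boundary was reached *or* a decrementer
         * interrupt actually occurred while soft-disabled; the latter
         * covers irq_work, which has no pending timer target. */
        static bool should_replay_decrementer(const struct soft_irq_state *s,
                                              unsigned long long now)
        {
                return s->dec_happened || now >= s->dec_target;
        }
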
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0031e5e3
      Michael Ellerman authored
      commit 0e37739b upstream.
      
      It's possible for us to crash when running with ftrace enabled, eg:
      
        Bad kernel stack pointer bffffd12 at c00000000000a454
        cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
            pc: c00000000000a454: resume_kernel+0x34/0x60
            lr: c00000000000335c: performance_monitor_common+0x15c/0x180
            sp: bffffd12
           msr: 8000000000001032
           dar: bffffd12
         dsisr: 42000000
      
      If we look at current's stack (paca->__current->stack) we see it is
      equal to c0000002ecab0000. Our stack is 16K, and comparing to
      paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
      kernel stack. This leads to us writing over our struct thread_info, and
      in this case we have corrupted thread_info->flags and set
      _TIF_EMULATE_STACK_STORE.
      
      Dumping the stack we see:
      
        3:mon> t c0000002ecab0000
        [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
        [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
        [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
        [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
        [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
        [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
        [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
        [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
        [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
        [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
        [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
      
        ... and so on
      
      __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
      path. At that point the irq state is not consistent, ie. interrupts are
      hard disabled (by the exception entry), but the paca soft-enabled flag
      may be out of sync.
      
      This leads to the local_irq_restore() in trace_graph_entry() actually
      enabling interrupts, which we do not want. Because we have not yet
      reprogrammed the decrementer we immediately take another decrementer
      exception, and recurse.
      
      The fix is twofold. Firstly make sure we call DISABLE_INTS before
      calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
      the irq state in the paca with the hardware, making it safe again to
      call local_irq_save/restore().
      
      Although that should be sufficient to fix the bug, we also mark the
      runlatch routines as notrace. They are called very early in the
      exception entry and we are asking for trouble tracing them. They are
      also fairly uninteresting and tracing them just adds unnecessary
      overhead.
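
      For reference, the kernel's notrace annotation expands to a GCC
      attribute that keeps the compiler from inserting tracing hooks into
      a function; a minimal sketch of the idea (not the actual
      arch/powerpc patch):

        /* In the kernel, notrace comes from <linux/compiler.h>; defined
         * here so the sketch stands alone. */
        #define notrace __attribute__((no_instrument_function))

        /* A function marked this way can run in early exception entry
         * without being pulled into the ftrace machinery. */
        notrace static void example_runlatch_on(void)
        {
                /* touch the runlatch; must not recurse into tracing */
        }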
      
      [ This regression was introduced by fe1952fc
        "powerpc: Rework runlatch code" by myself --BenH
      ]
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: fix statvfs fr_size · 999899ad
      Sage Weil authored
      commit 92a49fb0 upstream.
      
      Different versions of glibc are broken in different ways, but the short of
      it is that for the time being, frsize should == bsize, and be used as the
      multiple for the blocks, free, and available fields.  This mirrors what is
      done for NFS.  The previous reporting of the page size for frsize meant
      that newer glibc and df would report a very small value for the fs size.
      
      Fixes http://tracker.ceph.com/issues/3793.
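
      This is how the reported numbers are consumed from userspace; a
      small program along these lines (the mount point is just an
      example) shows why f_frsize must be the unit in which
      f_blocks/f_bfree/f_bavail are counted:

        #include <stdio.h>
        #include <sys/statvfs.h>

        int main(void)
        {
                struct statvfs st;

                if (statvfs("/mnt/ceph", &st) != 0) {
                        perror("statvfs");
                        return 1;
                }
                /* df-style tools multiply the block counts by f_frsize;
                 * with the fix f_frsize == f_bsize, matching the unit of
                 * the counts themselves. */
                printf("size:  %llu bytes\n",
                       (unsigned long long)st.f_blocks * st.f_frsize);
                printf("avail: %llu bytes\n",
                       (unsigned long long)st.f_bavail * st.f_frsize);
                return 0;
        }
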
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: wrap auth methods in a mutex · 7f259658
      Sage Weil authored
      commit e9966076 upstream.
      
      The auth code is called from a variety of contexts, including the mon_client
      (protected by the monc's mutex) and the messenger callbacks (currently
      protected by nothing).  Avoid chaos by protecting all auth state with a
      mutex.  Nothing is blocking, so this should be simple and lightweight.
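
      A userspace analogue of the locking pattern (illustrative only, not
      the libceph code): every entry point that can touch the shared auth
      state takes the same mutex.

        #include <pthread.h>

        struct auth_client {
                pthread_mutex_t lock;   /* guards everything below */
                int negotiating;        /* example of shared auth state */
        };

        static void auth_handle_reply(struct auth_client *ac, int ok)
        {
                pthread_mutex_lock(&ac->lock);
                ac->negotiating = !ok;  /* mutate state only under the lock */
                pthread_mutex_unlock(&ac->lock);
        }
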
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: wrap auth ops in wrapper functions · aa80dd9d
      Sage Weil authored
      commit 27859f97 upstream.
      
      Use wrapper functions that check whether the auth op exists so that callers
      do not need a bunch of conditional checks.  Simplifies the external
      interface.
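
      The shape of such a wrapper is roughly the following (names are
      illustrative, not the libceph interface); the "does this op exist?"
      check lives in one place instead of at every call site:

        struct auth_ops {
                void (*invalidate_authorizer)(void *private);
        };

        struct auth_client {
                const struct auth_ops *ops;
                void *private;
        };

        static void auth_invalidate_authorizer(struct auth_client *ac)
        {
                if (ac->ops && ac->ops->invalidate_authorizer)
                        ac->ops->invalidate_authorizer(ac->private);
        }
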
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: add update_authorizer auth method · 29c65a27
      Sage Weil authored
      commit 0bed9b5c upstream.
      
      Currently the messenger calls out to a get_authorizer con op, which will
      create a new authorizer if it doesn't yet have one.  In the meantime, when
      we rotate our service keys, the authorizer doesn't get updated.  Eventually
      it will be rejected by the server on a new connection attempt and get
      invalidated, and we will then rebuild a new authorizer, but this is not
      ideal.
      
      Instead, if we do have an authorizer, call a new update_authorizer op that
      will verify that the current authorizer is using the latest secret.  If it
      is not, we will build a new one that does.  This avoids the transient
      failure.
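
      In outline, the connection-side logic described above looks like
      this (a compilable sketch with made-up helper names, not the actual
      messenger code):

        struct authorizer { int secret_id; };

        struct connection {
                struct authorizer *auth;
        };

        /* stand-ins for the real build/update operations */
        static struct authorizer *build_authorizer(void) { return 0; }
        static void refresh_authorizer(struct authorizer *a) { (void)a; }

        static struct authorizer *connection_get_authorizer(struct connection *con)
        {
                if (!con->auth)
                        con->auth = build_authorizer();
                else
                        refresh_authorizer(con->auth); /* rebuild if the secret rotated */
                return con->auth;
        }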
      
      This fixes one step in the sorry sequence of events for bug

        http://tracker.ceph.com/issues/4282
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: fix authorizer invalidation · aacd9c36
      Sage Weil authored
      commit 4b8e8b5d upstream.
      
      We were invalidating the authorizer by removing the ticket handler
      entirely.  This was effective in inducing us to request a new authorizer,
      but in the meantime it meant that any authorizer we generated would get a
      new and initialized handler with secret_id=0, which would always be
      rejected by the server side with a confusing error message:
      
       auth: could not find secret_id=0
       cephx: verify_authorizer could not get service secret for service osd secret_id=0
      
      Instead, simply clear the validity field.  This will still induce the auth
      code to request a new secret, but will let us continue to use the old
      ticket in the meantime.  The messenger code will probably continue to fail,
      but the exponential backoff will kick in, and eventually we will get a
      new (hopefully more valid) ticket from the mon and be able to continue.
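
      A minimal sketch of the difference (types and field names invented
      for illustration):

        struct ticket_handler {
                unsigned int secret_id;   /* known to the server */
                long renew_after;         /* validity; 0 means "renew now" */
        };

        /* Old behaviour: delete the handler entirely, so the next
         * authorizer is built with secret_id=0 and is always rejected.
         * New behaviour: only clear the validity, keep the ticket. */
        static void invalidate_ticket(struct ticket_handler *th)
        {
                th->renew_after = 0;
        }
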
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: clear messenger auth_retry flag when we authenticate · af53bc4d
      Sage Weil authored
      commit 20e55c4c upstream.
      
      We maintain a counter of failed auth attempts to allow us to retry once
      before failing.  However, if the second attempt succeeds, the flag isn't
      cleared, which makes us think auth failed again later when the connection
      resets for other reasons (like a socket error).
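
      The fix amounts to resetting that counter on the success path,
      roughly as follows (the field name is illustrative):

        struct connection {
                int auth_retry;   /* failed attempts since last success */
        };

        static void on_authenticated(struct connection *con)
        {
                /* previously left at 1 after a successful retry, so a later
                 * socket reset looked like a second auth failure */
                con->auth_retry = 0;
        }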
      
      This is one part of the sorry sequence of events in bug
      
        http://tracker.ceph.com/issues/4282
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: Fix typo in kexec register clearing · 5c7a854c
      Kees Cook authored
      commit c8a22d19 upstream.
      
      Fixes a typo in register clearing code. Thanks to PaX Team for fixing
      this originally, and James Troup for pointing it out.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
      Cc: PaX Team <pageexec@freemail.hu>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: migration: add migrate_entry_wait_huge() · f517dfe6
      Naoya Horiguchi authored
      commit 30dad309 upstream.
      
      When we have a page fault for the address which is backed by a hugepage
      under migration, the kernel can't wait correctly and busy-loops on the
      hugepage fault until the migration finishes.  As a result, users who try
      to kick hugepage migration (via soft offlining, for example) occasionally
      experience long delay or soft lockup.
      
      This is because pte_offset_map_lock() can't get a correct migration entry
      or a correct page table lock for a hugepage.  This patch introduces
      migration_entry_wait_huge() to solve this.
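
      Conceptually, the new helper waits with the lock that actually
      covers a huge PTE instead of deriving a pte-level lock; a sketch
      with hypothetical names (a model of the idea, not the mm/ code):

        struct mm;      /* opaque stand-ins */
        struct pte;
        struct lock;

        extern struct lock *huge_pte_lock_for(struct mm *mm, struct pte *pte);
        extern void wait_for_migration_entry(struct mm *mm, struct pte *pte,
                                             struct lock *ptl);

        static void migration_entry_wait_huge_sketch(struct mm *mm, struct pte *pte)
        {
                /* take the hugepage's page-table lock, then sleep on the
                 * migration entry instead of spinning in the fault path */
                wait_for_migration_entry(mm, pte, huge_pte_lock_for(mm, pte));
        }
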
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. · 0938e135
      Alex Lyakas authored
      
      commit 3056e3ae upstream.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io()  calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
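
      The rule being enforced can be written down as a small predicate
      (simplified, not the md/raid1 code): a write is reported successful
      only if some device that completed it is neither Faulty nor still
      rebuilding.

        #include <stdbool.h>

        struct mirror_state {
                bool wrote_ok;    /* this drive completed the WRITE */
                bool faulty;
                bool in_sync;     /* false while rebuilding */
        };

        static bool write_is_successful(const struct mirror_state *m, int nmirrors)
        {
                for (int i = 0; i < nmirrors; i++)
                        if (m[i].wrote_ok && !m[i].faulty && m[i].in_sync)
                                return true;
                return false;
        }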
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion · c09c35b2
      Rafael Aquini authored
      commit cbab0e4e upstream.
      
      read_swap_cache_async() can race against get_swap_page(), and stumble
      across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
      into the swapcache yet.
      
      This swap_map state is expected to be transitory, but the actual
      placement of discard at scan_swap_map() inserts a wait for I/O
      completion, thus making the thread at read_swap_cache_async() loop
      around its -EEXIST case, while the other end at get_swap_page() is
      scheduled away at scan_swap_map().  This can leave the system deadlocked
      if the I/O completion happens to be waiting on the CPU waitqueue where
      read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.
      
      This patch introduces a cond_resched() call to make the aforementioned
      read_swap_cache_async() busy loop bail out when necessary,
      thus avoiding the subtle race window.
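
      The shape of the change is a yield inside the retry loop; a
      standalone sketch (helper names invented, cond_resched() below is
      only a stand-in for the kernel call):

        #include <errno.h>

        /* stand-ins for the swapcache setup attempt and cond_resched() */
        static int try_claim_swap_entry(void) { return -EEXIST; }
        static void cond_resched(void) { }

        static int read_swap_entry(void)
        {
                for (;;) {
                        int err = try_claim_swap_entry();

                        if (err != -EEXIST)
                                return err;     /* success or a real error */
                        cond_resched();         /* the added call: let the other
                                                   thread finish setting up */
                }
        }
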
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: prefer VBT modes for SDVO-LVDS over EDID · 508056cc
      Daniel Vetter authored
      commit c3456fb3 upstream.
      
      In
      
      commit 53d3b4d7
      Author: Egbert Eich <eich@suse.de>
      Date:   Tue Jun 4 17:13:21 2013 +0200
      
          drm/i915/sdvo: Use &intel_sdvo->ddc instead of intel_sdvo->i2c for DDC
      
      Egbert Eich fixed a long-standing bug where we simply used a
      non-working i2c controller to read the EDID for SDVO-LVDS panels.
      Unfortunately some machines seem to not be able to cope with the mode
      provided in the EDID. Specifically they seem to not be able to cope
      with a 4x pixel multiplier instead of a 2x one, which seems to have
      been worked around by slightly changing the panel's native mode in the
      VBT so that the dotclock is just barely above 50MHz.
      
      Since it took forever to notice the breakage it's fairly safe to
      assume that at least for SDVO-LVDS panels the VBT contains fairly sane
      data. So just switch around the order and use VBT modes first.
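
      The reordering itself is simple; in outline (helper names here are
      hypothetical, not the i915 functions):

        struct mode_list;

        /* stand-ins for "add panel mode from VBT" and "add EDID modes" */
        static int add_vbt_panel_mode(struct mode_list *l) { (void)l; return 1; }
        static int add_edid_modes(struct mode_list *l) { (void)l; return 0; }

        static int get_sdvo_lvds_modes(struct mode_list *l)
        {
                int n = add_vbt_panel_mode(l);  /* trust the VBT first */

                n += add_edid_modes(l);         /* keep EDID modes as a fallback (v2) */
                return n;
        }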
      
      v2: Also add EDID modes just in case, and spell Egbert correctly.
      
      v3: Elaborate a bit more about what's going on on Chris' machine.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65524
      Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Egbert Eich <eich@suse.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>