1. 20 Feb, 2015 10 commits
  2. 19 Feb, 2015 30 commits
    • x86: pte_protnone() and pmd_protnone() must check entry is not present · e3a1f6ca
      David Vrabel authored
      Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
      _PAGE_PRESENT is clear.  Make pte_protnone() and pmd_protnone() check
      for this.
      
      This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed
      ("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa").  Any
      userspace process would endlessly fault.
      
      In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
      by the hypervisor.  This meant that any fault on a present userspace
      entry (e.g., a write to a read-only mapping) would be misinterpreted as
      a NUMA hinting fault and the fault would not be correctly handled,
      resulting in the access endlessly faulting.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3a1f6ca
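      The check described in this commit can be sketched in userspace C. This
      is a minimal illustration, not the kernel source: the SK_ and sk_ names
      are hypothetical, and the bit positions (present = bit 0, global = bit 8)
      are assumed to mirror x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; bit positions are assumed to mirror x86. */
#define SK_PAGE_PRESENT  (UINT64_C(1) << 0)
#define SK_PAGE_GLOBAL   (UINT64_C(1) << 8)
#define SK_PAGE_PROTNONE SK_PAGE_GLOBAL /* the alias described above */

/* Pre-fix behaviour: tests the aliased bit only. */
int sk_protnone_unchecked(uint64_t flags)
{
	return (flags & SK_PAGE_PROTNONE) != 0;
}

/* Post-fix behaviour: PROTNONE is honoured only when PRESENT is clear. */
int sk_protnone_checked(uint64_t flags)
{
	return (flags & (SK_PAGE_PROTNONE | SK_PAGE_PRESENT)) == SK_PAGE_PROTNONE;
}

/* A present entry with the global bit set, as in a 64-bit PV guest. */
void sk_protnone_demo(void)
{
	uint64_t pv_user_entry = SK_PAGE_PRESENT | SK_PAGE_GLOBAL;

	assert(sk_protnone_unchecked(pv_user_entry)); /* misread as protnone */
	assert(!sk_protnone_checked(pv_user_entry));  /* correctly rejected  */
}
```

      With the unchecked test, every present entry that carries the global
      bit looks like a NUMA hinting entry, which matches the endless-fault
      symptom described in the message.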
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2b9fb532
      Linus Torvalds authored
      Pull btrfs updates from Chris Mason:
       "This pull is mostly cleanups and fixes:
      
         - The raid5/6 cleanups from Zhao Lei fix up some long-standing warts
           in the code and add improvements on top of the scrubbing support
           from 3.19.
      
         - Josef has round one of our ENOSPC fixes coming from large btrfs
           clusters here at FB.
      
         - Dave Sterba continues a long series of cleanups (thanks Dave), and
           Filipe continues hammering on corner cases in fsync and others
      
        This all was held up a little trying to track down a use-after-free in
        btrfs raid5/6.  It's not clear yet if this is just made easier to
        trigger with this pull or if it's a new bug from the raid5/6 cleanups.
        Dave Sterba is the only one to trigger it so far, but he has a
        consistent way to reproduce, so we'll get it nailed shortly"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (68 commits)
        Btrfs: don't remove extents and xattrs when logging new names
        Btrfs: fix fsync data loss after adding hard link to inode
        Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group
        Btrfs: account for large extents with enospc
        Btrfs: don't set and clear delalloc for O_DIRECT writes
        Btrfs: only adjust outstanding_extents when we do a short write
        btrfs: Fix out-of-space bug
        Btrfs: scrub, fix sleep in atomic context
        Btrfs: fix scheduler warning when syncing log
        Btrfs: Remove unnecessary placeholder in btrfs_err_code
        btrfs: cleanup init for list in free-space-cache
        btrfs: delete chunk allocation attemp when setting block group ro
        btrfs: clear bio reference after submit_one_bio()
        Btrfs: fix scrub race leading to use-after-free
        Btrfs: add missing cleanup on sysfs init failure
        Btrfs: fix race between transaction commit and empty block group removal
        btrfs: add more checks to btrfs_read_sys_array
        btrfs: cleanup, rename a few variables in btrfs_read_sys_array
        btrfs: add checks for sys_chunk_array sizes
        btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize
        ...
      2b9fb532
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 4533f6e2
      Linus Torvalds authored
      Pull Ceph changes from Sage Weil:
       "On the RBD side, there is a conversion to blk-mq from Christoph,
        several long-standing bug fixes from Ilya, and some cleanup from
        Rickard Strandqvist.
      
        On the CephFS side there is a long list of fixes from Zheng, including
        improved session handling, a few IO path fixes, some dcache management
        correctness fixes, and several blocking while !TASK_RUNNING fixes.
      
        The core code gets a few cleanups and Chaitanya has added support for
        TCP_NODELAY (which has been used on the server side for ages but we
        somehow missed on the kernel client).
      
        There is also an update to MAINTAINERS to fix up some email addresses
        and reflect that Ilya and Zheng are doing most of the maintenance for
        RBD and CephFS these days.  Do not be surprised to see a pull request
        come from one of them in the future if I am unavailable for some
        reason"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
        MAINTAINERS: update Ceph and RBD maintainers
        libceph: kfree() in put_osd() shouldn't depend on authorizer
        libceph: fix double __remove_osd() problem
        rbd: convert to blk-mq
        ceph: return error for traceless reply race
        ceph: fix dentry leaks
        ceph: re-send requests when MDS enters reconnecting stage
        ceph: show nocephx_require_signatures and notcp_nodelay options
        libceph: tcp_nodelay support
        rbd: do not treat standalone as flatten
        ceph: fix atomic_open snapdir
        ceph: properly mark empty directory as complete
        client: include kernel version in client metadata
        ceph: provide seperate {inode,file}_operations for snapdir
        ceph: fix request time stamp encoding
        ceph: fix reading inline data when i_size > PAGE_SIZE
        ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
        ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
        ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
        rbd: fix error paths in rbd_dev_refresh()
        ...
      4533f6e2
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 89d3fa45
      Linus Torvalds authored
      Pull thermal management updates from Zhang Rui:
       "Specifics:
      
         - Abstract the code and introduce helper functions for all int340x
           thermal drivers.  From: Srinivas Pandruvada.
      
         - Reorganize the ACPI LPAT table support code so that it can be
           shared for both ACPI PMIC driver and int340x thermal driver.
      
         - Add support for Braswell in intel_soc_dts thermal driver.
      
         - a couple of small fixes/cleanups for step_wise governor and int340x
           thermal driver"
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        Thermal/int340x_thermal: remove unused uuids.
        thermal: step_wise: spelling fixes
        thermal: int340x: fix sparse warning
        Thermal/int340x: LPAT conversion for temperature
        ACPI / PMIC: Use common LPAT table handling functions
        ACPI / LPAT: Common table processing functions
        thermal: Intel SoC DTS: Add Braswell support
        Thermal/int340x/int3402: Provide notification support
        Thermal/int340x/processor_thermal: Add thermal zone support
        Thermal/int340x/int3403: Use int340x thermal API
        Thermal/int340x/int3402: Use int340x thermal API
        Thermal/int340x: Add common thermal zone handler
      89d3fa45
    • Merge tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 477ea116
      Linus Torvalds authored
      Pull two EDAC fixes from Borislav Petkov:
      
       - A fix to sb_edac for proper detection on SNB machines
      
       - A fix to amd64_edac to not explode on Numascale machines with more
         than 16 memory controllers, from Daniel J Blueman.
      
      * tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
        sb_edac: Fix detection on SNB machines
      477ea116
    • Merge tag 'platform-drivers-x86-v3.20-1' of... · 6ed3e57f
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull platform driver update from Darren Hart:
       "This includes a significant update to the toshiba_acpi driver,
        bringing it to feature parity with the Windows driver, followed by
        some needed cleanups.
      
        The other changes are mostly minor updates, quirks, sparse fixes, or
        cleanups.
      
        Details:
      
         - toshiba_acpi:
             Add support for missing features from the Windows driver, bump the
             sysfs version, and clean up the driver.
      
         - thinkpad_acpi:
             BIOS string versions, unhandled hkey events.
      
         - samsung-laptop:
             Add native backlight quirk, enable better lid handling.
      
         - intel_scu_ipc:
             Read resources from PCI configuration
      
         - other:
             Fix sparse warnings, general cleanups"
      
      * tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (34 commits)
        toshiba_acpi: Cleanup GPL header
        toshiba_acpi: Cleanup comment blocks and capitalization
        toshiba_acpi: Make use of DEVICE_ATTR_{RO, RW} macros
        toshiba_acpi: Drop the toshiba_ prefix from sysfs function names
        toshiba_acpi: Move sysfs function and struct declarations further down
        Documentation/ABI: Add file describing the sysfs entries for toshiba_acpi
        toshiba_acpi: Clean file according to coding style
        toshiba_acpi: Bump version number to 0.21
        toshiba_acpi: Add support to enable/disable USB 3
        toshiba_acpi: Add support for Panel Power ON
        toshiba_acpi: Add support for Keyboard functions mode
        toshiba_acpi: Add fan entry to sysfs
        toshiba_acpi: Add version entry to sysfs
        thinkpad_acpi: support new BIOS version string pattern
        thinkpad_acpi: unhandled hkey event
        toshiba_acpi: Make toshiba_eco_mode_available more robust
        classmate-laptop: Fix sparse warning (0 as NULL)
        Sony-laptop: Fix sparse warning (make undeclared var static)
        thinkpad_acpi.c: Fix sparse warning (make undeclared var static)
        samsung-laptop.c: Prefer kstrtoint over single variable sscanf
        ...
      6ed3e57f
    • Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · b11a2783
      Linus Torvalds authored
      Pull kconfig updates from Michal Marek:
       "Yann E Morin was supposed to take over kconfig maintainership, but
        this hasn't happened.  So I'm sending a few kconfig patches that I
        collected:
      
         - Fix for missing va_end in kconfig
         - merge_config.sh displays usage if given too few arguments
         - s/boolean/bool/ in Kconfig files for consistency, with the plan to
           only support bool in the future"
      
      * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kconfig: use va_end to match corresponding va_start
        merge_config.sh: Display usage if given too few arguments
        kconfig: use bool instead of boolean for type definition attributes
      b11a2783
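      The va_start/va_end pairing that the first kconfig patch restores can be
      illustrated with a small variadic helper. This is a generic sketch, not
      the kconfig code; sk_format is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf like snprintf(). Returns what vsnprintf() returns:
 * the length the full output would have, or a negative value on error.
 */
int sk_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && size > 0);

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap); /* each va_start() needs a matching va_end() before returning */

	return n;
}
```

      The C standard requires va_end to be invoked in the same function that
      invoked va_start, so the pair is kept next to each other here.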
    • Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 77343343
      Linus Torvalds authored
      Pull misc kbuild changes from Michal Marek:
       "Just a few non-critical kbuild changes:
      
         - builddeb adds the actual distribution name in the changelog
         - documentation fixes"
      
      * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE
        kbuild: Update documentation of clean-files and clean-dirs
        builddeb: Try to determine distribution
        builddeb: Update year and git repository URL in debian/copyright
      77343343
    • MAINTAINERS: update Ceph and RBD maintainers · 0f5417ce
      Sage Weil authored
      - add Ilya, drop Yehuda as an RBD maintainer
      - add Zheng as a Ceph maintainer
      - update Yehuda and Sage's emails
      Signed-off-by: Sage Weil <sage@redhat.com>
      0f5417ce
    • Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 27a22ee4
      Linus Torvalds authored
      Pull kbuild updates from Michal Marek:
      
       - several cleanups in kbuild
      
       - serialize multiple *config targets so that 'make defconfig kvmconfig'
         works
      
       - The cc-ifversion macro got support for an else-branch
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild,gcov: simplify kernel/gcov/Makefile more
        kbuild: allow cc-ifversion to have the argument for false condition
        kbuild,gcov: simplify kernel/gcov/Makefile
        kbuild,gcov: remove unnecessary workaround
        kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
        kbuild: fix cc-ifversion macro
        kbuild: drop $(version_h) from MRPROPER_FILES
        kbuild: use mixed-targets when two or more config targets are given
        kbuild: remove redundant line from bounds.h/asm-offsets.h
        kbuild: merge bounds.h and asm-offsets.h rules
        kbuild: Drop support for clean-rule
      27a22ee4
    • libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3
      Ilya Dryomov authored
      a255651d ("ceph: ensure auth ops are defined before use") made
      kfree() in put_osd() conditional on the authorizer.  A mechanical
      mistake most likely - fix it.
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      b28ec2f3
    • libceph: fix double __remove_osd() problem · 7eb71e03
      Ilya Dryomov authored
      It turns out it's possible to get __remove_osd() called twice on the
      same OSD.  That doesn't sit well with rb_erase() - depending on the
      shape of the tree we can get a NULL dereference, a soft lockup or
      a random crash at some point in the future as we end up touching freed
      memory.  One scenario that I was able to reproduce is as follows:
      
                  <osd3 is idle, on the osd lru list>
      <con reset - osd3>
      con_fault_finish()
        osd_reset()
                                    <osdmap - osd3 down>
                                    ceph_osdc_handle_map()
                                      <takes map_sem>
                                      kick_requests()
                                        <takes request_mutex>
                                        reset_changed_osds()
                                          __reset_osd()
                                            __remove_osd()
                                        <releases request_mutex>
                                      <releases map_sem>
          <takes map_sem>
          <takes request_mutex>
          __kick_osd_requests()
            __reset_osd()
              __remove_osd() <-- !!!
      
      A case can be made that osd refcounting is imperfect and reworking it
      would be a proper resolution, but for now Sage and I decided to fix
      this by adding a safeguard around __remove_osd().
      
      Fixes: http://tracker.ceph.com/issues/8087
      
      Cc: Sage Weil <sage@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
      Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
      Cc: stable@vger.kernel.org # 3.9+
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      7eb71e03
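      The idea of a guard that makes a second removal harmless can be sketched
      as follows. This is an assumption-laden illustration, not the libceph
      change: the kernel code operates on an rbtree, while this sketch uses a
      circular doubly linked list as a stand-in, and all sk_ names are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sk_node {
	struct sk_node *prev, *next;
};

void sk_list_init(struct sk_node *head)
{
	head->prev = head->next = head;
}

/* A node that points at itself is treated as "not on any list". */
void sk_node_init(struct sk_node *n)
{
	n->prev = n->next = n;
}

int sk_node_linked(const struct sk_node *n)
{
	return n->next != n;
}

void sk_list_add(struct sk_node *head, struct sk_node *n)
{
	assert(!sk_node_linked(n));
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Returns 1 if the node was unlinked, 0 if it had already been removed. */
int sk_remove_once(struct sk_node *n)
{
	if (!sk_node_linked(n))
		return 0; /* guard: the second caller does nothing */

	n->prev->next = n->next;
	n->next->prev = n->prev;
	sk_node_init(n); /* mark as removed so the guard can see it */
	return 1;
}
```

      Resetting the node after unlinking is what lets the guard detect the
      second call; without it the second removal would follow stale pointers.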
    • rbd: convert to blk-mq · 7ad18afa
      Christoph Hellwig authored
      This converts the rbd driver to use the blk-mq infrastructure.  Except
      for switching to a per-request work item this is almost mechanical.
      
      This was tested by Alexandre DERUMIER in November, and found to give
      him 120000 iops, although the only comparison available was an old
      3.10 kernel which gave 80000 iops.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <elder@linaro.org>
      [idryomov@gmail.com: context, blk_mq_init_queue() EH]
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      7ad18afa
    • ceph: return error for traceless reply race · 4d41cef2
      Yan, Zheng authored
      When we receive a traceless reply for a request that created a new inode,
      we re-send a lookup request to the MDS to get information about the newly
      created inode (VFS expects the FS callback to return an inode in the
      create case).  This breaks one request into two.  Another client may
      modify or move the new inode in between.
      
      When the race happens, ceph_handle_notrace_create() unconditionally
      links the dentry for the 'create' operation to the inode returned by
      lookup.  This may confuse VFS when the inode is a directory (VFS does
      not allow multiple linkages for a directory inode).

      This patch makes ceph_handle_notrace_create() return an error when it
      detects a race.  This event should be rare and happens only when we
      talk to an old MDS.  A recent MDS does not send a traceless reply for
      a request that creates a new inode.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      4d41cef2
    • ceph: fix dentry leaks · 5cba372c
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      5cba372c
    • ceph: re-send requests when MDS enters reconnecting stage · 3de22be6
      Yan, Zheng authored
      This lets the MDS check whether any request has already completed and
      process completed requests in the clientreplay stage.  When completed
      requests are processed in the clientreplay stage, the MDS can avoid
      sending traceless replies.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      3de22be6
    • libceph: tcp_nodelay support · ba988f87
      Chaitanya Huilgol authored
      Setting the TCP_NODELAY socket option on connection sockets disables
      Nagle's algorithm and improves latency characteristics.  The
      tcp_nodelay (default) and notcp_nodelay option flags enable or
      disable setting the socket option.
      Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
      [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      ba988f87
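      For readers unfamiliar with the option, this is how TCP_NODELAY is
      toggled from userspace with the standard sockets API. It is only an
      analogue of what the commit does inside the kernel client; the sk_
      function names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (non-zero) or disable (zero) TCP_NODELAY on a TCP socket. */
int sk_set_tcp_nodelay(int fd, int enable)
{
	int val = enable ? 1 : 0;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is clear, -1 on error. */
int sk_get_tcp_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
		return -1;
	return val != 0;
}
```

      With the option set, small writes are sent immediately instead of being
      held back for coalescing, which is the latency effect the message
      refers to.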
    • rbd: do not treat standalone as flatten · cf32bd9c
      Ilya Dryomov authored
      If the clone is resized down to 0, it becomes standalone.  If such a
      resize is carried out while an image is mapped, we would detect this
      and call rbd_dev_parent_put(), which means "let go of all parent state,
      including the spec(s) of parent image(s)".  This leads to a mismatch
      between "rbd info" and sysfs parent fields, so a fix is in order.
      
          # rbd create --image-format 2 --size 1 foo
          # rbd snap create foo@snap
          # rbd snap protect foo@snap
          # rbd clone foo@snap bar
          # DEV=$(rbd map bar)
          # rbd resize --allow-shrink --size 0 bar
          # rbd resize --size 1 bar
          # rbd info bar | grep parent
                  parent: rbd/foo@snap
      
      Before:
      
          # cat /sys/bus/rbd/devices/0/parent
          (no parent image)
      
      After:
      
          # cat /sys/bus/rbd/devices/0/parent
          pool_id 0
          pool_name rbd
          image_id 10056b8b4567
          image_name foo
          snap_id 2
          snap_name snap
          overlap 0
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Josh Durgin <jdurgin@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      cf32bd9c
    • ceph: fix atomic_open snapdir · bf91c315
      Yan, Zheng authored
      ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
      and creates the snapdir inode if it is -ENOENT.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      bf91c315
    • ceph: properly mark empty directory as complete · 2f92b3d0
      Yan, Zheng authored
      ceph_add_cap() calls __check_cap_issue(), which clears the directory
      inode's complete flag, so the complete flag for an empty directory
      should be set after calling ceph_add_cap().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      2f92b3d0
    • client: include kernel version in client metadata · a6a5ce4f
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      a6a5ce4f
    • ceph: provide seperate {inode,file}_operations for snapdir · 38c48b5f
      Yan, Zheng authored
      Remove all unsupported operations from {inode,file}_operations.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      38c48b5f
    • ceph: fix request time stamp encoding · 1f041a89
      Yan, Zheng authored
      struct timespec uses 'long' to represent seconds and nanoseconds, and
      'long' is 64 bits on 64-bit machines.  The Ceph MDS expects the time
      stamp to be encoded as struct ceph_timespec, which uses 'u32' to
      represent seconds and nanoseconds.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      1f041a89
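      The width mismatch can be made concrete with a userspace sketch that
      narrows a struct timespec into two 32-bit fields before writing it out.
      The sk_ names are hypothetical, and the little-endian byte order is an
      assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Store a 32-bit value in little-endian byte order (assumed wire order). */
void sk_put_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Encode a timespec as 8 bytes: 32-bit seconds, then 32-bit nanoseconds.
 * Copying the in-memory struct instead would emit 16 bytes on a typical
 * 64-bit machine, which is the mismatch described above.
 */
void sk_encode_timespec(unsigned char out[8], const struct timespec *ts)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);

	sk_put_le32(out, (uint32_t)ts->tv_sec);      /* narrowed to 32 bits */
	sk_put_le32(out + 4, (uint32_t)ts->tv_nsec); /* always fits in 32 bits */
}
```

      The explicit casts make the narrowing visible at the one place where the
      on-wire layout is decided.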
    • ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a
      Yan, Zheng authored
      When an inode has inline data but its size > PAGE_SIZE (it was truncated
      to a larger size), the previous direct read code returned -EIO.  This
      patch adds code to return zeros for data whose offset > PAGE_SIZE.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      fcc02d2a
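      The zero-fill behaviour can be sketched as a read over a short inline
      buffer that belongs to a longer file. This is a simplified userspace
      model, not the Ceph read path; sk_read_inline and its parameters are
      hypothetical, and the sketch assumes the inline data is never longer
      than the file size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Read up to len bytes at offset off from a file of i_size bytes whose
 * only stored content is inline_len bytes of inline data. Bytes past the
 * inline data but inside i_size read back as zeros. Returns the number
 * of bytes placed in buf.
 */
size_t sk_read_inline(const unsigned char *inline_data, size_t inline_len,
		      size_t i_size, size_t off,
		      unsigned char *buf, size_t len)
{
	size_t copied = 0;

	assert(inline_len <= i_size);

	if (off >= i_size)
		return 0; /* read starts at or past end of file */
	if (len > i_size - off)
		len = i_size - off; /* clamp to end of file */

	if (off < inline_len) {
		copied = inline_len - off;
		if (copied > len)
			copied = len;
		memcpy(buf, inline_data + off, copied);
	}
	memset(buf + copied, 0, len - copied); /* the hole reads as zeros */

	return len;
}
```

      The part of the range that lies beyond the inline data is treated like a
      hole, so it yields zeros rather than an error.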
    • ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) · 86d8f67b
      Yan, Zheng authored
      Use an atomic variable to track the number of sessions; this avoids a
      blocking operation inside wait loops.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      86d8f67b
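      The approach can be sketched with C11 atomics: the condition evaluated
      inside the wait loop becomes a single atomic load, which cannot sleep.
      This is a userspace illustration with hypothetical sk_ names, not the
      kernel code, which uses its own atomic and wait primitives.

```c
#include <assert.h>
#include <stdatomic.h>

struct sk_client {
	atomic_int num_sessions; /* updated wherever sessions come and go */
};

void sk_session_opened(struct sk_client *c)
{
	atomic_fetch_add(&c->num_sessions, 1);
}

void sk_session_closed(struct sk_client *c)
{
	int prev = atomic_fetch_sub(&c->num_sessions, 1);

	assert(prev > 0); /* closing more sessions than were opened is a bug */
	(void)prev;
}

/*
 * Wait-loop condition. It takes no lock and walks no list, so it is safe
 * to evaluate after the task has been marked as about to sleep.
 */
int sk_all_sessions_closed(struct sk_client *c)
{
	return atomic_load(&c->num_sessions) == 0;
}
```

      The bookkeeping cost moves to the open and close paths, which are
      allowed to block, and the check itself stays trivial.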
    • ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) · c4d4a582
      Yan, Zheng authored
      We should not do a blocking operation in wait_event_interruptible()'s
      condition check function, but reading inline data can block.  So move
      the code that reads inline data to ceph_get_caps().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      c4d4a582
    • ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) · d3383a8e
      Yan, Zheng authored
      check_cap_flush() calls mutex_lock(), which may block, so we can't
      use it as the condition check function for wait_event().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      d3383a8e
    • rbd: fix error paths in rbd_dev_refresh() · 73e39e4d
      Ilya Dryomov authored
      header_rwsem should be released on errors.  Also remove useless
      rbd_dev->mapping.size != rbd_dev->header.image_size test.
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      73e39e4d
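      Releasing a lock on every exit path is commonly written with a single
      unlock label. The sketch below shows that general pattern only; it is
      not rbd_dev_refresh(), the sk_ names are hypothetical, and the lock is a
      plain flag standing in for a real rwsem.

```c
#include <assert.h>

struct sk_lock {
	int held;
};

void sk_lock_acquire(struct sk_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

void sk_lock_release(struct sk_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * step1_ret and step2_ret simulate the results of two steps performed
 * under the lock (0 means success). *applied counts refreshes that
 * completed. Every return goes through the "out" label, so the lock is
 * released on errors as well as on success.
 */
int sk_refresh(struct sk_lock *l, int step1_ret, int step2_ret, int *applied)
{
	int ret;

	sk_lock_acquire(l);

	ret = step1_ret;
	if (ret)
		goto out;

	ret = step2_ret;
	if (ret)
		goto out;

	(*applied)++; /* reached only when both steps succeeded */
out:
	sk_lock_release(l);
	return ret;
}
```

      An early return in place of either goto would leave the lock held, which
      is the class of bug the commit message describes.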
    • ceph: improve reference tracking for snaprealm · 982d6011
      Yan, Zheng authored
      When a snaprealm is created, its initial reference count is zero.
      But in some rare cases, the newly created snaprealm is not referenced
      by anyone.  This causes a snaprealm with a zero reference count to
      never be freed.

      The fix is to set the reference count of a newly created snaprealm to 1.
      The reference is returned to the function that requested the snaprealm's
      creation.  When that function finishes its job, it releases the reference.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      982d6011
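      The "creator holds the first reference" scheme can be sketched in a few
      lines. This is a single-threaded userspace illustration with
      hypothetical sk_ names; the kernel uses atomic counters and its own
      allocation routines.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_realm {
	int refs;
};

/* A new object starts with one reference, owned by the caller. */
struct sk_realm *sk_realm_create(void)
{
	struct sk_realm *r = malloc(sizeof(*r));

	if (r)
		r->refs = 1;
	return r;
}

void sk_realm_get(struct sk_realm *r)
{
	assert(r->refs > 0);
	r->refs++;
}

/* Drops one reference. Returns 1 if the object was freed, 0 otherwise. */
int sk_realm_put(struct sk_realm *r)
{
	assert(r->refs > 0);
	if (--r->refs > 0)
		return 0;
	free(r);
	return 1;
}
```

      Because the count never starts at zero, an object that nobody else picks
      up is still freed when its creator calls the put function.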