1. 22 Jan, 2014 1 commit
    • dm log userspace: allow mark requests to piggyback on flush requests · 5066a4df
      Dongmao Zhang authored
      In a cluster environment, cluster writes perform poorly because
      userspace_flush() has to contact a userspace program (cmirrord) for
      clear/mark/flush requests.  Worse, both mark and flush requests
      require cmirrord to relay the message to all the cluster nodes for
      each flush call.  This behaviour is very slow.
      
      To address this we now merge mark and flush requests together to
      reduce the kernel-userspace-kernel round trips.  A new directive,
      "integrated_flush", can be used to instruct the kernel log code to
      combine flush and mark requests when directed by userspace.  If not
      directed by userspace (for example because an older version of the
      userspace code is in use), the kernel functions as it did previously,
      preserving backwards compatibility.  Additionally, flush requests are
      now performed lazily when only clear requests exist.
      Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      5066a4df
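      A minimal user-space sketch of the piggyback idea described in this
      commit; all names below (pending_marks, send_to_userspace, and so on)
      are illustrative stand-ins, not the kernel's or cmirrord's API.  Marks
      are queued locally and ride along with the next flush round trip, and
      a flush with only clear requests pending is deferred.

      /* Toy model of merging mark requests into the next flush round trip. */
      #include <stdio.h>

      #define MAX_PENDING 16

      struct pending_marks {
          unsigned regions[MAX_PENDING];
          int count;
      };

      /* Stand-in for the expensive kernel->userspace->cluster round trip. */
      static void send_to_userspace(const struct pending_marks *marks)
      {
          printf("round trip: flush carrying %d mark(s)\n", marks->count);
      }

      static void mark_region(struct pending_marks *marks, unsigned region)
      {
          /* Old behaviour: one round trip per mark.  New behaviour: queue it. */
          if (marks->count < MAX_PENDING)
              marks->regions[marks->count++] = region;
      }

      static void flush(struct pending_marks *marks, int clear_only)
      {
          /* Lazy flush: only clear requests pending, so defer the round trip. */
          if (clear_only && marks->count == 0) {
              printf("flush deferred (clear requests only)\n");
              return;
          }
          send_to_userspace(marks);      /* marks piggyback on the flush */
          marks->count = 0;
      }

      int main(void)
      {
          struct pending_marks marks = { .count = 0 };

          mark_region(&marks, 3);
          mark_region(&marks, 7);
          flush(&marks, 0);              /* one round trip carries both marks */
          flush(&marks, 1);              /* clear-only: no round trip needed */
          return 0;
      }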
  2. 21 Jan, 2014 1 commit
    • dm space map metadata: fix bug in resizing of thin metadata · fca02843
      Joe Thornber authored
      This bug was introduced in commit 7e664b3d ("dm space map metadata:
      fix extending the space map").
      
      When extending a dm-thin metadata volume we:
      
      - Switch the space map into a simple bootstrap mode, which allocates
        all space linearly from the newly added space.
      - Add new bitmap entries for the new space
      - Increment the reference counts for those newly allocated bitmap
        entries
      - Commit changes to disk
      - Switch back out of bootstrap mode.
      
      However, the disk commit may itself allocate space; if so, this fact
      is lost when switching out of bootstrap mode.
      
      The bug exhibited itself as an error when the bitmap_root, with an
      erroneous ref count of 0, was subsequently decremented as part of a
      later disk commit.  This would cause the disk commit to fail, and thinp
      to enter read_only mode.  The metadata was not damaged (thin_check
      passed).
      
      The fix is to put the increments + commit into a loop, running until
      the commit does not allocate any extra space.  In practice this loop
      only runs twice.
      
      With this fix the following device mapper testsuite test passes:
       dmtest run --suite thin-provisioning -n thin_remove_works_after_resize
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # depends on commit 7e664b3d
      fca02843
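      The fix described above is essentially a fixed-point loop.  A
      schematic user-space sketch follows; commit(), inc_refcounts() and
      the block counts are toy stand-ins, not the space-map code.

      #include <stdio.h>

      static unsigned long allocated;       /* blocks handed out so far */

      /* Pretend the commit needs one extra metadata block the first time. */
      static void commit(void)
      {
          static int first = 1;
          if (first) {
              first = 0;
              allocated += 1;               /* the commit allocated space itself */
          }
      }

      static void inc_refcounts(unsigned long from, unsigned long to)
      {
          printf("incrementing refcounts for blocks [%lu, %lu)\n", from, to);
      }

      int main(void)
      {
          unsigned long done = 0;           /* blocks whose refcounts are settled */

          allocated = 4;                    /* new bitmap entries for the added space */

          /* Repeat until a commit completes without allocating anything new,
           * so no block is left behind with a bogus reference count of 0. */
          while (done != allocated) {
              inc_refcounts(done, allocated);
              done = allocated;
              commit();                     /* may bump 'allocated' again */
          }
          printf("settled after %lu allocated blocks\n", allocated);
          return 0;
      }

      With the toy commit() above, the loop runs exactly twice, matching
      the "in practice this loop only runs twice" note in the commit.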
  3. 16 Jan, 2014 2 commits
  4. 15 Jan, 2014 3 commits
    • dm sysfs: fix a module unload race · 2995fa78
      Mikulas Patocka authored
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure whose purpose
      is to keep the completion and kobject in one place, so that they can
      be accessed from non-module code without exporting the layout of
      struct mapped_device to that code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2995fa78
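      The holder idea can be illustrated with the usual container_of()
      pattern.  This is a toy user-space model with simplified kobject and
      completion types, only to show why keeping the completion next to the
      embedded kobject lets built-in code reach it without knowing struct
      mapped_device's layout.

      #include <stdio.h>
      #include <stddef.h>

      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      struct kobject    { const char *name; };
      struct completion { int done; };

      struct dm_kobject_holder {      /* visible to built-in (non-module) code */
          struct kobject kobj;
          struct completion completion;
      };

      struct mapped_device {          /* module-private; layout not exported */
          int private_state;
          struct dm_kobject_holder holder;
      };

      /* Built-in release callback: needs only the holder, not mapped_device. */
      static void dm_kobject_release(struct kobject *kobj)
      {
          struct dm_kobject_holder *h =
              container_of(kobj, struct dm_kobject_holder, kobj);
          h->completion.done = 1;     /* stand-in for complete() */
      }

      int main(void)
      {
          struct mapped_device md = { .holder = { .kobj = { "dm" } } };

          dm_kobject_release(&md.holder.kobj);
          printf("completion signalled: %d\n", md.holder.completion.done);
          return 0;
      }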
    • dm snapshot: use dm-bufio prefetch · 55b082e6
      Mikulas Patocka authored
      This patch modifies dm-snapshot so that it prefetches the buffers when
      loading the exceptions.
      
      The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS
      macro.  The current value for DM_PREFETCH_CHUNKS (12) was found to
      provide the best performance on a single 15k SCSI spindle.  In the
      future we may modify this default or make it configurable.
      
      Also, introduce the function dm_bufio_set_minimum_buffers to set the
      number of internal buffers bufio keeps before it starts freeing them.
      dm-bufio may hold more buffers if enough memory is available.  There
      is no guarantee that the specified number of buffers will be
      available; if you need a guarantee, use the reserved_buffers argument
      of dm_bufio_client_create.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      55b082e6
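      A toy model of the read-ahead described above: issue prefetches for up
      to DM_PREFETCH_CHUNKS chunks ahead before blocking on the current
      read.  prefetch_chunk() and read_chunk() are made-up stand-ins for the
      dm-bufio calls.

      #include <stdio.h>

      #define DM_PREFETCH_CHUNKS 12

      static void prefetch_chunk(unsigned chunk) { printf("prefetch chunk %u\n", chunk); }
      static void read_chunk(unsigned chunk)     { printf("read chunk %u\n", chunk); }

      int main(void)
      {
          const unsigned nr_chunks = 4;       /* exception-store chunks to load */
          unsigned prefetched = 0, i;

          for (i = 0; i < nr_chunks; i++) {
              /* Keep the read-ahead window full before blocking on chunk i. */
              while (prefetched < nr_chunks &&
                     prefetched < i + DM_PREFETCH_CHUNKS)
                  prefetch_chunk(prefetched++);

              read_chunk(i);   /* likely already cached by the prefetch */
          }
          return 0;
      }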
    • dm snapshot: use dm-bufio · 55494bf2
      Mikulas Patocka authored
      Use dm-bufio for initial loading of the exceptions.
      Introduce a new function dm_bufio_forget that frees the given buffer.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      55494bf2
  5. 14 Jan, 2014 2 commits
  6. 10 Jan, 2014 1 commit
    • dm cache: add block sizes and total cache blocks to status output · 6a388618
      Mike Snitzer authored
      Improve cache_status to emit:
      <metadata block size> <#used metadata blocks>/<#total metadata blocks>
      <cache block size> <#used cache blocks>/<#total cache blocks>
      ...
      
      Adding the block sizes allows for easier calculation of the overall size
      of both the metadata and cache devices.  Adding <#total cache blocks>
      provides useful context for how much of the cache is used.
      
      Unfortunately these additions to the status will require updates to
      users' scripts that monitor the cache status.  But these changes help
      provide more comprehensive information about the cache device and will
      simplify tools that are being developed to manage dm-cache devices --
      because they won't need to issue 3 operations to cobble together the
      information that we can easily provide via a single status ioctl.
      
      While updating the status documentation in cache.txt, spaces were
      converted to tabs.
      Requested-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      6a388618
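      A small sketch of the resulting field order, with made-up numbers and
      block sizes assumed here to be reported in 512-byte sectors; it only
      shows how a monitoring script would now read the first six status
      fields.

      #include <stdio.h>

      int main(void)
      {
          unsigned long metadata_block_size = 8;     /* assumed: sectors */
          unsigned long used_meta = 24, total_meta = 1024;
          unsigned long cache_block_size = 128;      /* assumed: sectors */
          unsigned long used_cache = 300, total_cache = 8192;

          /* <metadata block size> <#used metadata blocks>/<#total metadata blocks>
           * <cache block size> <#used cache blocks>/<#total cache blocks> ... */
          printf("%lu %lu/%lu %lu %lu/%lu ...\n",
                 metadata_block_size, used_meta, total_meta,
                 cache_block_size, used_cache, total_cache);
          return 0;
      }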
  7. 09 Jan, 2014 1 commit
  8. 08 Jan, 2014 3 commits
    • dm space map metadata: fix extending the space map · 7e664b3d
      Joe Thornber authored
      When extending a metadata space map we should do the first commit whilst
      still in bootstrap mode -- a mode where all blocks get allocated in the
      new area.
      
      That way the commit overhead is allocated from the newly added space.
      Otherwise we risk running out of space.
      
      With this fix, and the previous commit "dm space map common: make sure
      new space is used during extend", the following device mapper testsuite
      test passes:
       dmtest run --suite thin-provisioning -n /resize_metadata_no_io/
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      7e664b3d
    • dm space map common: make sure new space is used during extend · 12c91a5c
      Joe Thornber authored
      When extending a low level space map we should update nr_blocks at
      the start so the new space is used for the index entries.
      
      Otherwise the extend can fail.  For example, this sm_metadata_extend
      call sequence fails:
       -> sm_ll_extend
          -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
          => returns -ENOSPC because smm->begin == smm->ll.nr_blocks
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      12c91a5c
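      The ordering issue can be shown with a toy allocator: if nr_blocks is
      not raised before the extend starts allocating index entries, the
      allocator sees no free space and returns -ENOSPC.  The structure and
      functions below are illustrative, not the space-map implementation.

      #include <stdio.h>
      #include <errno.h>

      struct toy_sm {
          unsigned long nr_blocks;   /* size the allocator believes in */
          unsigned long begin;       /* next block to hand out */
      };

      static int new_block(struct toy_sm *sm, unsigned long *b)
      {
          if (sm->begin == sm->nr_blocks)
              return -ENOSPC;        /* the failure seen in sm_bootstrap_new_block */
          *b = sm->begin++;
          return 0;
      }

      static int extend(struct toy_sm *sm, unsigned long extra, int fixed)
      {
          unsigned long b;

          if (fixed)
              sm->nr_blocks += extra;    /* fixed: grow first, then use new space */

          if (new_block(sm, &b))         /* index entries need blocks right away */
              return -ENOSPC;

          if (!fixed)
              sm->nr_blocks += extra;    /* buggy ordering: too late */
          return 0;
      }

      int main(void)
      {
          struct toy_sm buggy = { .nr_blocks = 8, .begin = 8 };
          struct toy_sm fixed = { .nr_blocks = 8, .begin = 8 };

          printf("buggy order: %d\n", extend(&buggy, 4, 0));   /* -ENOSPC */
          printf("fixed order: %d\n", extend(&fixed, 4, 1));   /* 0 */
          return 0;
      }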
    • dm: wait until embedded kobject is released before destroying a device · be35f486
      Mikulas Patocka authored
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject, when it happens the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using the kobject this way should avoid the module unload race that
      was mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      be35f486
  9. 07 Jan, 2014 21 commits
    • dm: remove pointless kobject comparison in dm_get_from_kobject · 1ddd641d
      Mikulas Patocka authored
      The comparison is always true and the compiler optimizes it out anyway.
      
      Milan offered additional context relative to the original commit
      784aae73 ("dm: add name and uuid to sysfs") which introduced the code:
      "I think it is just relict of some experiments before I committed this
      simple embedded sysfs kobj handling".
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      1ddd641d
    • dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() · c1a64160
      Chuansheng Liu authored
      When CONFIG_DEBUG_OBJECTS_WORK is defined, destroy_work_on_stack()
      must be called to free the debug object and pair with
      INIT_WORK_ONSTACK().
      Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      c1a64160
    • dm cache policy mq: introduce three promotion threshold tunables · 78e03d69
      Joe Thornber authored
      Internally the mq policy maintains a promotion threshold variable.  If
      the hit count of a block not in the cache goes above this threshold it
      gets promoted to the cache.
      
      This patch introduces three new tunables that allow you to tweak the
      promotion threshold by adding a small value.  These adjustments depend
      on the io type:
      
         read_promote_adjustment:    READ io, default 4
         write_promote_adjustment:   WRITE io, default 8
         discard_promote_adjustment: READ/WRITE io to a discarded block, default 1
      
      If you're trying to quickly warm a new cache device you may wish to
      reduce these to encourage promotion.  Remember to switch them back to
      their defaults after the cache fills though.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      78e03d69
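      A sketch of the promotion decision with the three adjustments and
      their defaults (4/8/1) named above; the threshold arithmetic and the
      should_promote() helper are illustrative, not the mq policy's code.

      #include <stdio.h>
      #include <stdbool.h>

      enum io_type { IO_READ, IO_WRITE, IO_TO_DISCARDED_BLOCK };

      static unsigned read_promote_adjustment = 4;
      static unsigned write_promote_adjustment = 8;
      static unsigned discard_promote_adjustment = 1;

      static bool should_promote(unsigned hit_count, unsigned promote_threshold,
                                 enum io_type type)
      {
          unsigned adjustment;

          switch (type) {
          case IO_TO_DISCARDED_BLOCK:
              adjustment = discard_promote_adjustment;
              break;
          case IO_WRITE:
              adjustment = write_promote_adjustment;
              break;
          default:
              adjustment = read_promote_adjustment;
              break;
          }

          /* Smaller adjustments make promotion easier (useful when warming
           * a new cache device, as noted above). */
          return hit_count >= promote_threshold + adjustment;
      }

      int main(void)
      {
          printf("read, 10 hits, threshold 8: %d\n",
                 should_promote(10, 8, IO_READ));               /* 0: 10 < 12 */
          printf("discarded block, 10 hits, threshold 8: %d\n",
                 should_promote(10, 8, IO_TO_DISCARDED_BLOCK)); /* 1: 10 >= 9 */
          return 0;
      }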
    • Wei Yongjun
    • dm thin: fix set_pool_mode exposed pool operation races · 8b64e881
      Mike Snitzer authored
      The pool mode must not be switched until after the corresponding pool
      process_* methods have been established.  Otherwise, because
      set_pool_mode() isn't interlocked with the IO path for performance
      reasons, the IO path can end up executing process_* operations that
      don't match the mode.  This patch eliminates problems like the following
      (as seen on really fast PCIe SSD storage when transitioning the pool's
      mode from PM_READ_ONLY to PM_WRITE):
      
      kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
      kernel: device-mapper: thin: 253:2: no free data space available.
      kernel: device-mapper: thin: 253:2: switching pool to read-only mode
      kernel: device-mapper: thin: 253:2: switching pool to write mode
      kernel: ------------[ cut here ]------------
      kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]()
      ...
      kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
      kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3
      kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800
      kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3
      kernel: Call Trace:
      kernel: [<ffffffff8152ebcb>] dump_stack+0x49/0x5e
      kernel: [<ffffffff8104c46c>] warn_slowpath_common+0x8c/0xc0
      kernel: [<ffffffff8104c4ba>] warn_slowpath_null+0x1a/0x20
      kernel: [<ffffffffa001e2c6>] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]
      kernel: [<ffffffffa001f276>] process_bio_read_only+0x136/0x180 [dm_thin_pool]
      kernel: [<ffffffffa0020b75>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
      kernel: [<ffffffffa0020d31>] do_worker+0x51/0x60 [dm_thin_pool]
      kernel: [<ffffffff81067823>] process_one_work+0x183/0x490
      kernel: [<ffffffff81068c70>] worker_thread+0x120/0x3a0
      kernel: [<ffffffff81068b50>] ? manage_workers+0x160/0x160
      kernel: [<ffffffff8106e86e>] kthread+0xce/0xf0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: [<ffffffff8153b3ec>] ret_from_fork+0x7c/0xb0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: ---[ end trace 3f00528e08ffa55c ]---
      kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!?
      
      dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
      at the top of handle_unserviceable_bio().  And as the additional
      debugging above conveys: the pool mode was _not_ PM_READ_ONLY as
      expected -- it was already PM_WRITE -- yet pool->process_bio was still
      set to process_bio_read_only().
      
      Also, while fixing this up, reduce logging of redundant pool mode
      transitions by checking that new_mode differs from old_mode.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b64e881
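      The ordering fix boils down to installing the process_* handlers that
      match the new mode before the mode itself is published to the IO
      path.  A user-space sketch with function pointers (locking and the
      real handler set omitted):

      #include <stdio.h>

      enum pool_mode { PM_WRITE, PM_READ_ONLY };

      struct pool {
          enum pool_mode mode;
          void (*process_bio)(void);
      };

      static void process_bio_normal(void)    { printf("process_bio (write mode)\n"); }
      static void process_bio_read_only(void) { printf("process_bio (read-only)\n"); }

      static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
      {
          if (new_mode == pool->mode)
              return;                 /* skip redundant transition logging */

          /* Handlers first ... */
          pool->process_bio = (new_mode == PM_READ_ONLY) ?
                              process_bio_read_only : process_bio_normal;
          /* ... then publish the mode the IO path will observe. */
          pool->mode = new_mode;
      }

      int main(void)
      {
          struct pool pool = { PM_WRITE, process_bio_normal };

          set_pool_mode(&pool, PM_READ_ONLY);
          pool.process_bio();         /* matches PM_READ_ONLY */
          set_pool_mode(&pool, PM_WRITE);
          pool.process_bio();         /* matches PM_WRITE */
          return 0;
      }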
    • dm thin: eliminate the no_free_space flag · 6d16202b
      Mike Snitzer authored
      The pool's error_if_no_space flag can easily serve the same purpose that
      no_free_space did, namely: control whether handle_unserviceable_bio()
      will error a bio or requeue it.
      
      This is cleaner since error_if_no_space is established when the pool's
      features are processed during table load, and it avoids having to
      manage the no_free_space flag under the pool's spinlock.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      6d16202b
    • dm thin: add error_if_no_space feature · 787a996c
      Mike Snitzer authored
      If the pool runs out of data or metadata space, the pool can either
      queue or error the IO destined to the data device.  The default is to
      queue the IO until more space is added.
      
      An admin may now configure the pool to error IO when no space is
      available by setting the 'error_if_no_space' feature when loading the
      thin-pool table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      787a996c
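      A sketch of how such a feature flag steers the out-of-space decision;
      the toy_pool structure and the handle_unserviceable_bio() body below
      are illustrative only, not the dm-thin implementation.

      #include <stdio.h>
      #include <stdbool.h>

      struct toy_pool {
          bool error_if_no_space;     /* set from the thin-pool table features */
      };

      static void handle_unserviceable_bio(struct toy_pool *pool, int bio_id)
      {
          if (pool->error_if_no_space)
              printf("bio %d: errored back to the submitter\n", bio_id);
          else
              printf("bio %d: queued for retry once the pool is resized\n", bio_id);
      }

      int main(void)
      {
          struct toy_pool queueing = { .error_if_no_space = false };  /* default */
          struct toy_pool erroring = { .error_if_no_space = true };

          handle_unserviceable_bio(&queueing, 1);
          handle_unserviceable_bio(&erroring, 2);
          return 0;
      }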
    • dm thin: requeue bios to DM core if no_free_space and in read-only mode · 8c0f0e8c
      Mike Snitzer authored
      Now that we switch the pool to read-only mode when the data device
      runs out of space, active writers get IO errors once we resume after
      resizing the data device.
      
      If no_free_space is set, save bios to the 'retry_on_resume_list' and
      requeue them on resume (once the data or metadata device may have been
      resized).
      
      With this patch the resize_io test passes again (on slower storage):
       dmtest run --suite thin-provisioning -n /resize_io/
      
      Later patches fix some subtle races associated with the pool mode
      transitions done as part of the pool's -ENOSPC handling.  These races
      are exposed on fast storage (e.g. PCIe SSD).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      8c0f0e8c
    • dm thin: cleanup and improve no space handling · 399caddf
      Mike Snitzer authored
      Factor out_of_data_space() out of alloc_data_block().  Eliminate the use
      of 'no_free_space' as a latch in alloc_data_block() -- this is no longer
      needed now that we switch to read-only mode when we run out of data or
      metadata space.  In a later patch, the 'no_free_space' flag will be
      eliminated entirely (in favor of checking metadata rather than relying
      on a transient flag).
      
      Move the out-of-metadata-space handling into
      metadata_operation_failed() and set no_free_space when metadata space
      is exhausted too.  This keeps the handling consistent for the
      following patch, which will requeue data IOs when no_free_space is
      set.
      
      Also, rename no_space() to retry_bios_on_resume().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      399caddf
    • Mike Snitzer
    • dm thin: handle metadata failures more consistently · b5330655
      Joe Thornber authored
      Introduce metadata_operation_failed() wrappers, around set_pool_mode(),
      to assist with improving the consistency of how metadata failures are
      handled.  Logging is improved and metadata operation failures trigger
      read-only mode immediately.
      
      Also, eliminate redundant set_pool_mode() calls in the error paths of
      alloc_data_block()'s two callers.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      b5330655
    • dm thin: factor out check_low_water_mark and use bools · 88a6621b
      Joe Thornber authored
      Factor check_low_water_mark() out of alloc_data_block().
      Change a couple unsigned flags in the pool structure to bool.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      88a6621b
    • dm thin: add mappings to end of prepared_* lists · daec338b
      Mike Snitzer authored
      Mappings could be processed in descending logical block order,
      particularly if buffered IO is used.  This could adversely affect the
      latency of IO processing.  Fix this by adding mappings to the end of the
      'prepared_mappings' and 'prepared_discards' lists.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      daec338b
    • Joe Thornber
    • dm thin: use bool rather than unsigned for flags in structures · 7f214665
      Mike Snitzer authored
      Also, move the 'err' member in the dm_thin_new_mapping structure to
      eliminate a 4 byte hole (reducing its size from 88 bytes to 80).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      7f214665
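      The 4-byte-hole point is easy to demonstrate on a toy structure: on a
      typical LP64 machine an int sandwiched between 8-byte members costs 4
      bytes of padding, and moving it next to another small member wins the
      space back.  The fields below do not reproduce the real
      dm_thin_new_mapping layout.

      #include <stdio.h>
      #include <stdbool.h>

      struct with_hole {
          bool pass_discard;     /* 1 byte, padded up to the pointer */
          void *thin;
          int err;               /* followed by 4 bytes of padding ...   */
          void *cell;            /* ... so this 8-byte pointer is aligned */
      };

      struct reordered {
          bool pass_discard;
          int err;               /* shares the space before the first pointer */
          void *thin;
          void *cell;
      };

      int main(void)
      {
          printf("with hole: %zu bytes\n", sizeof(struct with_hole));  /* typically 32 */
          printf("reordered: %zu bytes\n", sizeof(struct reordered));  /* typically 24 */
          return 0;
      }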
    • dm persistent data: cleanup dm-thin specific references in text · 10343180
      Mike Snitzer authored
      DM's persistent-data library is now used by multiple targets, so
      exclusive references to "pool" or "thin provisioning" need to be
      cleaned up.  Adjust Kconfig's DM_DEBUG_BLOCK_STACK_TRACING text
      and remove "pool" from a block manager error message.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      10343180
    • dm space map metadata: limit errors in sm_metadata_new_block · c46985e2
      Mike Snitzer authored
      The "unable to allocate new metadata block" error can be a particularly
      verbose error if there is a systemic issue with the metadata device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      c46985e2
    • dm delay: use per-bio data instead of a mempool and slab cache · 42065460
      Mikulas Patocka authored
      Starting with commit c0820cf5 ("dm: introduce per_bio_data"),
      device mapper has the capability to pre-allocate a target-specific
      structure with the bio.
      
      This patch changes dm-delay to use this facility instead of a slab cache
      and mempool.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      42065460
    • dm table: remove unused buggy code that extends the targets array · 57a2f238
      Mikulas Patocka authored
      A device mapper table is allocated in the following way:
      * The function dm_table_create is called, it gets the number of targets
        as an argument -- it allocates a targets array accordingly.
      * For each target, we call dm_table_add_target.
      
      If we add more targets than were specified in dm_table_create, the
      function dm_table_add_target reallocates the targets array.  However,
      this reallocation code is wrong - it moves the targets array to a new
      location, while some target constructors hold pointers to the array in
      the old location.
      
      The following DM target drivers save the pointer to the target
      structure, so they corrupt memory if the target array is moved:
      multipath, raid, mirror, snapshot, stripe, switch, thin, verity.
      
      Under normal circumstances, the reallocation function is not called
      (because dm_table_create is called with the correct number of targets),
      so the buggy reallocation code is not used.
      
      Prior to the fix "dm table: fail dm_table_create on dm_round_up
      overflow", the reallocation code could only be used in case the user
      specifies too large a value in param->target_count, such as 0xffffffff.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      57a2f238
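      The hazard in miniature: code that stashes a pointer into an array
      must not rely on it after the array is reallocated, because realloc()
      may move the whole block.  This toy example is not the dm_table code;
      it only compares the array's address before and after growing it.

      #include <stdio.h>
      #include <stdint.h>
      #include <stdlib.h>

      struct target { int begin; };

      int main(void)
      {
          struct target *targets = calloc(2, sizeof(*targets));
          if (!targets)
              return 1;

          /* A target constructor stashing &targets[0] would hold this address. */
          uintptr_t old_addr = (uintptr_t)(void *)targets;

          /* Growing the array can move it; any previously stashed pointer then
           * refers to freed memory and must not be dereferenced.  That is the
           * corruption the removed reallocation path could trigger. */
          struct target *grown = realloc(targets, 1024 * sizeof(*targets));
          if (!grown) {
              free(targets);
              return 1;
          }

          printf("array %s across realloc (old %#lx, new %p)\n",
                 (uintptr_t)(void *)grown == old_addr ? "stayed put" : "moved",
                 (unsigned long)old_addr, (void *)grown);

          free(grown);
          return 0;
      }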
    • dm thin: fix discard support to a previously shared block · 19fa1a67
      Joe Thornber authored
      If a snapshot is created and later deleted the origin dm_thin_device's
      snapshotted_time will have been updated to reflect the snapshot's
      creation time.  The 'shared' flag in the dm_thin_lookup_result struct
      returned from dm_thin_find_block() is an approximation based on
      snapshotted_time -- this is done to avoid O(n), or worse, time
      complexity.  In this case, the shared flag would be true.
      
      But because the 'shared' flag reflects an approximation a block can be
      incorrectly assumed to be shared (e.g. false positive for 'shared'
      because the snapshot no longer exists).  This could result in discards
      issued to a thin device not being passed down to the pool's underlying
      data device.
      
      To fix this we double check, using dm_pool_block_is_used(), that a
      thin block is really still in use after its mapping is removed.  If
      the reference count for the block is now zero the discard is allowed
      to be passed down.
      
      Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
      structure -- it reflects that the 'shared' flag in the response from
      dm_thin_find_block() can only be trusted as definitive if false is
      returned.
      
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      19fa1a67
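      A toy model of the double check described above: the approximate
      'shared' flag may be a false positive, so the discard is only passed
      down once the block's reference count has really dropped to zero.
      The array-based refcounts and function names are illustrative, with
      block_is_used() standing in for dm_pool_block_is_used().

      #include <stdio.h>
      #include <stdbool.h>

      #define NR_BLOCKS 4

      static unsigned ref_count[NR_BLOCKS] = { 1, 2, 1, 1 };

      static void remove_mapping(unsigned block)
      {
          if (ref_count[block])
              ref_count[block]--;
      }

      static bool block_is_used(unsigned block)
      {
          return ref_count[block] > 0;
      }

      static void discard(unsigned block, bool maybe_shared)
      {
          remove_mapping(block);

          /* 'maybe_shared' comes from the snapshotted_time approximation and
           * can be a false positive; the refcount is the authoritative answer. */
          if (!maybe_shared || !block_is_used(block))
              printf("block %u: discard passed down to the data device\n", block);
          else
              printf("block %u: still referenced, data kept\n", block);
      }

      int main(void)
      {
          discard(0, true);   /* flagged shared, but refcount hits 0: pass down */
          discard(1, true);   /* genuinely shared: keep the data */
          return 0;
      }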
    • dm thin: initialize dm_thin_new_mapping returned by get_next_mapping · 16961b04
      Mike Snitzer authored
      As additional members are added to the dm_thin_new_mapping structure,
      care should be taken to make sure they get initialized before use.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      16961b04
  10. 15 Dec, 2013 5 commits
    • Linux 3.13-rc4 · 319e2e3f
      Linus Torvalds authored
      319e2e3f
    • null_blk: mem garbage on NUMA systems during init · 57053d8c
      Matias Bjorling authored
      On NUMA systems, the blk-mq layer is initialized with a per-node
      hctx.  We initialize submit_queues to 1, while blk-mq's nr_hw_queues
      is initialized to the number of NUMA nodes.
      
      This makes the null_init_hctx function overwrite memory outside of
      what it allocated.  In my case it led to writing garbage into struct
      request_queue's mq_map.
      Signed-off-by: Matias Bjorling <m@bjorling.me>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57053d8c
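      The mismatch in miniature: initialising nr_hw_queues hardware contexts
      when only submit_queues of them were allocated walks off the end of
      the array.  The bounds check below merely stands in for sizing the two
      consistently; it is not the null_blk fix itself.

      #include <stdio.h>

      #define SUBMIT_QUEUES 1     /* what was allocated */
      #define NR_HW_QUEUES  4     /* e.g. one per NUMA node */

      struct hctx { int initialised; };

      int main(void)
      {
          struct hctx queues[SUBMIT_QUEUES] = { { 0 } };
          unsigned i;

          for (i = 0; i < NR_HW_QUEUES; i++) {
              if (i >= SUBMIT_QUEUES) {
                  /* Without a guard like this, the original code scribbled
                   * past its allocation (into request_queue's mq_map). */
                  printf("hctx %u: not allocated, skipped\n", i);
                  continue;
              }
              queues[i].initialised = 1;
              printf("hctx %u: initialised\n", i);
          }
          return 0;
      }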
    • radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh() · e4158f1b
      Sergey Senozhatsky authored
      Since commit ec39f64b ("drm/radeon/dpm: Convert to use
      devm_hwmon_register_with_groups") radeon_hwmon_init() has been using
      hwmon_device_register_with_groups(), which sets `rdev' as the
      device's private driver_data, while hwmon_attributes_visible() and
      radeon_hwmon_show_temp_thresh() still expect a `drm_device'.
      
      Fix them by using dev_get_drvdata(), in order to avoid this oops:
      
        BUG: unable to handle kernel paging request at 0000000000001e28
        IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon]
        PGD 15057e067 PUD 151a8e067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP
        Call Trace:
          internal_create_group+0x114/0x1d9
          sysfs_create_group+0xe/0x10
          sysfs_create_groups+0x22/0x5f
          device_add+0x34f/0x501
          device_register+0x15/0x18
          hwmon_device_register_with_groups+0xb5/0xed
          radeon_hwmon_init+0x56/0x7c [radeon]
          radeon_pm_init+0x134/0x7e5 [radeon]
          radeon_modeset_init+0x75f/0x8ed [radeon]
          radeon_driver_load_kms+0xc6/0x187 [radeon]
          drm_dev_register+0xf9/0x1b4 [drm]
          drm_get_pci_dev+0x98/0x129 [drm]
          radeon_pci_probe+0xa3/0xac [radeon]
          pci_device_probe+0x6e/0xcf
          driver_probe_device+0x98/0x1c4
          __driver_attach+0x5c/0x7e
          bus_for_each_dev+0x7b/0x85
          driver_attach+0x19/0x1b
          bus_add_driver+0x104/0x1ce
          driver_register+0x89/0xc5
          __pci_register_driver+0x58/0x5b
          drm_pci_init+0x86/0xea [drm]
          radeon_init+0x97/0x1000 [radeon]
          do_one_initcall+0x7f/0x117
          load_module+0x1583/0x1bb4
          SyS_init_module+0xa0/0xaf
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alexander Deucher <Alexander.Deucher@amd.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4158f1b
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4a251dd2
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
          figure out why it breaks things.
      
       2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
          was basically doing "x == x", from Dave Jones.
      
       3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
          Sebastian Siewior.
      
       4) Blackhole and prohibit routes in ipv6 were not doing the right thing
          because their ->input and ->output methods were not being assigned
          correctly.  Now they behave properly like their ipv4 counterparts.
          From Kamala R.
      
       5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
          really do not support this feature and will send garbage packets if
          fed fraglist SKBs.  From Eric Dumazet.
      
       6) Fix long standing user triggerable BUG_ON over loopback in RDS
          protocol stack, from Venkat Venkatsubra.
      
       7) Several not so common code paths can potentially try to invoke
          packet scheduler actions that might be NULL without checking.
          Shore things up by either 1) defining a method as mandatory and
          erroring on registration if that method is NULL, or 2) defining a
          method as optional and having the registration function hook up a
          default implementation when NULL is seen.  From Jamal Hadi Salim.
      
       8) Fix fragment detection in xen-natback driver, from Paul Durrant.
      
       9) Kill dangling enter_memory_pressure method in cg_proto ops, from
          Eric W Biederman.
      
      10) SKBs that traverse namespaces should have their local_df cleared,
          from Hannes Frederic Sowa.
      
      11) IOCB file position is not being updated by macvtap_aio_read() and
          tun_chr_aio_read().  From Zhi Yong Wu.
      
      12) Don't free virtio_net netdev before releasing all of the NAPI
          instances.  From Andrey Vagin.
      
      13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.
      
      14) IPv6 routes that are not cached routes should not count against
          the garbage collection limits.  We had this almost right, but
          were not handling addrconf-generated routes properly.  From
          Hannes Frederic Sowa.
      
      15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
          route info when they are called, from Stefan Tomanek.
      
      16) TUN and MACVTAP have had truncated packet signalling for some time,
          fix from Jason Wang.
      
      17) Fix use after free in __udp4_lib_rcv(), from Eric Dumazet.
      
      18) xen-netback does not interpret the NAPI budget properly for TX work,
          fix from Paul Durrant.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
        igb: Fix for issue where values could be too high for udelay function.
        i40e: fix null dereference
        xen-netback: fix gso_prefix check
        net: make neigh_priv_len in struct net_device 16bit instead of 8bit
        drivers: net: cpsw: fix for cpsw crash when build as modules
        xen-netback: napi: don't prematurely request a tx event
        xen-netback: napi: fix abuse of budget
        sch_tbf: use do_div() for 64-bit divide
        udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
        net:fec: remove duplicate lines in comment about errata ERR006358
        Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
        8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
        xen-netback: make sure skb linear area covers checksum field
        net: smc91x: Fix device tree based configuration so it's usable
        udp: ipv4: fix potential use after free in udp_v4_early_demux()
        macvtap: signal truncated packets
        tun: unbreak truncated packet signalling
        net: sched: htb: fix the calculation of quantum
        net: sched: tbf: fix the calculation of max_size
        micrel: add support for KSZ8041RNLI
        ...
      4a251dd2
    • Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 908bfda7
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "This is a pretty small batch:
      
        The biggest single change is to stop using EFI time services on 32-bit
        platforms.  This matches our current behavior on 64-bit platforms as
        we already had ruled them out there as being too unreliable.  Turns
        out that affects 32-bit platforms, too.
      
        One NULL pointer fix for SGI UV.
      
        Two minor build fixes, one of which only affects icc and the other
        which affects icc and future versions or nonstandard default settings
        of gcc"
      
      * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, efi: Don't use (U)EFI time services on 32 bit
        x86, build, icc: Remove uninitialized_var() from compiler-intel.h
        x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used
        x86, build: Pass in additional -mno-mmx, -mno-sse options
      908bfda7