1. 23 Mar, 2013 18 commits
    • Kent Overstreet's avatar
      block: Add bio_copy_data() · 16ac3d63
      Kent Overstreet authored
      This gets open coded quite a bit and it's tricky to get right, so make a
      generic version and convert some existing users over to it instead.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      16ac3d63
    • Kent Overstreet's avatar
      raid1: Refactor narrow_write_error() to not use bi_idx · b783863f
      Kent Overstreet authored
      More bi_idx removal. This code was just open coding bio_clone(). This
      could probably be further improved by using bio_advance() instead of
      skipping over null pages, but that'd be a larger rework.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      b783863f
    • Kent Overstreet's avatar
      raid5: use bio_reset() · 2f6db2a7
      Kent Overstreet authored
      Had to shuffle the code around a bit (where bi_rw and bi_end_io were
      set), but shouldn't really be anything tricky here
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      2f6db2a7
    • Kent Overstreet's avatar
      raid1: use bio_reset() · 2aabaa65
      Kent Overstreet authored
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      2aabaa65
    • Kent Overstreet's avatar
      raid10: Use bio_reset() · 8be185f2
      Kent Overstreet authored
      More prep work for immutable bio vecs, mainly getting rid of references
      to bi_idx.
      
      bio_reset was being open coded in a few places. The one in sync_request
      was a bit nontrivial to convert, so could use some extra eyeballs.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: default avatarNeilBrown <neilb@suse.de>
      8be185f2
    • Kent Overstreet's avatar
      block: Add submit_bio_wait(), remove from md · 9e882242
      Kent Overstreet authored
      Random cleanup - this code was duplicated and it's not really specific
      to md.
      
      Also added the ability to return the actual error code.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      9e882242
    • Kent Overstreet's avatar
      block: Remove some unnecessary bi_vcnt usage · 2f477877
      Kent Overstreet authored
      More prep work for immutable bvecs/effecient bio splitting - usage of
      bi_vcnt has to be auditing, so getting rid of all the unnecessary usage
      makes that easier.
      
      Plus, bio_segments() is really what this code wanted, as it respects the
      current value of bi_idx.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Eric Moore <Eric.Moore@lsi.com>
      CC: "James E.J. Bottomley" <JBottomley@parallels.com>
      CC: linux-scsi@vger.kernel.org
      2f477877
    • Kent Overstreet's avatar
      block: Remove bi_idx references · 4f2ac93c
      Kent Overstreet authored
      For immutable bvecs, all bi_idx usage needs to be audited - so here
      we're removing all the unnecessary uses.
      
      Most of these are places where it was being initialized on a bio that
      was just allocated, a few others are conversions to standard macros.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      4f2ac93c
    • Kent Overstreet's avatar
      block: Change bio_split() to respect the current value of bi_idx · 5b83636a
      Kent Overstreet authored
      In the current code bio_split() won't be seeing partially completed bios
      so this doesn't change any behaviour, but this makes the code a bit
      clearer as to what bio_split() actually requires.
      
      The immediate purpose of the patch is removing unnecessary bi_idx
      references, but the end goal is to allow partial completed bios to be
      submitted, which along with immutable biovecs enables effecient bio
      splitting.
      
      Some of the callers were (double) checking that bios could be split, so
      update their checks too.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Neil Brown <neilb@suse.de>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      5b83636a
    • Kent Overstreet's avatar
      block: Use bio_sectors() more consistently · aa8b57aa
      Kent Overstreet authored
      Bunch of places in the code weren't using it where they could be -
      this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
      into a struct bvec_iter.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: "Ed L. Cashin" <ecashin@coraid.com>
      CC: Nick Piggin <npiggin@kernel.dk>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Jim Paris <jim@jtan.com>
      CC: Geoff Levand <geoff@infradead.org>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarEd Cashin <ecashin@coraid.com>
      aa8b57aa
    • Kent Overstreet's avatar
      block: Add bio_end_sector() · f73a1c7d
      Kent Overstreet authored
      Just a little convenience macro - main reason to add it now is preparing
      for immutable bio vecs, it'll reduce the size of the patch that puts
      bi_sector/bi_size/bi_idx into a struct bvec_iter.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: linux-s390@vger.kernel.org
      CC: Chris Mason <chris.mason@fusionio.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      f73a1c7d
    • Kent Overstreet's avatar
      md: Convert md_trim_bio() to use bio_advance() · fb9e3534
      Kent Overstreet authored
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: default avatarNeilBrown <neilb@suse.de>
      fb9e3534
    • Kent Overstreet's avatar
      block: Refactor blk_update_request() · f79ea416
      Kent Overstreet authored
      Converts it to use bio_advance(), simplifying it quite a bit in the
      process.
      
      Note that req_bio_endio() now always calls bio_advance() - which means
      it always loops over the biovec, not just on partial completions. Don't
      expect it to affect performance, but worth noting.
      
      Tested it by forcing partial updates, and dumping before and after on
      various bio/bvec fields when doing a partial update.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      f79ea416
    • Kent Overstreet's avatar
      block: Add bio_advance() · 054bdf64
      Kent Overstreet authored
      This is prep work for immutable bio vecs; we first want to centralize
      where bvecs are modified.
      
      Next two patches convert some existing code to use this function.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      054bdf64
    • Kent Overstreet's avatar
      block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Kent Overstreet authored
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      9f060e22
    • Kent Overstreet's avatar
      block: Fix a buffer overrun in bio_integrity_split() · 6fda981c
      Kent Overstreet authored
      bio_integrity_split() seemed to be confusing pointers and arrays -
      bip_vec in bio_integrity_payload was an array appended to the end of the
      payload, so the bio_vecs in struct bio_pair should have come after the
      bio_integrity_payload they're for.
      
      Fix it by making bip_vec a pointer to the inline vecs - a later patch is
      going to make more use of this pointer.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      6fda981c
    • Kent Overstreet's avatar
      block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Kent Overstreet authored
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarMuthukumar Ratty <muthur@gmail.com>
      df2cb6da
    • Kent Overstreet's avatar
      block: Reorder struct bio_set · 57fb233f
      Kent Overstreet authored
      This is prep work for the next patch, which embeds a struct bio_list in
      struct bio_set.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      57fb233f
  2. 17 Mar, 2013 4 commits
    • Linus Torvalds's avatar
      Linux 3.9-rc3 · a937536b
      Linus Torvalds authored
      a937536b
    • David Rientjes's avatar
      perf,x86: fix link failure for non-Intel configs · 6c4d3bc9
      David Rientjes authored
      Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
      suspend/resume") introduces a link failure since
      perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:
      
      	arch/x86/power/built-in.o: In function `restore_processor_state':
      	(.text+0x45c): undefined reference to `perf_restore_debug_store'
      
      Fix it by defining the dummy function appropriately.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6c4d3bc9
    • Linus Torvalds's avatar
      perf,x86: fix wrmsr_on_cpu() warning on suspend/resume · 2a6e06b2
      Linus Torvalds authored
      Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
      suspend/resume") fixed a crash when doing PEBS performance profiling
      after resuming, but in using init_debug_store_on_cpu() to restore the
      DS_AREA mtrr it also resulted in a new WARN_ON() triggering.
      
      init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
      cross-calls to do the MSR update.  Which is not really valid at the
      early resume stage, and the warning is quite reasonable.  Now, it all
      happens to _work_, for the simple reason that smp_call_function_single()
      ends up just doing the call directly on the CPU when the CPU number
      matches, but we really should just do the wrmsr() directly instead.
      
      This duplicates the wrmsr() logic, but hopefully we can just remove the
      wrmsr_on_cpu() version eventually.
      Reported-and-tested-by: default avatarParag Warudkar <parag.lkml@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a6e06b2
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 08637024
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "Eric's rcu barrier patch fixes a long standing problem with our
        unmount code hanging on to devices in workqueue helpers.  Liu Bo
        nailed down a difficult assertion for in-memory extent mappings."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix warning of free_extent_map
        Btrfs: fix warning when creating snapshots
        Btrfs: return as soon as possible when edquot happens
        Btrfs: return EIO if we have extent tree corruption
        btrfs: use rcu_barrier() to wait for bdev puts at unmount
        Btrfs: remove btrfs_try_spin_lock
        Btrfs: get better concurrency for snapshot-aware defrag work
      08637024
  3. 16 Mar, 2013 8 commits
    • Liu Bo's avatar
      Btrfs: fix warning of free_extent_map · 3b277594
      Liu Bo authored
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we're going to drop an extent map and may split this large one into
      smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
      that it's on the list to be logged, then the smaller ems split from it will also
      be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) we'll keep ems from unlinking the list and freeing when they are flagged with
      EXTENT_FLAG_LOGGING, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: default avatarJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      3b277594
    • Linus Torvalds's avatar
      Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · e2043785
      Linus Torvalds authored
      Pull kbuild fix from Michal Marek:
       "One fix for for make headers_install/headers_check to not require make
        3.81.  The requirement has been accidentally introduced in 3.7."
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: fix make headers_check with make 3.80
      e2043785
    • Linus Torvalds's avatar
      Merge tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux · 23659587
      Linus Torvalds authored
      Pull OpenRISC bug fixes from Jonas Bonn:
      
       - The GPIO descriptor work has exposed how broken the non-GPIOLIB bits
         for OpenRISC were.  We now require GPIOLIB as this is the preferred
         way forward.
      
       - The system.h split introduced a bug in llist.h for arches using
         asm-generic/cmpxchg.h directly, which is currently only OpenRISC.
         The patch here moves two defines from asm-generic/atomic.h to
         asm-generic/cmpxchg.h to make things work as they should.
      
       - The VIRT_TO_BUS selector was added for OpenRISC, but OpenRISC does
         not have the virt_to_bus methods, so there's a patch to remove it
         again.
      
      * tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux:
        openrisc: remove HAVE_VIRT_TO_BUS
        asm-generic: move cmpxchg*_local defs to cmpxchg.h
        openrisc: require gpiolib
      23659587
    • Linus Torvalds's avatar
      Merge tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9e1a0aab
      Linus Torvalds authored
      Pull char/misc fixes from Greg Kroah-Hartman:
       "Here are some tiny fixes for the w1 drivers and the final removal
        patch for getting rid of CONFIG_EXPERIMENTAL (all users of it are now
        gone from your tree, this just drops the Kconfig item itself.)
      
        All have been in the linux-next tree for a while"
      
      * tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        final removal of CONFIG_EXPERIMENTAL
        w1: fix oops when w1_search is called from netlink connector
        w1-gpio: fix unused variable warning
        w1-gpio: remove erroneous __exit and __exit_p()
        ARM: w1-gpio: fix erroneous gpio requests
      9e1a0aab
    • Linus Torvalds's avatar
      Merge tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5cd8846c
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, as expected for the middle rc:
         - A couple of fixes for potential NULL dereferences and out-of-range
           array accesses revealed by static code parsers
         - A fix for the wrong error handling detected by trinity
         - A regression fix for missing audio on some MacBooks
         - CA0132 DSP loader fixes
         - Fix for EAPD control of IDT codecs on machines w/o speaker
         - Fix a regression in the HD-audio widget list parser code
         - Workaround for the NuForce UDH-100 USB audio"
      
      * tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs
        sound: sequencer: cap array index in seq_chn_common_event()
        ALSA: hda/ca0132 - Remove extra setting of dsp_state.
        ALSA: hda/ca0132 - Check download state of DSP.
        ALSA: hda/ca0132 - Check if dspload_image succeeded.
        ALSA: hda - Disable IDT eapd_switch if there are no internal speakers
        ALSA: hda - Fix snd_hda_get_num_raw_conns() to return a correct value
        ALSA: usb-audio: add a workaround for the NuForce UDH-100
        ALSA: asihpi - fix potential NULL pointer dereference
        ALSA: seq: Fix missing error handling in snd_seq_timer_open()
      5cd8846c
    • Linus Torvalds's avatar
      Merge branch 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping · c7f17deb
      Linus Torvalds authored
      Pull DMA-mapping fix from Marek Szyprowski:
       "An important fix for all ARM architectures which use ZONE_DMA.
        Without it dma_alloc_* calls with GFP_ATOMIC flag might have allocated
        buffers outsize DMA zone."
      
      * 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
        ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation
      c7f17deb
    • Linus Torvalds's avatar
      Merge tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes · de1893f6
      Linus Torvalds authored
      Pull MFD fixes from Samuel Ortiz:
       "This is the first batch of MFD fixes for 3.9.
      
        With this one we have:
      
         - An ab8500 build failure fix.
         - An ab8500 device tree parsing fix.
         - A fix for twl4030_madc remove routine to work properly (when
           built-in).
         - A fix for properly registering palmas interrupt handler.
         - A fix for omap-usb init routine to actually write into the
           hostconfig register.
         - A couple of warning fixes for ab8500-gpadc and tps65912"
      
      * tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
        mfd: twl4030-madc: Remove __exit_p annotation
        mfd: ab8500: Kill "reg" property from binding
        mfd: ab8500-gpadc: Complain if we fail to enable vtvout LDO
        mfd: wm831x: Don't forward declare enum wm831x_auxadc
        mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()
        mfd: tps65912: Declare and use tps65912_irq_exit()
        mfd: palmas: Provide irq flags through DT/platform data
        mfd: Make AB8500_CORE select POWER_SUPPLY to fix build error
        mfd: omap-usb-host: Actually update hostconfig
      de1893f6
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 92fbb1c9
      Linus Torvalds authored
      Pull hwmon fixes from Guenter Roeck:
       "Bug fixes for pmbus, ltc2978, and lineage-pem drivers
      
        Added specific maintainer for some hwmon drivers"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (pmbus/ltc2978) Fix temperature reporting
        hwmon: (pmbus) Fix krealloc() misuse in pmbus_add_attribute()
        hwmon: (lineage-pem) Add missing terminating entry for pem_[input|fan]_attributes
        MAINTAINERS: Add maintainer for MAX6697, INA209, and INA2XX drivers
      92fbb1c9
  4. 15 Mar, 2013 8 commits
  5. 14 Mar, 2013 2 commits
    • Linus Torvalds's avatar
      Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · f4846e52
      Linus Torvalds authored
      Pull fix for hlist_entry_safe() regression from Paul McKenney:
       "This contains a single commit that fixes a regression in
        hlist_entry_safe().  This macro references its argument twice, which
        can cause NULL-pointer errors.  This commit applies a gcc statement
        expression, creating a temporary variable to avoid the double
        reference.  This has been posted to LKML at
      
          https://lkml.org/lkml/2013/3/9/75.
      
        Kudos to CAI Qian, whose testing uncovered this, to Eric Dumazet, who
        spotted root cause, and to Li Zefan, who tested this commit."
      
      * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        list: Fix double fetch of pointer in hlist_entry_safe()
      f4846e52
    • Paul E. McKenney's avatar
      list: Fix double fetch of pointer in hlist_entry_safe() · f65846a1
      Paul E. McKenney authored
      The current version of hlist_entry_safe() fetches the pointer twice,
      once to test for NULL and the other to compute the offset back to the
      enclosing structure.  This is OK for normal lock-based use because in
      that case, the pointer cannot change.  However, when the pointer is
      protected by RCU (as in "rcu_dereference(p)"), then the pointer can
      change at any time.  This use case can result in the following sequence
      of events:
      
      1.	CPU 0 invokes hlist_entry_safe(), fetches the RCU-protected
      	pointer as sees that it is non-NULL.
      
      2.	CPU 1 invokes hlist_del_rcu(), deleting the entry that CPU 0
      	just fetched a pointer to.  Because this is the last entry
      	in the list, the pointer fetched by CPU 0 is now NULL.
      
      3.	CPU 0 refetches the pointer, obtains NULL, and then gets a
      	NULL-pointer crash.
      
      This commit therefore applies gcc's "({ })" statement expression to
      create a temporary variable so that the specified pointer is fetched
      only once, avoiding the above sequence of events.  Please note that
      it is the caller's responsibility to use rcu_dereference() as needed.
      This allows RCU-protected uses to work correctly without imposing
      any additional overhead on the non-RCU case.
      
      Many thanks to Eric Dumazet for spotting root cause!
      Reported-by: default avatarCAI Qian <caiqian@redhat.com>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarLi Zefan <lizefan@huawei.com>
      f65846a1