1. 19 Jun, 2013 10 commits
    • FS-Cache: Don't use spin_is_locked() in assertions · dcfae32f
      David Howells authored
      Under certain circumstances, spin_is_locked() is hardwired to 0 - even when the
      code would normally be in a locked section where it should return 1.  This
      means it cannot be used for an assertion that checks that a spinlock is locked.
      
      Remove such usages from FS-Cache.
      
      The following oops might otherwise be observed:
      
      FS-Cache: Assertion failed
      BUG: failure at fs/fscache/operation.c:270/fscache_start_operations()!
      Kernel panic - not syncing: BUG!
      CPU: 0 PID: 10 Comm: kworker/u2:1 Not tainted 3.10.0-rc1-00133-ge7ebb75 #2
      Workqueue: fscache_operation fscache_op_work_func [fscache]
      7f091c48 603c8947 7f090000 7f9b1361 7f25f080 00000001 7f26d440 7f091c90
      60299eb8 7f091d90 602951c5 7f26d440 3000000008 7f091da0 7f091cc0 7f091cd0
      00000007 00000007 00000006 7f091ae0 00000010 0000010e 7f9af330 7f091ae0
      Call Trace:
      7f091c88: [<60299eb8>] dump_stack+0x17/0x19
      7f091c98: [<602951c5>] panic+0xf4/0x1e9
      7f091d38: [<6002b10e>] set_signals+0x1e/0x40
      7f091d58: [<6005b89e>] __wake_up+0x4e/0x70
      7f091d98: [<7f9aa003>] fscache_start_operations+0x43/0x50 [fscache]
      7f091da8: [<7f9aa1e3>] fscache_op_complete+0x1d3/0x220 [fscache]
      7f091db8: [<60082985>] unlock_page+0x55/0x60
      7f091de8: [<7fb25bb0>] cachefiles_read_copier+0x250/0x330 [cachefiles]
      7f091e58: [<7f9ab03c>] fscache_op_work_func+0xac/0x120 [fscache]
      7f091e88: [<6004d5b0>] process_one_work+0x250/0x3a0
      7f091ef8: [<6004edc7>] worker_thread+0x177/0x2a0
      7f091f38: [<6004ec50>] worker_thread+0x0/0x2a0
      7f091f58: [<60054418>] kthread+0xd8/0xe0
      7f091f68: [<6005bb27>] finish_task_switch.isra.64+0x37/0xa0
      7f091fd8: [<600185cf>] new_thread_handler+0x8f/0xb0
      Reported-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
    • FS-Cache: The retrieval remaining-pages counter needs to be atomic_t · 1bb4b7f9
      David Howells authored
      struct fscache_retrieval contains a count of the number of pages that still
      need some processing (n_pages).  This is decremented as the pages are
      processed.
      
      However, this needs to be atomic: fscache_retrieval_complete() may
      occasionally be called from cachefiles_read_backing_file() and
      cachefiles_read_copier() simultaneously.
      
      This happens when an fscache_read_or_alloc_pages() request containing a lot of
      pages (say a couple of hundred) is being processed.  The read on each backing
      page is dispatched individually because we need to insert a monitor into the
      waitqueue to catch when the read completes.  However, under low-memory
      conditions, we might be forced to wait in the allocator - and this gives the
      I/O on the backing page a chance to complete first.
      
      When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
      the workqueue without waiting for the operation to finish the initial I/O
      dispatch (we want to release any pages we can as soon as we can), thus both can
      end up running simultaneously and potentially attempting to partially complete
      the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
      the page cache).
      
      This was demonstrated by parallelling the non-atomic counter with an atomic
      counter and printing both of them when the assertion fails.  At this point, the
      atomic counter has reached zero, but the non-atomic counter has not.
      
      To fix this, make the counter an atomic_t.
      
      This results in the following bug appearing
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:421!
      
      or
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:414!
      
      With a backtrace like the following:
      
      RIP: 0010:[<ffffffffa0211b1d>] fscache_put_operation+0x1ad/0x240 [fscache]
      Call Trace:
       [<ffffffffa0213185>] fscache_retrieval_work+0x55/0x270 [fscache]
       [<ffffffffa0213130>] ? fscache_retrieval_work+0x0/0x270 [fscache]
       [<ffffffff81090b10>] worker_thread+0x170/0x2a0
       [<ffffffff81096d10>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff810909a0>] ? worker_thread+0x0/0x2a0
       [<ffffffff81096966>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968d0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Simplify cookie retention for fscache_objects, fixing oops · 1362729b
      David Howells authored
      Simplify the way fscache cache objects retain their cookie.  The way I
      implemented the cookie storage handling made synchronisation a pain (ie. the
      object state machine can't rely on the cookie actually still being there).
      
      Instead of the object being detached from the cookie and the cookie being
      freed in __fscache_relinquish_cookie(), we defer both operations:
      
       (*) The detachment of the object from the list in the cookie now takes place
           in fscache_drop_object() and is thus governed by the object state machine
           (fscache_detach_from_cookie() has been removed).
      
       (*) The release of the cookie is now in fscache_object_destroy() - which is
           called by the cache backend just before it frees the object.
      
      This means that the fscache_cookie struct is now available to the cache all the
      way through from ->alloc_object() to ->drop_object() and ->put_object() -
      meaning that it's no longer necessary to take object->lock to guarantee access.
      
      However, __fscache_relinquish_cookie() doesn't wait for the object to go all
      the way through to destruction before letting the netfs proceed.  That would
      massively slow down the netfs.  Since __fscache_relinquish_cookie() leaves the
      cookie around, it must therefore break all attachments to the netfs - which
      includes ->def, ->netfs_data and any outstanding page read/writes.
      
      To handle this, struct fscache_cookie now has an n_active counter:
      
       (1) This starts off initialised to 1.
      
       (2) Any time the cache needs to get at the netfs data, it calls
           fscache_use_cookie() to increment it - if it is not zero.  If it was zero,
           then access is not permitted.
      
       (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
           to decrement it.  This does a wake-up on it if it reaches 0.
      
       (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
           reach 0.  The initialisation to 1 in step (1) ensures that we only get
           wake ups when we're trying to get rid of the cookie.
      
      This leaves __fscache_relinquish_cookie() a lot simpler.
      
      
      ***
      This fixes a problem in the current code whereby if fscache_invalidate() is
      followed sufficiently quickly by fscache_relinquish_cookie() then it is
      possible for __fscache_relinquish_cookie() to have detached the cookie from the
      object and cleared the pointer before a thread is dispatched to process the
      invalidation state in the object state machine.
      
      Since the pending write clearance was deferred to the invalidation state to
      make it asynchronous, we need to either wait in relinquishment for the stores
      tree to be cleared in the invalidation state or we need to handle the clearance
      in relinquishment.
      
      Further, if the relinquishment code does clear the tree, then the invalidation
      state needs to make the clearance contingent on still having the cookie to hand
      (since that's where the tree is rooted) and we have to prevent the cookie from
      disappearing for the duration.
      
      This can lead to an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
      ...
      RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
      ...
      CR2: 000000000000000c ...
      ...
      Process kslowd002 (...)
      ....
      Call Trace:
       [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
       [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
       [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
       [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff8110e233>] slow_work_execute+0x233/0x310
       [<ffffffff8110e515>] slow_work_thread+0x205/0x360
       [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
       [<ffffffff81096936>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      
      The parameter to fscache_invalidate_writes() was object->cookie which is NULL.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Fix object state machine to have separate work and wait states · caaef690
      David Howells authored
      Fix object state machine to have separate work and wait states as that makes
      it easier to envision.
      
      There are now three kinds of state:
      
       (1) Work state.  This is an execution state.  No event processing is performed
           by a work state.  The function attached to a work state returns a pointer
           indicating the next state to which the OSM should transition.  Returning
           NO_TRANSIT repeats the current state, but goes back to the scheduler
           first.
      
       (2) Wait state.  This is an event processing state.  No execution is
           performed by a wait state.  Wait states are just tables of "if event X
           occurs, clear it and transition to state Y".  The dispatcher returns to
           the scheduler if none of the events in which the wait state has an
           interest are currently pending.
      
       (3) Out-of-band state.  This is a special work state.  Transitions to normal
           states can be overridden when an unexpected event occurs (eg. I/O error).
           Instead the dispatcher disables and clears the OOB event and transits to
           the specified work state.  This then acts as an ordinary work state,
           though object->state points to the overridden destination.  Returning
           NO_TRANSIT resumes the overridden transition.
      
      In addition, the states have names in their definitions, so there's no need for
      tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
      that is automatic for work states.
      
      Since the states are now separate structs rather than values in an enum, it's
      not possible to use comparisons other than (non-)equality between them, so use
      some object->flags to indicate what phase an object is in.
      
      The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
      (EV_KILL).  An object flag now carries the information about retirement.
      
      Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
      into a KILL_OBJECT state and additional states have been added for handling
      waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).
      
      A state has also been added for synchronising with parent object initialisation
      (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Wrap checks on object state · 493f7bc1
      David Howells authored
      Wrap checks on object state (mostly outside of fs/fscache/object.c) with
      inline functions so that the mechanism can be replaced.
      
      Some of the state checks within object.c are left as-is as they will be
      replaced.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Uninline fscache_object_init() · 610be24e
      David Howells authored
      Uninline fscache_object_init() so as not to expose some of the FS-Cache
      internals to the cache backend.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Don't sleep in page release if __GFP_FS is not set · 0c59a95d
      David Howells authored
      Don't sleep in __fscache_maybe_release_page() if __GFP_FS is not set.  This
      goes some way towards mitigating fscache deadlocking against ext4 by way of
      the allocator, eg:
      
      INFO: task flush-8:0:24427 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      flush-8:0       D ffff88003e2b9fd8     0 24427      2 0x00000000
       ffff88003e2b9138 0000000000000046 ffff880012e3a040 ffff88003e2b9fd8
       0000000000011c80 ffff88003e2b9fd8 ffffffff81a10400 ffff880012e3a040
       0000000000000002 ffff880012e3a040 ffff88003e2b9098 ffffffff8106dcf5
      Call Trace:
       [<ffffffff8106dcf5>] ? __lock_is_held+0x31/0x53
       [<ffffffff81219b61>] ? radix_tree_lookup_element+0xf4/0x12a
       [<ffffffff81454bed>] schedule+0x60/0x62
       [<ffffffffa01d349c>] __fscache_wait_on_page_write+0x8b/0xa5 [fscache]
       [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d
       [<ffffffffa01d393a>] __fscache_maybe_release_page+0x30c/0x324 [fscache]
       [<ffffffffa01d369a>] ? __fscache_maybe_release_page+0x6c/0x324 [fscache]
       [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
       [<ffffffffa01fd7b2>] nfs_fscache_release_page+0x68/0x94 [nfs]
       [<ffffffffa01ef73e>] nfs_release_page+0x7e/0x86 [nfs]
       [<ffffffff810aa553>] try_to_release_page+0x32/0x3b
       [<ffffffff810b6c70>] shrink_page_list+0x535/0x71a
       [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
       [<ffffffff810b7352>] shrink_inactive_list+0x20a/0x2dd
       [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea
       [<ffffffff810b7a65>] shrink_lruvec+0x34c/0x3eb
       [<ffffffff810b7bd3>] do_try_to_free_pages+0xcf/0x355
       [<ffffffff810b7fc8>] try_to_free_pages+0x9a/0xa1
       [<ffffffff810b08d2>] __alloc_pages_nodemask+0x494/0x6f7
       [<ffffffff810d9a07>] kmem_getpages+0x58/0x155
       [<ffffffff810dc002>] fallback_alloc+0x120/0x1f3
       [<ffffffff8106db23>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffff810dbed3>] ____cache_alloc_node+0x177/0x186
       [<ffffffff81162a6c>] ? ext4_init_io_end+0x1c/0x37
       [<ffffffff810dc403>] kmem_cache_alloc+0xf1/0x176
       [<ffffffff810b17ac>] ? test_set_page_writeback+0x101/0x113
       [<ffffffff81162a6c>] ext4_init_io_end+0x1c/0x37
       [<ffffffff81162ce4>] ext4_bio_write_page+0x20f/0x3af
       [<ffffffff8115cc02>] mpage_da_submit_io+0x26e/0x2f6
       [<ffffffff811088e5>] ? __find_get_block_slow+0x38/0x133
       [<ffffffff81161348>] mpage_da_map_and_submit+0x3a7/0x3bd
       [<ffffffff81161a60>] ext4_da_writepages+0x30d/0x426
       [<ffffffff810b3359>] do_writepages+0x1c/0x2a
       [<ffffffff81102f4d>] __writeback_single_inode+0x3e/0xe5
       [<ffffffff81103995>] writeback_sb_inodes+0x1bd/0x2f4
       [<ffffffff81103b3b>] __writeback_inodes_wb+0x6f/0xb4
       [<ffffffff81103c81>] wb_writeback+0x101/0x195
       [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
       [<ffffffff811043aa>] ? wb_do_writeback+0xaa/0x173
       [<ffffffff8110434a>] wb_do_writeback+0x4a/0x173
       [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff81038554>] ? del_timer+0x4b/0x5b
       [<ffffffff811044e0>] bdi_writeback_thread+0x6d/0x147
       [<ffffffff81104473>] ? wb_do_writeback+0x173/0x173
       [<ffffffff81048fbc>] kthread+0xd0/0xd8
       [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
       [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
       [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0
       [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
      2 locks held by flush-8:0/24427:
       #0:  (&type->s_umount_key#41){.+.+..}, at: [<ffffffff810e3b73>] grab_super_passive+0x4c/0x76
       #1:  (jbd2_handle){+.+...}, at: [<ffffffff81190d81>] start_this_handle+0x475/0x4ea
      
      
      The problem here is that another thread, which is attempting to write the
      to-be-stored NFS page to the cache file on ext4, is waiting for the journal
      lock, eg:
      
      INFO: task kworker/u:2:24437 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u:2     D ffff880039589768     0 24437      2 0x00000000
       ffff8800395896d8 0000000000000046 ffff8800283bf040 ffff880039589fd8
       0000000000011c80 ffff880039589fd8 ffff880039f0b040 ffff8800283bf040
       0000000000000006 ffff8800283bf6b8 ffff880039589658 ffffffff81071a13
      Call Trace:
       [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea
       [<ffffffff81455e73>] ? _raw_spin_unlock_irqrestore+0x3a/0x50
       [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
       [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff81454bed>] schedule+0x60/0x62
       [<ffffffff81190c23>] start_this_handle+0x317/0x4ea
       [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d
       [<ffffffff81190fcc>] jbd2__journal_start+0xb3/0x12e
       [<ffffffff81176606>] __ext4_journal_start_sb+0xb2/0xc6
       [<ffffffff8115f137>] ext4_da_write_begin+0x109/0x233
       [<ffffffff810a964d>] generic_file_buffered_write+0x11a/0x264
       [<ffffffff811032cf>] ? __mark_inode_dirty+0x2d/0x1ee
       [<ffffffff810ab1ab>] __generic_file_aio_write+0x2a5/0x2d5
       [<ffffffff810ab24a>] generic_file_aio_write+0x6f/0xd0
       [<ffffffff81159a2c>] ext4_file_write+0x38c/0x3c4
       [<ffffffff810e0915>] do_sync_write+0x91/0xd1
       [<ffffffffa00a17f0>] cachefiles_write_page+0x26f/0x310 [cachefiles]
       [<ffffffffa01d470b>] fscache_write_op+0x21e/0x37a [fscache]
       [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
       [<ffffffffa01d2479>] fscache_op_work_func+0x78/0xd7 [fscache]
       [<ffffffff8104455a>] process_one_work+0x232/0x3a8
       [<ffffffff810444ff>] ? process_one_work+0x1d7/0x3a8
       [<ffffffff81044ee0>] worker_thread+0x214/0x303
       [<ffffffff81044ccc>] ? manage_workers+0x245/0x245
       [<ffffffff81048fbc>] kthread+0xd0/0xd8
       [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
       [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
       [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0
       [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
      4 locks held by kworker/u:2/24437:
       #0:  (fscache_operation){.+.+.+}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8
       #1:  ((&op->work)){+.+.+.}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8
       #2:  (sb_writers#14){.+.+.+}, at: [<ffffffff810ab22c>] generic_file_aio_write+0x51/0xd0
       #3:  (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [<ffffffff810ab236>] generic_file_aio_write+0x5b/0x
      
      fscache already tries to cancel pending stores, but it can't cancel a write
      for which I/O is already in progress.
      
      An alternative would be to accept writing garbage to the cache under extreme
      circumstances and to kill the afflicted cache object if we have to do this.
      However, we really need to know how strapped the allocator is before deciding
      to do that.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • CacheFiles: name i_mutex lock class explicitly · 6bd5e82b
      J. Bruce Fields authored
      Just some cleanup.
      
      (And note the caller of this function may, for example, call vfs_unlink
      on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
      here.)
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • fs/fscache: remove spin_lock() from the condition in while() · ee8be57b
      Sebastian Andrzej Siewior authored
      The spin_lock() call within the while() condition will cause a compile
      error if spin_lock() is not a function. This is not a problem on mainline,
      but it does not look pretty and there is no reason to do it that way.
      This patch writes it a little differently and avoids the double condition.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
  2. 15 May, 2013 1 commit
    • Add wait_on_atomic_t() and wake_up_atomic_t() · cb65537e
      David Howells authored
      Add wait_on_atomic_t() and wake_up_atomic_t() to indicate became-zero events on
      atomic_t types.  This uses the bit-wake waitqueue table.  The key is set to a
      value outside of the number of bits in a long so that wait_on_bit() won't be
      woken up accidentally.
      
      What I'm using this for is: in a following patch I add a counter to struct
      fscache_cookie to count the number of outstanding operations that need access
      to netfs data.  The way this works is:
      
       (1) When a cookie is allocated, the counter is initialised to 1.
      
       (2) When an operation wants to access netfs data, it calls atomic_inc_unless()
           to increment the counter before it does so.  If it was 0, then the counter
           isn't incremented, the operation isn't permitted to access the netfs data
           (which might by this point no longer exist) and the operation aborts in
           some appropriate manner.
      
       (3) When an operation finishes with the netfs data, it decrements the counter
           and if it reaches 0, calls wake_up_atomic_t() on it - the assumption being
           that it was the last blocker.
      
       (4) When a cookie is released, the counter is decremented and the releaser
           uses wait_on_atomic_t() to wait for the counter to become 0 - which should
           indicate no one is using the netfs data any longer.  The netfs data can
           then be destroyed.
      
      There are some alternatives that I have thought of and that have been suggested
      by Tejun Heo:
      
       (A) Using wait_on_bit() to wait on a bit in the counter.  This doesn't work
           because if that bit happens to be 0 then the wait won't happen - even if
           the counter is non-zero.
      
       (B) Using wait_on_bit() to wait on a flag elsewhere which is cleared when the
           counter reaches 0.  Such a flag would be redundant and would add
           complexity.
      
       (C) Adding a waitqueue to fscache_cookie - this would expand that struct by
           several words for an event that happens just once in each cookie's
           lifetime.  Further, cookies are generally per-file so there are likely to
           be a lot of them.
      
       (D) Similar to (C), but add a pointer to a waitqueue in the cookie instead of
      a waitqueue.  This would add a single word per cookie and so would be less
           of an expansion - but still an expansion.
      
       (E) Adding a static waitqueue to the fscache module.  Generally this would be
           fine, but under certain circumstances many cookies will all get added at
           the same time (eg. NFS umount, cache withdrawal) thereby presenting
           scaling issues.  Note that the wait may be significant as disk I/O may be
           in progress.
      
      So, I think reusing the wait_on_bit() waitqueue set is reasonable.  I don't
      make much use of the waitqueue I need on a per-cookie basis, but sometimes I
      have a huge flood of cookies to deal with.
      
      I also don't want to add a whole new set of global waitqueue tables
      specifically for the dec-to-0 event if I can reuse the bit tables.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
  3. 14 May, 2013 25 commits
  4. 13 May, 2013 4 commits
    • Merge branch 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · c83bb885
      Linus Torvalds authored
      Pull parisc update from Helge Deller:
       "The second round of parisc updates for 3.10 includes build fixes and
        enhancements to utilize irq stacks, fixes SMP races when updating PTE
        and TLB entries by proper locking and makes the search for the correct
        cross compiler more robust on Debian and Gentoo."
      
      * 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: make default cross compiler search more robust (v3)
        parisc: fix SMP races when updating PTE and TLB entries in entry.S
        parisc: implement irq stacks - part 2 (v2)
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · dbbffe68
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Several small bug fixes all over:
      
         1) be2net driver uses wrong payload length when submitting MAC list
            get requests to the chip.  From Sathya Perla.
      
         2) Fix mwifiex memory leak on driver unload, from Amitkumar Karwar.
      
         3) Prevent random memory access in batman-adv, from Marek Lindner.
      
         4) batman-adv doesn't check for pskb_trim_rcsum() errors, also from
            Marek Lindner.
      
         5) Fix fec crashes on rapid link up/down, from Frank Li.
      
         6) Fix inner protocol grovelling in GSO, from Pravin B Shelar.
      
         7) Link event validation fix in qlcnic from Rajesh Borundia.
      
         8) Not all FEC chips can support checksum offload, fix from Shawn
            Guo.
      
         9) EXPORT_SYMBOL + inline doesn't make any sense, from Denis Efremov.
      
        10) Fix race in passthru mode during device removal in macvlan, from
            Jiri Pirko.
      
        11) Fix RCU hash table lookup socket state race in ipv6, leading to
            NULL pointer derefs, from Eric Dumazet.
      
        12) Add several missing HAS_DMA kconfig dependencies, from Geert
      Uytterhoeven.
      
        13) Fix bogus PCI resource management in 3c59x driver, from Sergei
            Shtylyov.
      
        14) Fix info leak in ipv6 GRE tunnel driver, from Amerigo Wang.
      
        15) Fix device leak in ipv6 IPSEC policy layer, from Cong Wang.
      
        16) DMA mapping leak fix in qlge from Thadeu Lima de Souza Cascardo.
      
        17) Missing iounmap on probe failure in bna driver, from Wei Yongjun."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
        bna: add missing iounmap() on error in bnad_init()
        qlge: fix dma map leak when the last chunk is not allocated
        xfrm6: release dev before returning error
        ipv6,gre: do not leak info to user-space
        virtio_net: use default napi weight by default
        emac: Fix EMAC soft reset on 460EX/GT
        3c59x: fix PCI resource management
        caif: CAIF_VIRTIO should depend on HAS_DMA
        net/ethernet: MACB should depend on HAS_DMA
        net/ethernet: ARM_AT91_ETHER should depend on HAS_DMA
        net/wireless: ATH9K should depend on HAS_DMA
        net/ethernet: STMMAC_ETH should depend on HAS_DMA
        net/ethernet: NET_CALXEDA_XGMAC should depend on HAS_DMA
        ipv6: do not clear pinet6 field
        macvlan: fix passthru mode race between dev removal and rx path
        ipv4: ip_output: remove inline marking of EXPORT_SYMBOL functions
        net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode
        net/mlx4_core: Add missing report on VST and spoof-checking dev caps
        net: fec: enable hardware checksum only on imx6q-fec
        qlcnic: Fix validation of link event command.
        ...
    • parisc: make default cross compiler search more robust (v3) · 6880b015
      Helge Deller authored
      People/distros vary how they prefix the toolchain name for 64bit builds.
      Rather than enforce one convention over another, add a for loop which
      does a search for all the general prefixes.
      
      For 64bit builds, we now search for (in order):
      	hppa64-unknown-linux-gnu
      	hppa64-linux-gnu
      	hppa64-linux
      
      For 32bit builds, we look for:
      	hppa-unknown-linux-gnu
      	hppa-linux-gnu
      	hppa-linux
      	hppa2.0-unknown-linux-gnu
      	hppa2.0-linux-gnu
      	hppa2.0-linux
      	hppa1.1-unknown-linux-gnu
      	hppa1.1-linux-gnu
      	hppa1.1-linux
      
      This patch was initiated by Mike Frysinger, with feedback from Jeroen
      Roovers, John David Anglin and Helge Deller.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Jeroen Roovers <jer@gentoo.org>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • bna: add missing iounmap() on error in bnad_init() · ba21fc69
      Wei Yongjun authored
      Add the missing iounmap() before return from bnad_init()
      in the error handling case.
      Introduced by commit 01b54b14
      (bna: tx rx cleanup fix).
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>