1. 23 May, 2018 1 commit
    • Mathieu Malaterre's avatar
      workqueue: move function definitions within CONFIG_SMP block · 66448bc2
      Mathieu Malaterre authored
      In commit 7ee681b2 ("workqueue: Convert to state machine callbacks"),
      three new function definitions were added: ‘workqueue_prepare_cpu’,
      ‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’.
      
      Move these function definitions within a CONFIG_SMP block since they are
      not used outside of it. This will match function declarations in header
      <include/linux/workqueue.h>, and silence the following gcc warning (W=1):
      
        kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes]
        kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes]
        kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes]
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      66448bc2
  2. 21 May, 2018 1 commit
  3. 18 May, 2018 5 commits
    • Tejun Heo's avatar
      workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status} · 6b59808b
      Tejun Heo authored
      There can be a lot of workqueue workers and they all show up with the
      cryptic kworker/* names making it difficult to understand which is
      doing what and how they came to be.
      
        # ps -ef | grep kworker
        root           4       2  0 Feb25 ?        00:00:00 [kworker/0:0H]
        root           6       2  0 Feb25 ?        00:00:00 [kworker/u112:0]
        root          19       2  0 Feb25 ?        00:00:00 [kworker/1:0H]
        root          25       2  0 Feb25 ?        00:00:00 [kworker/2:0H]
        root          31       2  0 Feb25 ?        00:00:00 [kworker/3:0H]
        ...
      
      This patch makes workqueue workers report the latest workqueue it was
      executing for through /proc/PID/{comm,stat,status}.  The extra
      information is appended to the kthread name with intervening '+' if
      currently executing, otherwise '-'.
      
        # cat /proc/25/comm
        kworker/2:0-events_power_efficient
        # cat /proc/25/stat
        25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
        # grep Name /proc/25/status
        Name:   kworker/2:0-events_power_efficient
      
      Unfortunately, ps(1) truncates comm to 15 characters,
      
        # ps 25
          PID TTY      STAT   TIME COMMAND
           25 ?        I      0:00 [kworker/2:0-eve]
      
      making it a lot less useful; however, this should be an easy fix from
      ps(1) side.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Craig Small <csmall@enc.com.au>
      6b59808b
    • Tejun Heo's avatar
      proc: Consolidate task->comm formatting into proc_task_name() · 88b72b31
      Tejun Heo authored
      proc shows task->comm in three places - comm, stat, status - and each
      is fetching and formatting task->comm slighly differently.  This patch
      renames task_name() to proc_task_name(), makes it more generic, and
      updates all three paths to use it.
      
      This will enable expanding comm reporting for workqueue workers.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      88b72b31
    • Tejun Heo's avatar
      workqueue: Set worker->desc to workqueue name by default · 8bf89593
      Tejun Heo authored
      Work functions can use set_worker_desc() to improve the visibility of
      what the worker task is doing.  Currently, the desc field is unset at
      the beginning of each execution and there is a separate field to track
      the field is set during the current execution.
      
      Instead of leaving empty till desc is set, worker->desc can be used to
      remember the last workqueue the worker worked on by default and users
      that use set_worker_desc() can override it to something more
      informative as necessary.
      
      This simplifies desc handling and helps tracking the last workqueue
      that the worker exected on to improve visibility.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      8bf89593
    • Tejun Heo's avatar
      workqueue: Make worker_attach/detach_pool() update worker->pool · a2d812a2
      Tejun Heo authored
      For historical reasons, the worker attach/detach functions don't
      currently manage worker->pool and the callers are manually and
      inconsistently updating it.
      
      This patch moves worker->pool updates into the worker attach/detach
      functions.  This makes worker->pool consistent and clearly defines how
      worker->pool updates are synchronized.
      
      This will help later workqueue visibility improvements by allowing
      safe access to workqueue information from worker->task.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      a2d812a2
    • Tejun Heo's avatar
      workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex · 1258fae7
      Tejun Heo authored
      To improve workqueue visibility, we want to be able to access
      workqueue information from worker tasks.  The per-pool attach mutex
      makes that difficult because there's no way of stabilizing task ->
      worker pool association without knowing the pool first.
      
      Worker attach/detach is a slow path and there's no need for different
      pools to be able to perform them concurrently.  This patch replaces
      the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
      for visibility improvement changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      1258fae7
  4. 16 May, 2018 3 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e6506eb2
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Some of the ftrace internal events use a zero for a data size of a
        field event. This is increasingly important for the histogram trigger
        work that is being extended.
      
        While auditing trace events, I found that a couple of the xen events
        were used as just marking that a function was called, by creating a
        static array of size zero. This can play havoc with the tracing
        features if these events are used, because a zero size of a static
        array is denoted as a special nul terminated dynamic array (this is
        what the trace_marker code uses). But since the xen events have no
        size, they are not nul terminated, and unexpected results may occur.
      
        As trace events were never intended on being a marker to denote that a
        function was hit or not, especially since function tracing and kprobes
        can trivially do the same, the best course of action is to simply
        remove these events"
      
      * tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
      e6506eb2
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc5-vsprintf' of... · 9d38cd06
      Linus Torvalds authored
      Merge tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull memory barrier for from Steven Rostedt:
       "The memory barrier usage in updating the random ptr hash for %p in
        vsprintf is incorrect.
      
        Instead of adding the read memory barrier into vsprintf() which will
        cause a slight degradation to a commonly used function in the kernel
        just to solve a very unlikely race condition that can only happen at
        boot up, change the code from using a variable branch to a
        static_branch.
      
        Not only does this solve the race condition, it actually will improve
        the performance of vsprintf() by removing the conditional branch that
        is only needed at boot"
      
      * tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        vsprintf: Replace memory barrier with static_key for random_ptr_key update
      9d38cd06
    • Steven Rostedt (VMware)'s avatar
      vsprintf: Replace memory barrier with static_key for random_ptr_key update · 85f4f12d
      Steven Rostedt (VMware) authored
      Reviewing Tobin's patches for getting pointers out early before
      entropy has been established, I noticed that there's a lone smp_mb() in
      the code. As with most lone memory barriers, this one appears to be
      incorrectly used.
      
      We currently basically have this:
      
      	get_random_bytes(&ptr_key, sizeof(ptr_key));
      	/*
      	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
      	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
      	 * after get_random_bytes() returns.
      	 */
      	smp_mb();
      	WRITE_ONCE(have_filled_random_ptr_key, true);
      
      And later we have:
      
      	if (unlikely(!have_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);
      
      /* Missing memory barrier here. */
      
      	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
      
      As the CPU can perform speculative loads, we could have a situation
      with the following:
      
      	CPU0				CPU1
      	----				----
      				   load ptr_key = 0
         store ptr_key = random
         smp_mb()
         store have_filled_random_ptr_key
      
      				   load have_filled_random_ptr_key = true
      
      				    BAD BAD BAD! (you're so bad!)
      
      Because nothing prevents CPU1 from loading ptr_key before loading
      have_filled_random_ptr_key.
      
      But this race is very unlikely, but we can't keep an incorrect smp_mb() in
      place. Instead, replace the have_filled_random_ptr_key with a static_branch
      not_filled_random_ptr_key, that is initialized to true and changed to false
      when we get enough entropy. If the update happens in early boot, the
      static_key is updated immediately, otherwise it will have to wait till
      entropy is filled and this happens in an interrupt handler which can't
      enable a static_key, as that requires a preemptible context. In that case, a
      work_queue is used to enable it, as entropy already took too long to
      establish in the first place waiting a little more shouldn't hurt anything.
      
      The benefit of using the static key is that the unlikely branch in
      vsprintf() now becomes a nop.
      
      Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      85f4f12d
  5. 15 May, 2018 4 commits
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 21b9f1c7
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "Here's a set of patches that fix a number of bugs in the in-kernel AFS
        client, including:
      
         - Fix directory locking to not use individual page locks for
           directory reading/scanning but rather to use a semaphore on the
           afs_vnode struct as the directory contents must be read in a single
           blob and data from different reads must not be mixed as the entire
           contents may be shuffled about between reads.
      
         - Fix address list parsing to handle port specifiers correctly.
      
         - Only give up callback records on a server if we actually talked to
           that server (we might not be able to access a server).
      
         - Fix some callback handling bugs, including refcounting,
           whole-volume callbacks and when callbacks actually get broken in
           response to a CB.CallBack op.
      
         - Fix some server/address rotation bugs, including giving up if we
           can't probe a server; giving up if a server says it doesn't have a
           volume, but there are more servers to try.
      
         - Fix the decoding of fetched statuses to be OpenAFS compatible.
      
         - Fix the handling of server lookups in Cache Manager ops (such as
           CB.InitCallBackState3) to use a UUID if possible and to handle no
           server being found.
      
         - Fix a bug in server lookup where not all addresses are compared.
      
         - Fix the non-encryption of calls that prevents some servers from
           being accessed (this also requires an AF_RXRPC patch that has
           already gone in through the net tree).
      
        There's also a patch that adds tracepoints to log Cache Manager ops
        that don't find a matching server, either by UUID or by address"
      
      * tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix the non-encryption of calls
        afs: Fix CB.CallBack handling
        afs: Fix whole-volume callback handling
        afs: Fix afs_find_server search loop
        afs: Fix the handling of an unfound server in CM operations
        afs: Add a tracepoint to record callbacks from unlisted servers
        afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
        afs: Fix VNOVOL handling in address rotation
        afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
        afs: Fix server rotation's handling of fileserver probe failure
        afs: Fix refcounting in callback registration
        afs: Fix giving up callbacks on server destruction
        afs: Fix address list parsing
        afs: Fix directory page locking
      21b9f1c7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · eeba2dfa
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small driver fixes: aacraid to fix an unknown IU type on task
        management functions which causes a firmware fault and vmw_pvscsi to
        change a return code to retry the operation instead of causing an
        immediate error"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: aacraid: Correct hba_send to include iu_type
        scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
      eeba2dfa
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux · ee4b65c2
      Linus Torvalds authored
      Pull drm fix from Dave Airlie:
       "This fixes the mmap regression reported to me on irc by an i686 kernel
        user today, he's tested the fix works, and I've audited all the drm
        drivers for the bad mmap usage and since we use the mmap offset as a
        lookup in a table we aren't inclined to have anything bad in there"
      
      [ See commit be83bbf8 ("mmap: introduce sane default mmap limits")
        for details and the note on why the GPU drivers were expected to be a
        special case.    - Linus ]
      
      * tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux:
        drm: set FMODE_UNSIGNED_OFFSET for drm files
      ee4b65c2
    • Dave Airlie's avatar
      drm: set FMODE_UNSIGNED_OFFSET for drm files · 76ef6b28
      Dave Airlie authored
      Since we have the ttm and gem vma managers using a subset
      of the file address space for objects, and these start at
      0x100000000 they will overflow the new mmap checks.
      
      I've checked all the mmap routines I could see for any
      bad behaviour but overall most people use GEM/TTM VMA
      managers even the legacy drivers have a hashtable.
      
      Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon)
      Fixes: be83bbf8 (mmap: introduce sane default mmap limits)
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      76ef6b28
  6. 14 May, 2018 15 commits
    • Steven Rostedt (VMware)'s avatar
      tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} · 45dd9b06
      Steven Rostedt (VMware) authored
      Doing an audit of trace events, I discovered two trace events in the xen
      subsystem that use a hack to create zero data size trace events. This is not
      what trace events are for. Trace events add memory footprint overhead, and
      if all you need to do is see if a function is hit or not, simply make that
      function noinline and use function tracer filtering.
      
      Worse yet, the hack used was:
      
       __array(char, x, 0)
      
      Which creates a static string of zero in length. There's assumptions about
      such constructs in ftrace that this is a dynamic string that is nul
      terminated. This is not the case with these tracepoints and can cause
      problems in various parts of ftrace.
      
      Nuke the trace events!
      
      Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      45dd9b06
    • David Howells's avatar
      afs: Fix the non-encryption of calls · 4776cab4
      David Howells authored
      Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
      with kAFS.  Set the AF_RXRPC security level to encrypt client calls to deal
      with this.
      
      Note that incoming service calls are set by the remote client and so aren't
      affected by this.
      
      This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
      begun by the kernel.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      4776cab4
    • David Howells's avatar
      afs: Fix CB.CallBack handling · 428edade
      David Howells authored
      The handling of CB.CallBack messages sent by the fileserver to the client
      is broken in that they are currently being processed after the reply has
      been transmitted.
      
      This is not what the fileserver expects, however.  It holds up change
      visibility until the reply comes so as to maintain cache coherency, and so
      expects the client to have to refetch the state on the affected files.
      
      Fix CB.CallBack handling to perform the callback break before sending the
      reply.
      
      The fileserver is free to hold up status fetches issued by other threads on
      the same client that occur in reponse to the callback until any pending
      changes have been committed.
      
      Fixes: d001648e ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      428edade
    • David Howells's avatar
      afs: Fix whole-volume callback handling · 68251f0a
      David Howells authored
      It's possible for an AFS file server to issue a whole-volume notification
      that callbacks on all the vnodes in the file have been broken.  This is
      done for R/O and backup volumes (which don't have per-file callbacks) and
      for things like a volume being taken offline.
      
      Fix callback handling to detect whole-volume notifications, to track it
      across operations and to check it during inode validation.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      68251f0a
    • Marc Dionne's avatar
      afs: Fix afs_find_server search loop · f9c1bba3
      Marc Dionne authored
      The code that looks up servers by addresses makes the assumption
      that the list of addresses for a server is sorted.  It exits the
      loop if it finds that the target address is larger than the
      current candidate.  As the list is not currently sorted, this
      can lead to a failure to find a matching server, which can cause
      callbacks from that server to be ignored.
      
      Remove the early exit case so that the complete list is searched.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f9c1bba3
    • David Howells's avatar
      afs: Fix the handling of an unfound server in CM operations · a86b06d1
      David Howells authored
      If the client cache manager operations that need the server record
      (CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
      the server record, they abort the call from the file server with
      RX_CALL_DEAD when they should return okay.
      
      Fixes: c35eccb1 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      a86b06d1
    • David Howells's avatar
      afs: Add a tracepoint to record callbacks from unlisted servers · 3709a399
      David Howells authored
      Add a tracepoint to record callbacks from servers for which we don't have a
      record.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3709a399
    • David Howells's avatar
      afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID · 001ab5a6
      David Howells authored
      Fix the handling of the CB.InitCallBackState3 service call to find the
      record of a server that we're using by looking it up by the UUID passed as
      the parameter rather than by its address (of which it might have many, and
      which may change).
      
      Fixes: c35eccb1 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      001ab5a6
    • David Howells's avatar
      afs: Fix VNOVOL handling in address rotation · 3d9fa911
      David Howells authored
      If a volume location record lists multiple file servers for a volume, then
      it's possible that due to a misconfiguration or a changing configuration
      that one of the file servers doesn't know about it yet and will abort
      VNOVOL.  Currently, the rotation algorithm will stop with EREMOTEIO.
      
      Fix this by moving on to try the next server if VNOVOL is returned.  Once
      all the servers have been tried and the record rechecked, the algorithm
      will stop with EREMOTEIO or ENOMEDIUM.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3d9fa911
    • David Howells's avatar
      afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility · 684b0f68
      David Howells authored
      The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
      whereby if an error occurs on one of the vnodes being queried, then the
      errorCode field is set correctly in the corresponding status, but the
      interfaceVersion field is left unset.
      
      Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
      the following cases when called from FS.InlineBulkStatus delivery:
      
       (1) If InterfaceVersion == 0 then:
      
           (a) If errorCode != 0 then it indicates the abort code for the
               corresponding vnode.
      
           (b) If errorCode == 0 then the status record is invalid.
      
       (2) If InterfaceVersion == 1 then:
      
           (a) If errorCode != 0 then it indicates the abort code for the
               corresponding vnode.
      
           (b) If errorCode == 0 then the status record is valid and can be
           	 parsed.
      
       (3) If InterfaceVersion is anything else then the status record is
           invalid.
      
      Fixes: dd9fbcb8 ("afs: Rearrange status mapping")
      Reported-by: default avatarJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      684b0f68
    • David Howells's avatar
      afs: Fix server rotation's handling of fileserver probe failure · ec5a3b4b
      David Howells authored
      The server rotation algorithm just gives up if it fails to probe a
      fileserver.  Fix this by rotating to the next fileserver instead.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      ec5a3b4b
    • David Howells's avatar
      afs: Fix refcounting in callback registration · d4a96bec
      David Howells authored
      The refcounting on afs_cb_interest struct objects in
      afs_register_server_cb_interest() is wrong as it uses the server list
      entry's call back interest pointer without regard for the fact that it
      might be replaced at any time and the object thrown away.
      
      Fix this by:
      
       (1) Put a lock on the afs_server_list struct that can be used to
           mediate access to the callback interest pointers in the servers array.
      
       (2) Keep a ref on the callback interest that we get from the entry.
      
       (3) Dropping the old reference held by vnode->cb_interest if we replace
           the pointer.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      d4a96bec
    • David Howells's avatar
      afs: Fix giving up callbacks on server destruction · f2686b09
      David Howells authored
      When a server record is destroyed, we want to send a message to the server
      telling it that we're giving up all the callbacks it has promised us.
      
      Apply two fixes to this:
      
       (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
           callback from that server.  We assume this to be the case if we
           performed at least one successful FS operation on that server.
      
       (2) Send it to the address last used for that server rather than always
           picking the first address in the list (which might be unreachable).
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f2686b09
    • David Howells's avatar
      afs: Fix address list parsing · 01fd79e6
      David Howells authored
      The parsing of port specifiers in the address list obtained from the DNS
      resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
      encountering an unexpected delimiter (in this case, the '+' marking the
      port number).  However, in*_pton() can't be given multiple specifiers.
      
      Fix this by finding the delimiter in advance and not relying on in*_pton()
      to find the end of the address for us.
      
      Fixes: 8b2a464c ("afs: Add an address list concept")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      01fd79e6
    • David Howells's avatar
      afs: Fix directory page locking · b61f7dcf
      David Howells authored
      The afs directory loading code (primarily afs_read_dir()) locks all the
      pages that hold a directory's content blob to defend against
      getdents/getdents races and getdents/lookup races where the competitors
      issue conflicting reads on the same data.  As the reads will complete
      consecutively, they may retrieve different versions of the data and
      one may overwrite the data that the other is busy parsing.
      
      Fix this by not locking the pages at all, but rather by turning the
      validation lock into an rwsem and getting an exclusive lock on it whilst
      reading the data or validating the attributes and a shared lock whilst
      parsing the data.  Sharing the attribute validation lock should be fine as
      the data fetch will retrieve the attributes also.
      
      The individual page locks aren't needed at all as the only place they're
      being used is to serialise data loading.
      
      Without this patch, the:
      
       	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
      		...
      	}
      
      part of afs_read_dir() may be skipped, leaving the pages unlocked when we
      hit the success: clause - in which case we try to unlock the not-locked
      pages, leading to the following oops:
      
        page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
        flags: 0xfffe000001004(referenced|private)
        raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
        raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
        page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
        page->mem_cgroup:ffff98156b27c000
        ------------[ cut here ]------------
        kernel BUG at mm/filemap.c:1205!
        ...
        RIP: 0010:unlock_page+0x43/0x50
        ...
        Call Trace:
         afs_dir_iterate+0x789/0x8f0 [kafs]
         ? _cond_resched+0x15/0x30
         ? kmem_cache_alloc_trace+0x166/0x1d0
         ? afs_do_lookup+0x69/0x490 [kafs]
         ? afs_do_lookup+0x101/0x490 [kafs]
         ? key_default_cmp+0x20/0x20
         ? request_key+0x3c/0x80
         ? afs_lookup+0xf1/0x340 [kafs]
         ? __lookup_slow+0x97/0x150
         ? lookup_slow+0x35/0x50
         ? walk_component+0x1bf/0x490
         ? path_lookupat.isra.52+0x75/0x200
         ? filename_lookup.part.66+0xa0/0x170
         ? afs_end_vnode_operation+0x41/0x60 [kafs]
         ? __check_object_size+0x9c/0x171
         ? strncpy_from_user+0x4a/0x170
         ? vfs_statx+0x73/0xe0
         ? __do_sys_newlstat+0x39/0x70
         ? __x64_sys_getdents+0xc9/0x140
         ? __x64_sys_getdents+0x140/0x140
         ? do_syscall_64+0x5b/0x160
         ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      b61f7dcf
  7. 13 May, 2018 6 commits
    • Linus Torvalds's avatar
      Linux 4.17-rc5 · 67b8d5c7
      Linus Torvalds authored
      67b8d5c7
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 66e1c94d
      Linus Torvalds authored
      Pull x86/pti updates from Thomas Gleixner:
       "A mixed bag of fixes and updates for the ghosts which are hunting us.
      
        The scheduler fixes have been pulled into that branch to avoid
        conflicts.
      
         - A set of fixes to address a khread_parkme() race which caused lost
           wakeups and loss of state.
      
         - A deadlock fix for stop_machine() solved by moving the wakeups
           outside of the stopper_lock held region.
      
         - A set of Spectre V1 array access restrictions. The possible
           problematic spots were discuvered by Dan Carpenters new checks in
           smatch.
      
         - Removal of an unused file which was forgotten when the rest of that
           functionality was removed"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vdso: Remove unused file
        perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
        perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
        perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
        perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
        perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
        sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
        sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
        sched/core: Introduce set_special_state()
        kthread, sched/wait: Fix kthread_parkme() completion issue
        kthread, sched/wait: Fix kthread_parkme() wait-loop
        sched/fair: Fix the update of blocked load when newly idle
        stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
      66e1c94d
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 86a4ac43
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "Revert the new NUMA aware placement approach which turned out to
        create more problems than it solved"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
      86a4ac43
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · baeda713
      Linus Torvalds authored
      Pull perf tooling fixes from Thomas Gleixner:
       "Another small set of perf tooling fixes and updates:
      
         - Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel
           PT event description parsing (Arnaldo Carvalho de Melo)
      
         - Sync x86's cpufeatures.h and kvm UAPI headers with the kernel
           sources, suppressing the ABI drift warnings (Arnaldo Carvalho de
           Melo)
      
         - Remove duplicated entry for westmereep-dp in Intel's mapfile.csv
           (William Cohen)
      
         - Fix typo in 'perf bench numa' options description (Yisheng Xie)"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "perf pmu: Fix pmu events parsing rule"
        tools headers kvm: Sync ARM UAPI headers with the kernel sources
        tools headers kvm: Sync uapi/linux/kvm.h with the kernel sources
        tools headers: Sync x86 cpufeatures.h with the kernel sources
        perf vendor events intel: Remove duplicated entry for westmereep-dp in mapfile.csv
        perf bench numa: Fix typo in options
      baeda713
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping · 0503fd65
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Just one little fix from Jean to avoid a harmless but very annoying
        warning, especially for the drm code"
      
      * tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: silent unwanted warning "buffer is full"
      0503fd65
    • Linus Torvalds's avatar
      Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6 · ccda3c4b
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Some small SMB3 fixes for 4.17-rc5, some for stable"
      
      * tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: directory sync should not return an error
        cifs: smb2ops: Fix listxattr() when there are no EAs
        cifs: smbd: Enable signing with smbdirect
        cifs: Allocate validate negotiation request through kmalloc
      ccda3c4b
  8. 12 May, 2018 5 commits
    • Linus Torvalds's avatar
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 427fbe89
      Linus Torvalds authored
      Pull thermal fixes from Zhang Rui:
      
       - fix NULL pointer dereference on module load/probe for int3403_thermal
         driver
      
       - fix an emergency shutdown issue on exynos thermal driver
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        thermal: exynos: Propagate error value from tmu_read()
        thermal: exynos: Reading temperature makes sense only when TMU is turned on
        thermal: int3403_thermal: Fix NULL pointer deref on module load / probe
      427fbe89
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20180511' of git://git.kernel.dk/linux-block · 0d4cafd1
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Just a few NVMe fixes this round - one fixing a use-after-free, one
        fixes the return value after controller reset, and the last one fixes
        an issue where some drives will spuriously EIO. We should get these
        into 4.17"
      
      * tag 'for-linus-20180511' of git://git.kernel.dk/linux-block:
        nvme: add quirk to force medium priority for SQ creation
        nvme: Fix sync controller reset return
        nvme: fix use-after-free in nvme_free_ns_head
      0d4cafd1
    • Jean Delvare's avatar
      swiotlb: silent unwanted warning "buffer is full" · 05e13bb5
      Jean Delvare authored
      If DMA_ATTR_NO_WARN is passed to swiotlb_alloc_buffer(), it should be
      passed further down to swiotlb_tbl_map_single(). Otherwise we escape
      half of the warnings but still log the other half.
      
      This is one of the multiple causes of spurious warnings reported at:
      https://bugs.freedesktop.org/show_bug.cgi?id=104082Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Fixes: 0176adb0 ("swiotlb: refactor coherent buffer allocation")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Michel Dänzer <michel@daenzer.net>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: stable@vger.kernel.org # v4.16
      05e13bb5
    • Mel Gorman's avatar
      Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()" · 789ba280
      Mel Gorman authored
      This reverts commit 7347fc87.
      
      Srikar Dronamra pointed out that while the commit in question did show
      a performance improvement on ppc64, it did so at the cost of disabling
      active CPU migration by automatic NUMA balancing which was not the intent.
      The issue was that a serious flaw in the logic failed to ever active balance
      if SD_WAKE_AFFINE was disabled on scheduler domains. Even when it's enabled,
      the logic is still bizarre and against the original intent.
      
      Investigation showed that fixing the patch in either the way he suggested,
      using the correct comparison for jiffies values or introducing a new
      numa_migrate_deferred variable in task_struct all perform similarly to a
      revert with a mix of gains and losses depending on the workload, machine
      and socket count.
      
      The original intent of the commit was to handle a problem whereby
      wake_affine, idle balancing and automatic NUMA balancing disagree on the
      appropriate placement for a task. This was particularly true for cases where
      a single task was a massive waker of tasks but where wake_wide logic did
      not apply.  This was particularly noticeable when a futex (a barrier) woke
      all worker threads and tried pulling the wakees to the waker nodes. In that
      specific case, it could be handled by tuning MPI or openMP appropriately,
      but the behavior is not illogical and was worth attempting to fix. However,
      the approach was wrong. Given that we're at rc4 and a fix is not obvious,
      it's better to play safe, revert this commit and retry later.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: efault@gmx.de
      Cc: ggherdovich@suse.cz
      Cc: hpa@zytor.com
      Cc: matt@codeblueprint.co.uk
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/20180509163115.6fnnyeg4vdm2ct4v@techsingularity.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      789ba280
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · f0ab773f
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        rbtree: include rcu.h
        scripts/faddr2line: fix error when addr2line output contains discriminator
        ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
        mm, oom: fix concurrent munlock and oom reaper unmap, v3
        mm: migrate: fix double call of radix_tree_replace_slot()
        proc/kcore: don't bounds check against address 0
        mm: don't show nr_indirectly_reclaimable in /proc/vmstat
        mm: sections are not offlined during memory hotremove
        z3fold: fix reclaim lock-ups
        init: fix false positives in W+X checking
        lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
        KASAN: prohibit KASAN+STRUCTLEAK combination
        MAINTAINERS: update Shuah's email address
      f0ab773f