  1. 04 Nov, 2019 1 commit
  2. 10 Oct, 2019 1 commit
    • all: remove nacl (part 3, more amd64p32) · 03ef105d
      Brad Fitzpatrick authored
      Part 1: CL 199499 (GOOS nacl)
      Part 2: CL 200077 (amd64p32 files, toolchain)
      Part 3: stuff that arguably should've been part of Part 2, but I forgot
              one of my grep patterns when splitting the original CL up into
              two parts.
      
      This one might also have interesting stuff to resurrect for any future
      x32 ABI support.
      
      Updates #30439
      
      Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
      Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Ian Lance Taylor <iant@golang.org>
  3. 04 Sep, 2019 1 commit
    • runtime: don't hold worldsema across mark phase · 7b294cdd
      Michael Anthony Knyszek authored
      This change makes it so that worldsema isn't held across the mark phase.
      This means that various operations like ReadMemStats may now stop the
      world during the mark phase, reducing latency on such operations.
      
       Only three such operations are still not allowed to occur during
       marking: GOMAXPROCS, StartTrace, and StopTrace.
      
      For the former it's because any change to GOMAXPROCS impacts GC mark
      background worker scheduling and the details there are tricky.
      
      For the latter two it's because tracing needs to observe consistent GC
      start and GC end events, and if StartTrace or StopTrace may stop the
      world during marking, then it's possible for it to see a GC end event
      without a start or GC start event without an end, respectively.
      
      To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
      marking is complete, the runtime now holds a new semaphore, gcsema,
      across the mark phase just like it used to with worldsema.
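       
       The locking discipline can be pictured with a toy model (a hedged
       sketch only, not the runtime's actual code; sync.Mutex stands in for
       the runtime-internal semaphores, and the function names are
       illustrative): operations that merely need to stop the world take only
       worldsema, while operations that must not overlap a mark phase take
       gcsema first.
       
          package main
          
          import "sync"
          
          // Toy model of the split described above.
          var (
              worldsema sync.Mutex // held only while the world is stopped
              gcsema    sync.Mutex // held by the GC across the whole mark phase
          )
          
          // A ReadMemStats-like operation only needs the world stopped, so it
          // can run even while marking is in progress.
          func readStats() {
              worldsema.Lock()
              // ... stop the world, snapshot stats, start the world ...
              worldsema.Unlock()
          }
          
          // A GOMAXPROCS/StartTrace/StopTrace-like operation must not
          // interleave with marking, so it takes gcsema first and blocks
          // until marking completes.
          func setProcs() {
              gcsema.Lock()
              worldsema.Lock()
              // ... stop the world, apply the change, start the world ...
              worldsema.Unlock()
              gcsema.Unlock()
          }
          
          func main() {
              readStats()
              setProcs()
          }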
      
      Fixes #19812.
      
      Change-Id: I15d43ed184f711b3d104e8f267fb86e335f86bf9
      Reviewed-on: https://go-review.googlesource.com/c/go/+/182657
      Run-TryBot: Michael Knyszek <mknyszek@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Keith Randall <khr@golang.org>
       Reviewed-by: Cherry Zhang <cherryyz@google.com>
  4. 22 Aug, 2018 2 commits
  5. 03 May, 2018 1 commit
    • runtime: convert g.waitreason from string to uint8 · 4d7cf3fe
      Josh Bleecher Snyder authored
      Every time I poke at #14921, the g.waitreason string
      pointer writes show up.
      
      They're not particularly important performance-wise,
      but it'd be nice to clear the noise away.
      
      And it does open up a few extra bytes in the g struct
      for some future use.
      
      This is a re-roll of CL 99078, which was rolled
      back because of failures on s390x.
      Those failures were apparently due to an old version of gdb.
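       
       The technique is the usual one for shrinking a string field: a small
       integer type plus a fixed string table. A minimal self-contained
       sketch (illustrative only; the names below are not the runtime's
       actual declarations):
       
          package main
          
          import "fmt"
          
          // waitReason packs the former string into a single byte.
          type waitReason uint8
          
          const (
              waitReasonZero waitReason = iota
              waitReasonGCAssistMarking
              waitReasonChanReceive
          )
          
          var waitReasonStrings = [...]string{
              waitReasonZero:            "",
              waitReasonGCAssistMarking: "GC assist marking",
              waitReasonChanReceive:     "chan receive",
          }
          
          func (w waitReason) String() string {
              if int(w) < len(waitReasonStrings) {
                  return waitReasonStrings[w]
              }
              return "unknown wait reason"
          }
          
          func main() {
              fmt.Println(waitReasonChanReceive) // prints "chan receive"
          }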
      
      Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
      Reviewed-on: https://go-review.googlesource.com/111256
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Michael Munday <mike.munday@ibm.com>
  6. 24 Apr, 2018 1 commit
    • runtime/trace: rename "Span" to "Region" · c2d10243
      Hana Kim authored
      "Span" is a commonly used term in many distributed tracing systems
       (Dapper, OpenCensus, OpenTracing, ...). There it refers to a
       period of time, not necessarily tied to the execution of an underlying
       processor, thread, or goroutine, unlike the "Span" of the runtime/trace
       package.
      
      Since distributed tracing and go runtime execution tracing are
      already similar enough to cause confusion, this CL attempts to avoid
      using the same word if possible.
      
      "Region" is being used in a certain tracing system to refer to a code
       region, which is quite close to what runtime/trace.Span currently
       refers to, so adopt that term instead.
      https://software.intel.com/en-us/itc-user-and-reference-guide-defining-and-recording-functions-or-regions
      
      This CL also tweaks APIs a bit based on jbd and heschi's comments:
      
        NewContext -> NewTask
          and it now returns a Task object that exports End method.
      
        StartSpan -> StartRegion
          and it now returns a Region object that exports End method.
      
       Also, WithSpan is changed to WithRegion, and it now takes a func() with
       no context. Another thought is to get rid of WithRegion entirely: it is
       a nice concept, but in practice it seems problematic (a lot of code
       churn and stack-trace pollution). The tracing concept is already very
       low level, and we expect this API to be used with great care.
      
      Recommended usage will be
         defer trace.StartRegion(ctx, "someRegion").End()
      
       The old APIs are left untouched in this CL. Once their uses are cleaned
       up, they will be removed in a separate CL.
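       
       For reference, a small self-contained program using the renamed API
       (Task, StartRegion, Log) as it ships in runtime/trace; run it and
       inspect the output with "go tool trace trace.out":
       
          package main
          
          import (
              "context"
              "log"
              "os"
              "runtime/trace"
          )
          
          func main() {
              f, err := os.Create("trace.out")
              if err != nil {
                  log.Fatal(err)
              }
              defer f.Close()
              if err := trace.Start(f); err != nil {
                  log.Fatal(err)
              }
              defer trace.Stop()
          
              // A task groups work that may span goroutines.
              ctx, task := trace.NewTask(context.Background(), "myTask")
              defer task.End()
          
              // Recommended region usage from above.
              defer trace.StartRegion(ctx, "someRegion").End()
          
              trace.Log(ctx, "category", "some message tied to myTask")
          }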
      
      Change-Id: I73880635e437f3aad51314331a035dd1459b9f3a
      Reviewed-on: https://go-review.googlesource.com/108296
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: JBD <jbd@google.com>
  7. 13 Mar, 2018 1 commit
  8. 12 Mar, 2018 1 commit
  9. 15 Feb, 2018 2 commits
    • runtime/trace: implement annotation API · 6977a3b2
      Hana Kim authored
      This implements the annotation API proposed in golang.org/cl/63274.
      
       traceString is updated to protect the string map with trace.stringsLock,
       because the assumption that traceString is called by a single goroutine
       (either at the beginning of tracing or at the end of tracing when
       dumping all the symbols and function names) is no longer true.
      
       After this change, traceString is used by the annotation APIs
       (NewContext, StartSpan, Log) to register frequently appearing strings
       (task and span names, and log keys).
      
      NewContext -> one or two records (EvString, EvUserTaskCreate)
      end function -> one record (EvUserTaskEnd)
      StartSpan -> one or two records (EvString, EvUserSpan)
      span end function -> one or two records (EvString, EvUserSpan)
      Log -> one or two records (EvString, EvUserLog)
      
       The EvUserLog record uses the typical record format written by traceEvent,
       except that it is followed by bytes that represent the value string.
      
      In addition to runtime/trace change, this change includes
      corresponding changes in internal/trace to parse the new record types.
      
       Future work to improve efficiency:
         More efficient unique task id generation instead of an atomic counter
         (e.g. a per-P counter).
         Instead of a centralized trace.stringsLock, consider using a per-P
         string cache or something more efficient.
      
      R=go1.11
      
      Change-Id: Iec9276c6c51e5be441ccd52dec270f1e3b153970
       Reviewed-on: https://go-review.googlesource.com/71690
       Reviewed-by: Austin Clements <austin@google.com>
    • runtime/trace: user annotation API · 32d1cd33
      Hana Kim authored
      This CL presents the proposed user annotation API skeleton.
      This CL bumps up the trace version to 1.11.
      
      Design doc https://goo.gl/iqJfJ3
      
       Implementation CLs follow.
      
       The API introduces three basic building blocks: Log, Span, and Task.
      
      Log is for basic logging. When called, the message will be recorded
      to the trace along with timestamp, goroutine id, and stack info.
      
         trace.Log(ctx, messageType message)
      
       Span can be thought of as an extension of Log to record an interesting
       time interval during a goroutine's execution. A span is local to a
       goroutine by definition.
      
         trace.WithSpan(ctx, "doVeryExpensiveOp", func(ctx context) {
            /* do something very expensive */
         })
      
       Task is a higher-level concept that aids tracing of complex operations
       that encompass multiple goroutines or are asynchronous.
       For example, an RPC request, an HTTP request, a file write, or a
      batch job can be traced with a Task.
      
      Note we chose to design the API around context.Context so it allows
      easier integration with other tracing tools, often designed around
      context.Context as well. Log and WithSpan APIs recognize the task
      information embedded in the context and record it in the trace as
      well. That allows the Go execution tracer to associate and group
      the spans and log messages based on the task information.
      
      In order to create a Task,
      
         ctx, end := trace.NewContext(ctx, "myTask")
         defer end()
      
       The Go execution tracer measures the time between task creation and
       task end and reports it as the task latency.
      
      More discussion history in golang.org/cl/59572.
      
      Update #16619
      
      R=go1.11
      
      Change-Id: I59a937048294dafd23a75cf1723c6db461b193cd
       Reviewed-on: https://go-review.googlesource.com/63274
       Reviewed-by: Austin Clements <austin@google.com>
  10. 31 Oct, 2017 1 commit
    • runtime/trace: fix corrupted trace during StartTrace · d58f4e9b
      Hana (Hyang-Ah) Kim authored
       Since Go 1.8, different types of GC mark workers have been annotated,
       and the annotation strings are recorded during StartTrace. This change
       fixes two issues around the use of traceString from StartTrace.
      
      1) "failed to parse trace: no consistent ordering of events possible"
      
       This issue is a result of a missing 'batch' event entry. For efficient
       tracing, the tracer maintains system-allocated buffers, and once a buffer
       is full, it is flushed out for writing. Moreover, tracing assumes all
       the records in the same buffer (batch) are already ordered, applies
       further optimizations in encoding, and defers the complete order
       reconstruction until trace parsing time. Thus, when a flush happens
       and a new buffer is used, the new buffer should contain an event to
       indicate the start of a new batch. Before this CL, the batch entry was
       written only by traceEvent, only when the buffer position is 0, and
       wasn't written when a flush occurred during traceString.
      
       This CL fixes it by moving the batch entry write into traceFlush.
      
      2) crash during tracing due to invalid memory access, or during parsing
      due to duplicate string entries
      
       This issue is a result of memory allocation during traceString calls.
       The execution tracer traces some memory allocation activity. Before this
       CL, traceString took the buffer address (*traceBuf) and mutated the buffer.
       If allocation tracing occurs in the meantime on the same P, traceEvent
       takes the same buffer through the pointer to the buffer address
       (**traceBuf) and mutates the buffer.
      
       As a result, one of the following can happen:
        - the allocation record is overwritten by the following trace string
          record (data loss);
        - if a buffer flush occurs during the allocation tracing, traceString
          will attempt to write the string record to the old buffer and
          eventually cause an invalid memory access crash;
        - or a flush of the same buffer can occur twice (once from the memory
          allocation, and once from the string record write), in which case
          the trace contains the same data twice and the parser complains
          about duplicate string record entries.
      
       This CL fixes the second issue by making traceString take a
       **traceBuf (*traceBufPtr).
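       
       The hazard reduces to a small self-contained toy (illustrative only,
       not the runtime's code; the types here are stand-ins): a callee that
       caches the old buffer pointer keeps writing to a buffer that has since
       been swapped out, while a callee that goes through a pointer to the
       buffer pointer always sees the current buffer.
       
          package main
          
          import "fmt"
          
          type buf struct{ data []byte }
          
          type bufPtr struct{ p *buf } // stand-in for the per-P buffer-pointer slot
          
          func flush(h *bufPtr) { h.p = &buf{} } // swap in a fresh buffer
          
          // Bad: caches the old *buf; a flush in between strands the write.
          func writeStale(b *buf, h *bufPtr, s string) {
              flush(h) // e.g. allocation tracing flushed the buffer meanwhile
              b.data = append(b.data, s...)
          }
          
          // Good: re-reads the holder, so the write lands in the live buffer.
          func writeFresh(h *bufPtr, s string) {
              flush(h)
              h.p.data = append(h.p.data, s...)
          }
          
          func main() {
              old := &buf{}
              h := &bufPtr{p: old}
              writeStale(old, h, "lost")
              fmt.Println(len(old.data), len(h.p.data)) // 4 0: the write missed the live buffer
              writeFresh(h, "kept")
              fmt.Println(len(h.p.data)) // 4: the write landed in the live buffer
          }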
      
      Change-Id: I24f629758625b38e1916fbfc7d7be6ea210586af
      Reviewed-on: https://go-review.googlesource.com/50873
      Run-TryBot: Austin Clements <austin@google.com>
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Austin Clements <austin@google.com>
  11. 27 Sep, 2017 2 commits
    • runtime: clean up loops over allp · e900e275
      Austin Clements authored
       allp now has length gomaxprocs, which means none of allp[i] are nil or
       in state _Pdead. This lets us replace several different styles of loops
       over allp with normal range loops.
      
      for i := 0; i < gomaxprocs; i++ { ... } loops can simply range over
      allp. Likewise, range loops over allp[:gomaxprocs] can just range over
      allp.
      
      Loops that check for p == nil || p.state == _Pdead don't need to check
      this any more.
      
      Loops that check for p == nil don't have to check this *if* dead Ps
      don't affect them. I checked that all such loops are, in fact,
      unaffected by dead Ps. One loop was potentially affected, which this
      fixes by zeroing p.gcAssistTime in procresize.
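       
       A toy before/after of the loop shapes involved (the types below are
       illustrative stand-ins, not the runtime's declarations):
       
          package main
          
          import "fmt"
          
          type p struct {
              id   int
              dead bool
          }
          
          func main() {
              gomaxprocs := 3
              allp := []*p{{id: 0}, {id: 1}, {id: 2}}
          
              // Before: bounded by gomaxprocs and forced to skip nil/dead Ps.
              for i := 0; i < gomaxprocs; i++ {
                  pp := allp[i]
                  if pp == nil || pp.dead {
                      continue
                  }
                  fmt.Println("old style:", pp.id)
              }
          
              // After: len(allp) == gomaxprocs with no nil or dead entries,
              // so a plain range loop suffices.
              for _, pp := range allp {
                  fmt.Println("range:", pp.id)
              }
          }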
      
      Updates #15131.
      
      Change-Id: Ifa1c2a86ed59892eca0610360a75bb613bc6dcee
      Reviewed-on: https://go-review.googlesource.com/45575
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Rick Hudson <rlh@golang.org>
       Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: dynamically allocate allp · 84d2c7ea
      Austin Clements authored
      This makes it possible to eliminate the hard cap on GOMAXPROCS.
      
      Updates #15131.
      
      Change-Id: I4c422b340791621584c118a6be1b38e8a44f8b70
      Reviewed-on: https://go-review.googlesource.com/45573
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Rick Hudson <rlh@golang.org>
  12. 12 Sep, 2017 1 commit
    • runtime: improve timers scalability on multi-CPU systems · 76f4fd8a
      Aliaksandr Valialkin authored
      Use per-P timers, so each P may work with its own timers.
      
      This CL improves performance on multi-CPU systems
      in the following cases:
      
       - When serving a high number of concurrent connections
         with read/write deadlines set (for instance, a highly loaded
         net/http server).
       
       - When using a high number of concurrent timers. These timers
         may be implicitly created via context.WithDeadline
         or context.WithTimeout.
       
       Production servers should usually set timeouts on connections
       and external requests in order to prevent resource leaks.
      See https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
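       
       For concreteness, here is a typical source of such timers, using only
       the standard library (each context.WithTimeout call creates a runtime
       timer behind the scenes):
       
          package main
          
          import (
              "context"
              "fmt"
              "sync"
              "time"
          )
          
          func main() {
              var wg sync.WaitGroup
              for i := 0; i < 4; i++ {
                  wg.Add(1)
                  go func(i int) {
                      defer wg.Done()
                      // Each call registers a timer with the runtime.
                      ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
                      defer cancel()
                      <-ctx.Done()
                      fmt.Println("worker", i, "done:", ctx.Err())
                  }(i)
              }
              wg.Wait()
          }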
      
      Below are relevant benchmark results for various GOMAXPROCS values
      on linux/amd64:
      
      context package:
      
      name                                     old time/op  new time/op  delta
      WithTimeout/concurrency=40      4.92µs ± 0%  5.17µs ± 1%  +5.07%  (p=0.000 n=9+9)
      WithTimeout/concurrency=4000    6.03µs ± 1%  6.49µs ± 0%  +7.63%  (p=0.000 n=8+10)
      WithTimeout/concurrency=400000  8.58µs ± 7%  9.02µs ± 4%  +5.02%  (p=0.019 n=10+10)
      
      name                                     old time/op  new time/op  delta
      WithTimeout/concurrency=40-2      3.70µs ± 1%  2.78µs ± 4%  -24.90%  (p=0.000 n=8+9)
      WithTimeout/concurrency=4000-2    4.49µs ± 4%  3.67µs ± 5%  -18.26%  (p=0.000 n=10+10)
      WithTimeout/concurrency=400000-2  6.16µs ±10%  5.15µs ±13%  -16.30%  (p=0.000 n=10+10)
      
      name                                     old time/op  new time/op  delta
      WithTimeout/concurrency=40-4      3.58µs ± 1%  2.64µs ± 2%  -26.13%  (p=0.000 n=9+10)
      WithTimeout/concurrency=4000-4    4.17µs ± 0%  3.32µs ± 1%  -20.36%  (p=0.000 n=10+10)
      WithTimeout/concurrency=400000-4  5.57µs ± 9%  4.83µs ±10%  -13.27%  (p=0.001 n=10+10)
      
      time package:
      
      name                     old time/op  new time/op  delta
      AfterFunc                6.15ms ± 3%  6.07ms ± 2%     ~     (p=0.133 n=10+9)
      AfterFunc-2              3.43ms ± 1%  3.56ms ± 1%   +3.91%  (p=0.000 n=10+9)
      AfterFunc-4              5.04ms ± 2%  2.36ms ± 0%  -53.20%  (p=0.000 n=10+9)
      After                    6.54ms ± 2%  6.49ms ± 3%     ~     (p=0.393 n=10+10)
      After-2                  3.68ms ± 1%  3.87ms ± 0%   +5.14%  (p=0.000 n=9+9)
      After-4                  6.66ms ± 1%  2.87ms ± 1%  -56.89%  (p=0.000 n=10+10)
      Stop                      698µs ± 2%   689µs ± 1%   -1.26%  (p=0.011 n=10+10)
      Stop-2                    729µs ± 2%   434µs ± 3%  -40.49%  (p=0.000 n=10+10)
      Stop-4                    837µs ± 3%   333µs ± 2%  -60.20%  (p=0.000 n=10+10)
      SimultaneousAfterFunc     694µs ± 1%   692µs ± 7%     ~     (p=0.481 n=10+10)
      SimultaneousAfterFunc-2   714µs ± 3%   569µs ± 2%  -20.33%  (p=0.000 n=10+10)
      SimultaneousAfterFunc-4   782µs ± 2%   386µs ± 2%  -50.67%  (p=0.000 n=10+10)
      StartStop                 267µs ± 3%   274µs ± 0%   +2.64%  (p=0.000 n=8+9)
      StartStop-2               238µs ± 2%   140µs ± 3%  -40.95%  (p=0.000 n=10+8)
      StartStop-4               320µs ± 1%   125µs ± 1%  -61.02%  (p=0.000 n=9+9)
      Reset                    75.0µs ± 1%  77.5µs ± 2%   +3.38%  (p=0.000 n=10+10)
      Reset-2                   150µs ± 2%    40µs ± 5%  -73.09%  (p=0.000 n=10+9)
      Reset-4                   226µs ± 1%    33µs ± 1%  -85.42%  (p=0.000 n=10+10)
      Sleep                     857µs ± 6%   878µs ± 9%     ~     (p=0.079 n=10+9)
      Sleep-2                   617µs ± 4%   585µs ± 2%   -5.21%  (p=0.000 n=10+10)
      Sleep-4                   689µs ± 3%   465µs ± 4%  -32.53%  (p=0.000 n=10+10)
      Ticker                   55.9ms ± 2%  55.9ms ± 2%     ~     (p=0.971 n=10+10)
      Ticker-2                 28.7ms ± 2%  28.1ms ± 1%   -2.06%  (p=0.000 n=10+10)
      Ticker-4                 14.6ms ± 0%  13.6ms ± 1%   -6.80%  (p=0.000 n=9+10)
      
      Fixes #15133
      
      Change-Id: I6f4b09d2db8c5bec93146db6501b44dbfe5c0ac4
       Reviewed-on: https://go-review.googlesource.com/34784
       Reviewed-by: Austin Clements <austin@google.com>
  13. 29 Aug, 2017 1 commit
    • runtime,cmd/trace: trace GC STW events · b0392159
      Austin Clements authored
      Right now we only kind of sort of trace GC STW events. We emit events
      around mark termination, but those start well after stopping the world
      and end before starting it again, and we don't emit any events for
      sweep termination.
      
      Fix this by generalizing EvGCScanStart/EvGCScanDone. These were
      already re-purposed to indicate mark termination (despite the names).
      This commit renames them to EvGCSTWStart/EvGCSTWDone, adds an argument
      to indicate the STW reason, and shuffles the runtime to generate them
      right before stopping the world and right after starting the world,
      respectively.
      
      These events will make it possible to generate precise minimum mutator
      utilization (MMU) graphs and could be useful in detecting
      non-preemptible goroutines (e.g., #20792).
      
      Change-Id: If95783f370781d8ef66addd94886028103a7c26f
       Reviewed-on: https://go-review.googlesource.com/55411
       Reviewed-by: Rick Hudson <rlh@golang.org>
  14. 19 Apr, 2017 2 commits
    • runtime: record swept and reclaimed bytes in sweep trace · 22000f54
      Austin Clements authored
      This extends the GCSweepDone event with counts of swept and reclaimed
      bytes. These are useful for understanding the duration and
      effectiveness of sweep events.
      
      Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
      Reviewed-on: https://go-review.googlesource.com/40811
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
    • runtime: make sweep trace events encompass entire sweep loop · 79c56add
      Austin Clements authored
      Currently, each individual span sweep emits a span to the trace. But
      sweeps are generally done in loops until some condition is satisfied,
       so this tracing is lower-level than anyone really wants and hides the
       fact that no other work is being accomplished between adjacent sweep
      events. This is also high overhead: enabling tracing significantly
      impacts sweep latency.
      
       Instead, replace this with tracing around the sweep loops used for
       allocation. This is slightly tricky because sweep loops don't
      generally know if any sweeping will happen in them. Hence, we make the
      tracing lazy by recording in the P that we would like to start tracing
      the sweep *if* one happens, and then only closing the sweep event if
      we started it.
      
      This does mean we don't get tracing on every sweep path, which are
      legion. However, we get much more informative tracing on the paths
      that block allocation, which are the paths that matter.
      
      Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
      Reviewed-on: https://go-review.googlesource.com/40810
      Run-TryBot: Austin Clements <austin@google.com>
       Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
       TryBot-Result: Gobot Gobot <gobot@golang.org>
  15. 14 Apr, 2017 1 commit
    • runtime/trace: iterate over frames instead of PCs · 3249cb0a
      David Lazar authored
      Now the runtime/trace tests pass with -l=4.
      
      This also gets rid of the frames cache for multiple reasons:
      
      1) The frames cache was used to avoid repeated calls to funcname and
      funcline. Now these calls happen inside the CallersFrames iterator.
      
      2) Maintaining a frames cache is harder: map[uintptr]traceFrame
      doesn't work since each PC can map to multiple traceFrames.
      
      3) It's not clear that the cache is important.
      
      Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
      Reviewed-on: https://go-review.googlesource.com/40591
      Run-TryBot: David Lazar <lazard@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Austin Clements <austin@google.com>
  16. 16 Mar, 2017 1 commit
  17. 06 Mar, 2017 1 commit
    • runtime: avoid repeated findmoduledatap calls · 0efc8b21
      Austin Clements authored
      Currently almost every function that deals with a *_func has to first
      look up the *moduledata for the module containing the function's entry
      point. This means we almost always do at least two identical module
      lookups whenever we deal with a *_func (one to get the *_func and
      another to get something from its module data) and sometimes several
      more.
      
      Fix this by making findfunc return a new funcInfo type that embeds
      *_func, but also includes the *moduledata, and making all of the
      functions that currently take a *_func instead take a funcInfo and use
      the already-found *moduledata.
      
      This transformation is trivial for the most part, since the *_func
      type is usually inferred. The annoying part is that we can no longer
      use nil to indicate failure, so this introduces a funcInfo.valid()
      method and replaces nil checks with calls to valid.
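       
       The shape of the change, as a self-contained sketch (the field and
       function bodies below are illustrative stand-ins rather than the
       runtime's exact declarations):
       
          package main
          
          import "fmt"
          
          type _func struct{ entry uintptr }
          type moduledata struct{ name string }
          
          // funcInfo bundles the found function with its module data so
          // callers don't repeat the module lookup.
          type funcInfo struct {
              *_func
              datap *moduledata
          }
          
          // valid replaces the old nil check on *_func.
          func (f funcInfo) valid() bool { return f._func != nil }
          
          func findfunc(pc uintptr) funcInfo {
              md := &moduledata{name: "main-module"} // looked up once
              return funcInfo{_func: &_func{entry: pc}, datap: md}
          }
          
          func main() {
              f := findfunc(0x1000)
              if f.valid() {
                  fmt.Println(f.entry, f.datap.name)
              }
          }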
      
      Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
      Reviewed-on: https://go-review.googlesource.com/37331
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
  18. 17 Feb, 2017 1 commit
    • sync: make Mutex more fair · 0556e262
      Dmitry Vyukov authored
       Add a new starvation mode for Mutex.
       In starvation mode, ownership is directly handed off from the
       unlocking goroutine to the next waiter. Newly arriving goroutines
       don't compete for ownership.
       Unfair wait time is now limited to 1ms.
       Also fix a long-standing bug where goroutines were requeued
       at the tail of the wait queue. That led to even more unfair
       acquisition times with multiple waiters.
       
       Performance of normal mode is not considerably affected.
      
      Fixes #13086
      
       On the lockskew program provided in the issue:
      
      done in 1.207853ms
      done in 1.177451ms
      done in 1.184168ms
      done in 1.198633ms
      done in 1.185797ms
      done in 1.182502ms
      done in 1.316485ms
      done in 1.211611ms
      done in 1.182418ms
      
      name                    old time/op  new time/op   delta
      MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
      Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
      MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
      MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
      MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
      MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
      MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
      Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
      RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
      RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
      RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
      RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
      Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
      MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
      Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
      MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
      MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
      MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
      MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
      MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
      RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
      RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
      RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
      RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
      Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
      MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
      Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
      MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
      MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
      MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
      MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
      MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
      RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
      RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
      RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
      RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
      Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
      MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
      Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
      MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
      MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
      MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
      MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
      MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
      RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
      RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
      RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
      RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
      Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
      MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
      Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
      MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
      MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
      MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
      MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
      MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
      RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
      RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
      RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
      RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
      Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
      MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
      Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
      MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
      MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
      MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
      MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
      MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
      RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
      RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
      RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
      RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)
      
      Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
      Reviewed-on: https://go-review.googlesource.com/34310
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Russ Cox <rsc@golang.org>
  19. 14 Feb, 2017 1 commit
  20. 10 Feb, 2017 1 commit
    • cmd/trace: Record mark assists in execution traces · 2a74b9e8
      Heschi Kreinick authored
      During the mark phase of garbage collection, goroutines that allocate
      may be recruited to assist. This change creates trace events for mark
      assists and displays them similarly to sweep assists in the trace
      viewer.
      
       Mark assists are different from sweeps in that they can be preempted, so
       displaying them in the trace viewer is a little tricky -- we may need to
      synthesize multiple slices for one mark assist. This could have been
      done in the parser instead, but I thought it might be preferable to keep
      the parser as true to the event stream as possible.
      
      Change-Id: I381dcb1027a187a354b1858537851fa68a620ea7
      Reviewed-on: https://go-review.googlesource.com/36015
      Run-TryBot: Heschi Kreinick <heschi@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Austin Clements <austin@google.com>
       Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
  21. 28 Oct, 2016 3 commits
    • runtime, cmd/trace: track goroutines blocked on GC assists · 6da83c6f
      Austin Clements authored
      Currently when a goroutine blocks on a GC assist, it emits a generic
      EvGoBlock event. Since assist blocking events and, in particular, the
      length of the blocked assist queue, are important for diagnosing GC
      behavior, this commit adds a new EvGoBlockGC event for blocking on a
      GC assist. The trace viewer uses this event to report a "waiting on
      GC" count in the "Goroutines" row. This makes sense because, unlike
      other blocked goroutines, these goroutines do have work to do, so
      being blocked on a GC assist is quite similar to being in the
      "runnable" state, which we also report in the trace viewer.
      
      Change-Id: Ic21a326992606b121ea3d3d00110d8d1fdc7a5ef
      Reviewed-on: https://go-review.googlesource.com/30704
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    • runtime, cmd/trace: annotate different mark worker types · 68348394
      Austin Clements authored
      Currently mark workers are shown in the trace as regular goroutines
      labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
      user because of the opaque label and particularly unhelpful to runtime
      developers because it doesn't distinguish the different types of mark
      workers.
      
      Fix this by introducing a variant of the GoStart event called
      GoStartLabel that lets the runtime indicate a label for a goroutine
      execution span and using this to label mark worker executions as "GC
      (<mode>)" in the trace viewer.
      
      Since this bumps the trace version to 1.8, we also add test data for
      1.7 traces.
      
      Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
      Reviewed-on: https://go-review.googlesource.com/30702
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    • runtime: Profile goroutines holding contended mutexes. · ca922b6d
      Peter Weinberger authored
       runtime.SetMutexProfileFraction(n int) will capture 1/n-th of the stack
       traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
       pprof.Lookup("mutex").WriteTo writes the accumulated
       stack traces to w (in essentially the same format that blocking
       profiling uses).
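       
       A minimal end-to-end use of the new profile, using only the public
       runtime and runtime/pprof APIs:
       
          package main
          
          import (
              "os"
              "runtime"
              "runtime/pprof"
              "sync"
              "time"
          )
          
          func main() {
              runtime.SetMutexProfileFraction(5) // sample ~1 in 5 contention events
              defer runtime.SetMutexProfileFraction(0)
          
              var mu sync.Mutex
              var wg sync.WaitGroup
              for i := 0; i < 8; i++ {
                  wg.Add(1)
                  go func() {
                      defer wg.Done()
                      mu.Lock()
                      time.Sleep(time.Millisecond) // force contention
                      mu.Unlock()
                  }()
              }
              wg.Wait()
          
              // Write the accumulated contention stacks in text form.
              pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
          }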
      
      Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
       Reviewed-on: https://go-review.googlesource.com/29650
       Reviewed-by: Austin Clements <austin@google.com>
  22. 21 Oct, 2016 1 commit
  23. 15 Oct, 2016 1 commit
  24. 07 Oct, 2016 2 commits
    • cmd/trace: label mark termination spans as such · 94589054
      Austin Clements authored
      Currently these are labeled "MARK", which was accurate in the STW
      collector, but these really indicate mark termination now, since
      marking happens for the full duration of the concurrent GC. Re-label
      them as "MARK TERMINATION" to clarify this.
      
      Change-Id: Ie98bd961195acde49598b4fa3f9e7d90d757c0a6
       Reviewed-on: https://go-review.googlesource.com/30018
       Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    • runtime: make next_gc ^0 when GC is disabled · fa9b57bb
      Austin Clements authored
      When GC is disabled, we set gcpercent to -1. However, we still use
      gcpercent to compute several values, such as next_gc and gc_trigger.
      These calculations are meaningless when gcpercent is -1 and result in
      meaningless values. This is okay in a sense because we also never use
      these values if gcpercent is -1, but they're confusing when exposed to
      the user, for example via MemStats or the execution trace. It's
      particularly unfortunate in the execution trace because it attempts to
      plot the underflowed value of next_gc, which scales all useful
      information in the heap row into oblivion.
      
      Fix this by making next_gc ^0 when gcpercent < 0. This has the
      advantage of being true in a way: next_gc is effectively infinite when
      gcpercent < 0. We can also detect this special value when updating the
      execution trace and report next_gc as 0 so it doesn't blow up the
      display of the heap line.
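       
       The sentinel convention, as a tiny sketch (illustrative only, not the
       runtime's actual code):
       
          package main
          
          import "fmt"
          
          // ^0 marks next_gc as "effectively infinite" while GC is disabled.
          const gcDisabledNextGC = ^uint64(0)
          
          // nextGCForTrace reports 0 for the sentinel so the trace viewer's
          // heap row isn't scaled into oblivion.
          func nextGCForTrace(nextGC uint64) uint64 {
              if nextGC == gcDisabledNextGC {
                  return 0
              }
              return nextGC
          }
          
          func main() {
              fmt.Println(nextGCForTrace(4 << 20))          // normal trigger passes through
              fmt.Println(nextGCForTrace(gcDisabledNextGC)) // GC off: report 0
          }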
      
      Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
       Reviewed-on: https://go-review.googlesource.com/30016
       Reviewed-by: Rick Hudson <rlh@golang.org>
  25. 02 Sep, 2016 1 commit
  26. 22 Aug, 2016 1 commit
    • runtime: speed up StartTrace with lots of blocked goroutines · 747a158e
      Dmitry Vyukov authored
       In StartTrace we emit EvGoCreate for all existing goroutines.
       This includes a stack unwind to obtain the current stack.
       Real Go programs can contain hundreds of thousands of blocked goroutines.
       For such programs StartTrace can take up to a second (a few ms per goroutine).
      
      Obtain current stack ID once and use it for all EvGoCreate events.
      
       This speeds up StartTrace with 10K blocked goroutines from 20ms to 4ms
       (the win for StartTrace called from a net/http/pprof handler will be
       bigger, as the stack is deeper).
      
      Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
      Reviewed-on: https://go-review.googlesource.com/25573
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
  27. 23 Apr, 2016 1 commit
    • runtime: use per-goroutine sequence numbers in tracer · a3703618
      Dmitry Vyukov authored
       Currently the tracer uses a global sequencer, which introduces a
       significant slowdown on parallel machines (up to 10x).
       Replace the global sequencer with per-goroutine sequencers.
      
       If we assign per-goroutine sequence numbers to only 3 types
       of events (start, unblock and syscall exit), that is enough to
       restore a consistent partial ordering of all events. Even these
       events don't need sequence numbers all the time (if a goroutine
       starts on the same P where it was unblocked, then the start does
       not need a sequence number).
       The burden of restoring the order is put on the trace parser.
       Details of the algorithm are described in the comments.
      
      On http benchmark with GOMAXPROCS=48:
      no tracing: 5026 ns/op
      tracing: 27803 ns/op (+453%)
      with this change: 6369 ns/op (+26%, mostly for traceback)
      
      Also trace size is reduced by ~22%. Average event size before: 4.63
      bytes/event, after: 3.62 bytes/event.
      
      Besides running trace tests, I've also tested with manually broken
      cputicks (random skew for each event, per-P skew and episodic random skew).
      In all cases broken timestamps were detected and no test failures.
      
      Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
      Reviewed-on: https://go-review.googlesource.com/21512
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
       Reviewed-by: Austin Clements <austin@google.com>
       TryBot-Result: Gobot Gobot <gobot@golang.org>
  28. 22 Apr, 2016 1 commit
  29. 11 Apr, 2016 1 commit
  30. 08 Apr, 2016 1 commit
  31. 27 Jan, 2016 1 commit
    • runtime: acquire stack lock in traceEvent · 08594ac7
      Austin Clements authored
      traceEvent records system call events after a G has already entered
      _Gsyscall, which means the garbage collector could be installing stack
      barriers in the G's stack during the traceEvent. If traceEvent
       attempts to capture the user stack during this, it may observe
       inconsistent stack barriers and panic. Fix this by acquiring the stack
      lock around the stack walk in traceEvent.
      
      Fixes #14101.
      
      Change-Id: I15f0ab0c70c04c6e182221f65a6f761c5a896459
      Reviewed-on: https://go-review.googlesource.com/18973
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
       Reviewed-by: Russ Cox <rsc@golang.org>
  32. 19 Nov, 2015 1 commit
  33. 12 Nov, 2015 1 commit