Update `job_traces.md` documentation

c5067b12 · Kamil Trzciński · 2a03142c · c5067b12
Commit c5067b12 authored May 07, 2018 by Kamil Trzciński
Hide whitespace changes
Inline Side-by-side

Showing with 27 additions and 28 deletions

doc/administration/job_traces.md doc/administration/job_traces.md +27 -28

No files found.
--- a/doc/administration/job_traces.md
+++ b/doc/administration/job_traces.md
@@ -52,35 +52,34 @@ To change the location where the job logs will be stored, follow the steps below

 **What is "live trace"?**

-It's job traces exists while job is being processed by Gitlab-Runner. You can see the progress in job pages(GUI).
-In contrast, all traces will be archived after job is finished, that's called "archived trace".
+Job trace that is sent by runner while jobs are running. You can see live trace in job pages UI.
+The live traces are archived once job finishes.

 **What is new architecture?**

-So far, when GitLab-Runner sends a job trace to GitLab-Rails, traces have been saved to File Storage as text files.
-This was a problem on [Cloud Native-compatible GitLab application](https://gitlab.com/gitlab-com/migration/issues/23) that
-GitLab-Rails had to rely on File Storage.
+So far, when GitLab Runner sends a job trace to GitLab-Rails, traces have been saved to file storage as text files.
+This was a problem for [Cloud Native-compatible GitLab application](https://gitlab.com/gitlab-com/migration/issues/23) where GitLab had to rely on File Storage.

-This new live trace architecture stores traces to Redis and Database instead of File Storage.
-Redis is used as first-class trace storage, it stores each trace upto 128KB. Once the data is fulfileld, it's flushed to Database. Afterwhile, the data in Redis and Database will be archived to ObjectStorage.
+This new live trace architecture stores chunks of traces in Redis and database instead of file storage.
+Redis is used as first-class storage, and it stores up-to 128kB. Once the full chunk is sent it will be flushed to database. Afterwhile, the data in Redis and database will be archived to ObjectStorage.

 Here is the detailed data flow.

-1. GitLab-Runner picks a job from GitLab-Rails
-1. GitLab-Runner sends a piece of trace to GitLab-Rails
+1. GitLab Runner picks a job from GitLab-Rails
+1. GitLab Runner sends a piece of trace to GitLab-Rails
 1. GitLab-Rails appends the data to Redis
-1. If the data in Redis is fulfilled 128KB, the data is flushed to Database.
+1. If the data in Redis is fulfilled 128kB, the data is flushed to Database.
 1. 2.~4. is continued until the job is finished
 1. Once the job is finished, GitLab-Rails schedules a sidekiq worker to archive the trace
 1. The sidekiq worker archives the trace to Object Storage, and cleanup the trace in Redis and Database

-**How to check if it's on or off**
+**How to check if it's on or off?**

 ```ruby
 Feature.enabled?('ci_enable_live_trace')
 ```

-**How to enable**
+**How to enable?**

 ```ruby
 Feature.enable('ci_enable_live_trace')
@@ -89,7 +88,7 @@ Feature.enable('ci_enable_live_trace')
 >**Note:**
 The transition period will be handled gracefully. Upcoming traces will be generated with the new architecture, and on-going live traces will stay with the legacy architecture (i.e. on-going live traces won't be re-generated forcibly with the new architecture).

-**How to disable**
+**How to disable?**

 ```ruby
 Feature.disable('ci_enable_live_trace')
@@ -98,41 +97,41 @@ Feature.disable('ci_enable_live_trace')
 >**Note:**
 The transition period will be handled gracefully. Upcoming traces will be generated with the legacy architecture, and on-going live traces will stay with the new architecture (i.e. on-going live traces won't be re-generated forcibly with the legacy architecture).

-**Redis namespace**
+**Redis namespace:**

 `Gitlab::Redis::SharedState`

-**Potential impact**
+**Potential impact:**

- This feature could incur data loss
+- This feature could incur data loss:
  - Case 1: When all data in Redis are accidentally flushed.
-    - On-going live traces could be recovered by re-sending traces (This is supported by all versions of GitLab-Runner)
-    - Finished jobs which has not archived live traces will lose the last part(~128KB) of trace data.
-  - Case 2: When sidekiq workers failed to archive (e.g. There was a bug that prevents archiving process, Sidekiq inconsistancy, etc)
+    - On-going live traces could be recovered by re-sending traces (This is supported by all versions of GitLab Runner)
+    - Finished jobs which has not archived live traces will lose the last part (~128kB) of trace data.
+  - Case 2: When sidekiq workers failed to archive (e.g. There was a bug that prevents archiving process, Sidekiq inconsistancy, etc):
    - Currently all trace data in Redis will be deleted after one week. If the sidekiq workers can't finish by the expiry date, the part of trace data will be lost.
- This feature could consume all memeory on Redis instance. If the number of jobs is 1000, 128KB * 1000 = 128MB is consumed.
- This feature could pressure Database instance. `INSERT` is queried per 128KB per a job. `UPDATE` is queried with the same condition, but only if the total size of the trace exceeds 128KB.
+- This feature could consume all memory on Redis instance. If the number of jobs is 1000, 128MB (128kB * 1000) is consumed.
+- This feature could pressure Database replication lag. `INSERT` are generated to indicate that we have trace chunk. `UPDATE` with 128kB of data is issued once we receive multiple chunks.
 - and so on

-**How to test**
+**How to test?**

 We're currently evaluating this feature on dev.gitalb.org or staging.gitlab.com to verify this features. Here is the list of tests/measurements.

- Features
+- Features:
  - Live traces should be visible on job pages
  - Archived traces should be visible on job pages
  - Live traces should be archived to Object storage
  - Live traces should be cleaned up after archived
  - etc
- Performance
+- Performance:
  - Schedule 1000~10000 jobs and let GitLab-runners process concurrently. Measure memoery presssure, IO load, etc.
  - etc
- Failover
+- Failover:
  - Simulate Redis outage
  - etc

-**How to verify the correctnesss**
- 
- - TBD
+**How to verify the correctnesss?**
+
+- TBD

 [ce-44935]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18169