Docs: Merge EE doc/development to CE

Commit 02074684 authored May 05, 2019 by Marcel Amirault Committed by Achilleas Pipinellis May 05, 2019
9 changed files
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -38,6 +38,7 @@ description: 'Learn how to contribute to GitLab.'
 - [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers
 - [Working with Gitaly](gitaly.md)
 - [Manage feature flags](feature_flags.md)
+- [Licensed feature availability](licensed_feature_availability.md)
 - [View sent emails or preview mailers](emails.md)
 - [Shell commands](shell_commands.md) in the GitLab codebase
 - [`Gemfile` guidelines](gemfile.md)
@@ -48,6 +49,7 @@ description: 'Learn how to contribute to GitLab.'
 - [How to dump production data to staging](db_dump.md)
 - [Working with the GitHub importer](github_importer.md)
 - [Import/Export development documentation](import_export.md)
+- [Elasticsearch integration docs](elasticsearch.md)
 - [Working with Merge Request diffs](diffs.md)
 - [Kubernetes integration guidelines](kubernetes.md)
 - [Permissions](permissions.md)
@@ -55,6 +57,7 @@ description: 'Learn how to contribute to GitLab.'
 - [Guidelines for reusing abstractions](reusing_abstractions.md)
 - [DeclarativePolicy framework](policies.md)
 - [How Git object deduplication works in GitLab](git_object_deduplication.md)
+- [Geo development](geo.md)

 ## Performance guides


--- a/doc/development/contributing/merge_request_workflow.md
+++ b/doc/development/contributing/merge_request_workflow.md
@@ -155,7 +155,7 @@ the contribution acceptance criteria below:
     restarting the failing CI job, rebasing from master to bring in updates that
     may resolve the failure, or if it has not been fixed yet, ask a developer to
     help you fix the test.
-1. The MR initially contains a a few logically organized commits.
+1. The MR initially contains a few logically organized commits.
 1. The changes can merge without problems. If not, you should rebase if you're the
   only one working on your feature branch, otherwise merge `master`.
 1. Only one specific issue is fixed or one specific feature is implemented. Do not

--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
+# Elasticsearch knowledge **[STARTER ONLY]**
+
+This page is a compendium of useful information for working with Elasticsearch.
+
+Information on how to enable Elasticsearch and perform the initial indexing is in the [Elasticsearch integration documentation](https://docs.gitlab.com/ee/integration/elasticsearch.html#enabling-elasticsearch).
+
+## Initial installation on OS X
+
+We recommend using the Docker image. After installing Docker, you can immediately spin up an instance with
+
+```
+docker run --name elastic56 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.12
+```
+
+and use `docker stop elastic56` and `docker start elastic56` to stop/start it.
+
+### Installing on the host
+
+We currently support only Elasticsearch [5.6 to 6.x](https://docs.gitlab.com/ee/integration/elasticsearch.html#requirements).
+
+Version 5.6 is available on Homebrew and is the recommended version for testing compatibility.
+
+```
+brew install elasticsearch@5.6
+```
+
+There is no need to install any plugins.
+
+## New repo indexer (beta)
+
+If you're interested in working with the new beta repo indexer, all you need to do is:
+
+- `git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git`
+- `cd gitlab-elasticsearch-indexer`
+- `make`
+- `make install`
+
+This adds `gitlab-elasticsearch-indexer` to `$GOPATH/bin`. Make sure that directory is in your `$PATH`. After that, GitLab will find it and you'll be able to enable it in the admin settings area.
+
+NOTE: **Note:**
+`make` will not recompile the executable unless you run `make clean` beforehand.
+
+## Helpful rake tasks
+
+- `gitlab:elastic:test:index_size`: Tells you how much space the current index is using, as well as how many documents are in the index.
+- `gitlab:elastic:test:index_size_change`: Outputs index size, reindexes, and outputs index size again. Useful when testing improvements to indexing size.
+
+Additionally, if you need large repos or multiple forks for testing, please consider [following these instructions](https://docs.gitlab.com/ee/development/rake_tasks.html#extra-project-seed-options).
+
+## How does it work?
+
+The Elasticsearch integration depends on an external indexer. We ship a [Ruby indexer](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/bin/elastic_repo_indexer) by default but are also working on an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a rake task, but after that, GitLab itself triggers reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [/ee/app/models/concerns/elastic/application_search.rb](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/models/concerns/elastic/application_search.rb).
+
+All indexing after the initial one is done via `ElasticIndexerWorker` (Sidekiq jobs).
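
The callback-driven flow can be illustrated with a small, framework-free Ruby sketch. It is not GitLab's code: `SearchIndexable`, the in-memory job list (standing in for Sidekiq), and the `[operation, class name, id]` job arguments are all hypothetical.

```ruby
# Hypothetical, framework-free sketch: lifecycle hooks only enqueue a job;
# a background worker (Sidekiq in GitLab) would later talk to Elasticsearch.
module SearchIndexable
  # In-memory stand-in for the job queue.
  def self.jobs
    @jobs ||= []
  end

  def after_create
    enqueue_indexing(:index)
  end

  def after_update
    enqueue_indexing(:update)
  end

  def after_destroy
    enqueue_indexing(:delete)
  end

  private

  def enqueue_indexing(operation)
    SearchIndexable.jobs << [operation, self.class.name, id]
  end
end

# A stand-in model that includes the concern.
Issue = Struct.new(:id) do
  include SearchIndexable
end

issue = Issue.new(42)
issue.after_create
issue.after_destroy
p SearchIndexable.jobs
# => [[:index, "Issue", 42], [:delete, "Issue", 42]]
```

Keeping the hooks this thin means a slow or unavailable Elasticsearch cluster delays indexing rather than the database write itself.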
+
+Search queries are generated by the concerns found in [ee/app/models/concerns/elastic](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control and have historically been a source of security bugs, so please pay close attention to them!
+
+## Existing Analyzers/Tokenizers/Filters
+
+These are all defined in [`ee/lib/elasticsearch/git/model.rb`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/elasticsearch/git/model.rb).
+
+### Analyzers
+
+#### `path_analyzer`
+
+Used when indexing blobs' paths. Uses the `path_tokenizer` and the `lowercase` and `asciifolding` filters.
+
+Please see the `path_tokenizer` explanation below for an example.
+
+#### `sha_analyzer`
+
+Used in blobs and commits. Uses the `sha_tokenizer` and the `lowercase` and `asciifolding` filters.
+
+Please see the `sha_tokenizer` explanation below for an example.
+
+#### `code_analyzer`
+
+Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the `code`, `edgeNGram_filter`, `lowercase`, and `asciifolding` filters.
+
+The `whitespace` tokenizer was selected in order to have more control over how tokens are split. For example, the string `Foo::bar(4)` needs to generate tokens like `Foo` and `bar(4)` in order to be properly searched.
+
+Please see the `code` filter for an explanation of how tokens are split.
+
+#### `code_search_analyzer`
+
+Not directly used for indexing, but rather used to transform a search input. Uses the `whitespace` tokenizer and the `lowercase` and `asciifolding` filters.
+
+### Tokenizers
+
+#### `sha_tokenizer`
+
+This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow SHAs to be searchable by any prefix of at least 5 characters.
+
+Example:
+
+`240c29dc7e` becomes:
+
+- `240c2`
+- `240c29`
+- `240c29d`
+- `240c29dc`
+- `240c29dc7`
+- `240c29dc7e`
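
The prefix expansion in this example can be approximated in a few lines of plain Ruby. This is a minimal sketch rather than Elasticsearch's implementation: the `edge_ngrams` helper is hypothetical, and it assumes a minimum gram length of 5 and a maximum at least as long as the SHA.

```ruby
# Hypothetical helper approximating an edgeNGram tokenizer: emit every
# prefix of `token` whose length is between min_gram and max_gram.
def edge_ngrams(token, min_gram: 5, max_gram: nil)
  max = [max_gram || token.length, token.length].min
  return [] if min_gram > max

  (min_gram..max).map { |len| token[0, len] }
end

p edge_ngrams("240c29dc7e")
# => ["240c2", "240c29", "240c29d", "240c29dc", "240c29dc7", "240c29dc7e"]
```

A partial SHA such as `240c29d` is itself one of the indexed terms, so it can be matched directly.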
+
+#### `path_tokenizer`
+
+This is a custom tokenizer that uses the [`path_hierarchy` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pathhierarchy-tokenizer.html) with `reverse: true` in order to allow searches to find paths no matter how much or how little of the path is given as input.
+
+Example:
+
+`'/some/path/application.js'` becomes:
+
+- `'/some/path/application.js'`
+- `'some/path/application.js'`
+- `'path/application.js'`
+- `'application.js'`
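
The same behavior can be sketched in Ruby. `reverse_path_tokens` is a hypothetical helper that approximates the `path_hierarchy` tokenizer with `reverse: true` and the default `/` delimiter; it is not GitLab code.

```ruby
# Hypothetical approximation of path_hierarchy with `reverse: true`:
# keep the full path, then drop one leading component at a time.
def reverse_path_tokens(path, delimiter: "/")
  tokens = [path]
  rest = path
  while (idx = rest.index(delimiter))
    rest = rest[(idx + 1)..-1]
    tokens << rest unless rest.empty?
  end
  tokens
end

p reverse_path_tokens("/some/path/application.js")
# => ["/some/path/application.js", "some/path/application.js", "path/application.js", "application.js"]
```

Since every suffix is stored as its own term, a query for `path/application.js` matches a term exactly instead of needing a partial match.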
+
+### Filters
+
+#### `code`
+
+Uses a [Pattern Capture token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pattern-capture-tokenfilter.html) to split tokens into more easily searched versions of themselves.
+
+Patterns:
+
+- `"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)"`: captures CamelCased and lowerCamelCased strings as separate tokens
+- `"(\\d+)"`: extracts digits
+- `"(?=([\\p{Lu}]+[\\p{L}]+))"`: captures CamelCased strings recursively. For example, `ThisIsATest` => `[ThisIsATest, IsATest, ATest, Test]`
+- `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes
+- `"'((?:\\'|[^']|\\')*)'"`: same as above, for single quotes
+- `'\.([^.]+)(?=\.|\s|\Z)'`: separates terms with periods in between
+- `'\/?([^\/]+)(?=\/|\b)'`: separates path terms `like/this/one`
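
To see what the first three patterns produce, here is a minimal Ruby sketch. It is an approximation only: it applies those patterns with Ruby's regex engine (which supports the same `\p{...}` classes), collects every capture, and keeps the original token, mimicking the filter's default `preserve_original` behavior. `CODE_PATTERNS` and `capture_tokens` are hypothetical names.

```ruby
# Hypothetical approximation of a pattern_capture filter using the first
# three `code` patterns. Ruby regex literals need single backslashes,
# unlike the escaped JSON strings in the index settings.
CODE_PATTERNS = [
  /(\p{Ll}+|\p{Lu}\p{Ll}+|\p{Lu}+)/, # CamelCase and lowerCamelCase parts
  /(\d+)/,                           # runs of digits
  /(?=([\p{Lu}]+[\p{L}]+))/          # CamelCase suffixes, via a lookahead
].freeze

# Returns the original token followed by every captured sub-token.
def capture_tokens(token, patterns: CODE_PATTERNS)
  captured = patterns.flat_map { |re| token.scan(re).flatten.compact }
  ([token] + captured).uniq
end

p capture_tokens("ThisIsATest")
# => ["ThisIsATest", "This", "Is", "AT", "est", "IsATest", "ATest", "Test"]
p capture_tokens("Foo::bar(4)")
# => ["Foo::bar(4)", "Foo", "bar", "4"]
```

Note how the first pattern splits `ATest` into `AT` and `est`; the recursive lookahead pattern is what still yields `ATest` and `Test`.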
+
+#### `edgeNGram_filter`
+
+Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenfilter.html) to allow inputs with only parts of a token to find the token. For example, it turns `glasses` into prefixes starting with `gl` and ending with `glasses`, which allows a search for "`glass`" to find the original token `glasses`.
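
The effect of combining this filter at index time with `code_search_analyzer` at search time (which does not add n-grams) can be sketched as follows. This is a simplified, hypothetical model: it assumes a minimum gram length of 2 (matching the `gl` example) and no maximum, and it omits `asciifolding` and the `code` filter.

```ruby
require "set"

# Hypothetical sketch: at index time every token is expanded into its
# prefixes (min length 2); at search time the query is only split on
# whitespace and lowercased, roughly like code_search_analyzer.
def index_terms(text, min_gram: 2)
  text.split.flat_map { |token|
    word = token.downcase
    (min_gram..word.length).map { |len| word[0, len] }
  }.to_set
end

def search_terms(query)
  query.split.map(&:downcase)
end

# A document matches when every search term is one of its indexed terms.
def matches?(query, indexed_text)
  terms = index_terms(indexed_text)
  search_terms(query).all? { |term| terms.include?(term) }
end

puts matches?("glass", "Reading glasses")  # => true
puts matches?("lasses", "Reading glasses") # => false, only prefixes are indexed
```

Because n-grams are only generated on the indexing side, the search input stays intact and is matched against the stored prefixes.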
+
+## Gotchas
+
+- Searches can have their own analyzers. Remember to check them when editing analyzers.
+- `Character` filters (as opposed to token filters) always replace the original character, so they're not a good choice: they can hinder exact searches.
+
+## Troubleshooting
+
+### Getting "flood stage disk watermark [95%] exceeded"
+
+You might get an error such as:
+
+```
+[2018-10-31T15:54:19,762][WARN ][o.e.c.r.a.DiskThresholdMonitor] [pval5Ct] 
+   flood stage disk watermark [95%] exceeded on 
+   [pval5Ct7SieH90t5MykM5w][pval5Ct][/usr/local/var/lib/elasticsearch/nodes/0] free: 56.2gb[3%], 
+   all indices on this node will be marked read-only
+```
+
+This is because you've exceeded the disk space threshold: Elasticsearch thinks you don't have enough disk space left, based on the default 95% threshold.
+
+In addition, the `read_only_allow_delete` setting will be set to `true`, which blocks indexing, `forcemerge`, and so on. To check the current index settings, run:
+
+```
+curl "http://localhost:9200/gitlab-development/_settings?pretty"
+```
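
If you want to check the block from a script, here is a minimal Ruby sketch that parses the response of the `_settings` endpoint. It assumes the default nested (non-`flat_settings`) response layout and the `gitlab-development` index name used in the `curl` example; the helper names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Returns true when the index has the read_only_allow_delete block set.
# Assumes the nested layout Elasticsearch returns when `flat_settings`
# is not requested; setting values come back as strings.
def read_only_allow_delete?(settings_json, index)
  settings = JSON.parse(settings_json)
  value = settings.dig(index, "settings", "index", "blocks", "read_only_allow_delete")
  value.to_s == "true"
end

# Fetches the raw settings JSON, like the curl command above.
def fetch_settings(index, base_url: "http://localhost:9200")
  Net::HTTP.get(URI("#{base_url}/#{index}/_settings"))
end

# Usage against a local instance:
# read_only_allow_delete?(fetch_settings("gitlab-development"), "gitlab-development")
```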
+
+Add this to your `elasticsearch.yml` file:
+
+```
+# turn off the disk allocator
+cluster.routing.allocation.disk.threshold_enabled: false 
+```
+
+_or_
+
+```
+# set your own limits
+cluster.routing.allocation.disk.threshold_enabled: true 
+cluster.routing.allocation.disk.watermark.flood_stage: 5gb   # ES 6.x only
+cluster.routing.allocation.disk.watermark.low: 15gb 
+cluster.routing.allocation.disk.watermark.high: 10gb
+```
+
+Restart Elasticsearch, and the `read_only_allow_delete` setting will clear on its own.
+
+_from "Disk-based Shard Allocation | Elasticsearch Reference" [5.6](https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html#disk-allocator) and [6.x](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/disk-allocator.html)_
--- a/doc/development/fe_guide/style_guide_scss.md
+++ b/doc/development/fe_guide/style_guide_scss.md
@@ -16,10 +16,12 @@ New utility classes should be added to [`utilities.scss`](https://gitlab.com/git

 **Background color**: `.bg-variant-shade` e.g. `.bg-warning-400`  
 **Text color**: `.text-variant-shade` e.g. `.text-success-500` 
+
 - variant is one of 'primary', 'secondary', 'success', 'warning', 'error'
 - shade is one of the shades listed on [colors](https://design.gitlab.com/foundations/colors/)

 **Font size**: `.text-size` e.g. `.text-2`
+
 - **size** is a number from 1-6 from our [Type scale](https://design.gitlab.com/foundations/typography)

 ### Naming

--- a/doc/development/geo.md
+++ b/doc/development/geo.md
+# Geo (development) **[PREMIUM ONLY]**
+
+Geo connects GitLab instances together. One GitLab instance is
+designated as a **primary** node and can be run with multiple
+**secondary** nodes. Geo orchestrates quite a few components that are
+described in more detail below.
+
+## Database replication
+
+Geo uses [streaming replication](#streaming-replication) to replicate
+the database from the **primary** to the **secondary** nodes. This
+replication gives the **secondary** nodes access to all the data saved
+in the database, so users can log in to the **secondary** node and read
+all the issues, merge requests, and so on.
+
+## Repository replication
+
+Geo also replicates repositories. Each **secondary** node keeps track of
+the state of every repository in the [tracking database](#tracking-database).
+
+A repository can be replicated by the:
+
+- [Repository Sync worker](#repository-sync-worker).
+- [Geo Log Cursor](#geo-log-cursor).
+
+### Project Registry
+
+The `Geo::ProjectRegistry` class defines the model used to track the
+state of repository replication. For each project in the main
+database, one record in the tracking database is kept.
+
+It records the following about repositories:
+
+- The last time they were synced.
+- The last time they were synced successfully.
+- If they need to be resynced.
+- When retry should be attempted.
+- The number of retries.
+- If and when they were verified.
+
+It also stores these attributes for project wikis in dedicated columns.
+
+### Repository Sync worker
+
+The `Geo::RepositorySyncWorker` class runs periodically in the
+background and it searches the `Geo::ProjectRegistry` model for
+projects that need updating. Those projects can be:
+
+- Unsynced: Projects that have never been synced on the **secondary**
+  node and so do not exist yet.
+- Updated recently: Projects that have a `last_repository_updated_at`
+  timestamp that is more recent than the `last_repository_successful_sync_at`
+  timestamp in the `Geo::ProjectRegistry` model.
+- Manual: The admin can manually flag a repository to resync in the
+  [Geo admin panel](https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html).
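
The selection rules above can be sketched as a single predicate. This is a hypothetical, simplified version: it flattens the project's `last_repository_updated_at` and the registry's sync timestamp into one struct, treats a missing sync timestamp as "never synced" (in practice such projects may have no registry row at all), and uses an assumed `resync_repository` flag for manual resyncs.

```ruby
# Hypothetical, flattened registry row for illustration only.
RegistryEntry = Struct.new(:project_id, :last_repository_successful_sync_at,
                           :last_repository_updated_at, :resync_repository,
                           keyword_init: true)

# Mirrors the three cases listed above: manual, unsynced, updated recently.
def needs_sync?(entry)
  return true if entry.resync_repository
  return true if entry.last_repository_successful_sync_at.nil?
  return false if entry.last_repository_updated_at.nil?

  entry.last_repository_updated_at > entry.last_repository_successful_sync_at
end

t0 = Time.utc(2019, 5, 1)
entries = [
  RegistryEntry.new(project_id: 1), # never synced
  RegistryEntry.new(project_id: 2, last_repository_successful_sync_at: t0,
                    last_repository_updated_at: t0 + 60), # pushed to since last sync
  RegistryEntry.new(project_id: 3, last_repository_successful_sync_at: t0 + 60,
                    last_repository_updated_at: t0), # up to date
  RegistryEntry.new(project_id: 4, last_repository_successful_sync_at: t0 + 60,
                    last_repository_updated_at: t0, resync_repository: true) # manual
]
p entries.select { |e| needs_sync?(e) }.map(&:project_id)
# => [1, 2, 4]
```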
+
+When Geo fails to fetch a repository on the **secondary** node
+`RETRIES_BEFORE_REDOWNLOAD` times, it does a so-called _redownload_: a
+clean clone into the `@geo-temporary` directory in the root of the
+storage. When the clone succeeds, Geo replaces the main repository with it.
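
Here is a minimal sketch of the redownload decision, under stated assumptions: the threshold is passed in rather than hard-coded because its real value is not given here, and the exact layout of the temporary clone path inside `@geo-temporary` is assumed for illustration.

```ruby
# Hypothetical sketch of the redownload decision. The threshold is a
# parameter because the real RETRIES_BEFORE_REDOWNLOAD value is not stated.
def sync_strategy(failed_attempts, retries_before_redownload:)
  failed_attempts >= retries_before_redownload ? :redownload : :fetch
end

# Assumed layout: the temporary clone mirrors the repository's disk path
# under `@geo-temporary` in the storage root.
def temporary_clone_path(storage_root, disk_path)
  File.join(storage_root, "@geo-temporary", "#{disk_path}.git")
end

p sync_strategy(1, retries_before_redownload: 5) # => :fetch
p sync_strategy(5, retries_before_redownload: 5) # => :redownload
p temporary_clone_path("/var/opt/gitlab/git-data/repositories", "group/project")
# => "/var/opt/gitlab/git-data/repositories/@geo-temporary/group/project.git"
```

Cloning into a separate directory first means a failed redownload never leaves the existing repository half-replaced.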
+
+### Geo Log Cursor
+
+The [Geo Log Cursor](#geo-log-cursor) is a separate process running on
+each **secondary** node. It monitors the [Geo Event Log](#geo-event-log)
+and handles all of the events. When it sees an unhandled event, it
+starts a background worker to handle that event, depending on the type
+of event.
+
+When a repository receives an update, the Geo **primary** node creates
+a Geo event with an associated repository updated event. The cursor
+picks that up, and schedules a `Geo::ProjectSyncWorker` job which will
+use the `Geo::RepositorySyncService` class and `Geo::WikiSyncService`
+class to update the repository and the wiki.
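
The cursor's loop can be sketched as follows. This is a hypothetical illustration of the pattern (read new events, dispatch by type, advance the cursor), not Geo's actual classes; the event type symbols and scheduled job names are made up for the example.

```ruby
# Hypothetical event row and cursor; names are illustrative only.
Event = Struct.new(:id, :type, :project_id)

class LogCursor
  HANDLERS = {
    repository_updated: :schedule_project_sync,
    repository_deleted: :schedule_repository_removal
  }.freeze

  attr_reader :last_processed_id, :scheduled

  def initialize(last_processed_id = 0)
    @last_processed_id = last_processed_id
    @scheduled = []
  end

  # Handles only events newer than the cursor, in order, then advances it.
  def process(events)
    events.select { |e| e.id > last_processed_id }.sort_by(&:id).each do |event|
      handler = HANDLERS[event.type]
      send(handler, event) if handler
      @last_processed_id = event.id
    end
  end

  private

  def schedule_project_sync(event)
    @scheduled << [:project_sync, event.project_id]
  end

  def schedule_repository_removal(event)
    @scheduled << [:repository_removal, event.project_id]
  end
end

cursor = LogCursor.new
cursor.process([Event.new(1, :repository_updated, 10), Event.new(2, :repository_deleted, 11)])
cursor.process([Event.new(2, :repository_deleted, 11)]) # already handled, skipped
p cursor.scheduled
# => [[:project_sync, 10], [:repository_removal, 11]]
```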
+
+## Uploads replication
+
+File uploads are also replicated to the **secondary** node. To
+track the state of syncing, the `Geo::FileRegistry` model is used.
+
+### File Registry
+
+Similar to the [Project Registry](#project-registry), there is a
+`Geo::FileRegistry` model that tracks the synced uploads.
+
+CI Job Artifacts are synced in a similar way to uploads or LFS
+objects, but they are tracked by the `Geo::JobArtifactRegistry` model.
+
+### File Download Dispatch worker
+
+Also similar to the [Repository Sync worker](#repository-sync-worker),
+there is a `Geo::FileDownloadDispatchWorker` class that is run
+periodically to sync all uploads that aren't synced to the Geo
+**secondary** node yet.
+
+Files are copied via HTTP(s) and initiated via the
+`/api/v4/geo/transfers/:type/:id` endpoint,
+e.g. `/api/v4/geo/transfers/lfs/123`.
+
+## Authentication
+
+To authenticate file transfers, each `GeoNode` record has two fields:
+
+- A public access key (`access_key` field).
+- A secret access key (`secret_access_key` field).
+
+The **secondary** node authenticates itself via a [JWT request](https://jwt.io/).
+When the **secondary** node wishes to download a file, it sends an
+HTTP request with the `Authorization` header:
+
+```
+Authorization: GL-Geo <access_key>:<JWT payload>
+```
+
+The **primary** node uses the `access_key` field to look up the
+corresponding Geo **secondary** node and decodes and verifies the JWT payload,
+which contains additional information to identify the file
+request. This ensures that the **secondary** node downloads the right
+file for the right database ID. For example, for an LFS object, the
+request must also include the SHA256 sum of the file. An example JWT
+payload looks like:
+
+```
+{ "data": { "sha256": "31806bb23580caab78040f8c45d329f5016b0115" }, "iat": 1234567890 }
+```
+
+If the requested file matches the requested SHA256 sum, then the Geo
+**primary** node sends data via the [X-Sendfile](https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/)
+feature, which allows NGINX to handle the file transfer without tying
+up Rails or Workhorse.
+
+NOTE: **Note:**
+JWT requires synchronized clocks between the machines
+involved, otherwise it may fail with an encryption error.
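
The header format above can be sketched with Ruby's standard library. This assumes an HS256 JWT signed with the node's `secret_access_key`; production code should use a maintained JWT library instead of this hand-rolled encoding, and all helper names here are hypothetical. A real decoder would typically also validate time claims such as `iat` against the local clock, which is where clock skew between nodes can cause failures.

```ruby
require "json"
require "openssl"

# Base64url without padding, as used by JWT.
def base64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

def base64url_decode(str)
  (str + "=" * ((4 - str.length % 4) % 4)).tr("-_", "+/").unpack1("m0")
end

def sign(signing_input, secret)
  base64url(OpenSSL::HMAC.digest("SHA256", secret, signing_input))
end

def encode_jwt(payload, secret)
  signing_input = [{ alg: "HS256", typ: "JWT" }, payload]
                  .map { |part| base64url(JSON.generate(part)) }
                  .join(".")
  "#{signing_input}.#{sign(signing_input, secret)}"
end

# Secondary side: build the Authorization header value.
def geo_authorization(access_key, secret_access_key, sha256)
  payload = { data: { sha256: sha256 }, iat: Time.now.to_i }
  "GL-Geo #{access_key}:#{encode_jwt(payload, secret_access_key)}"
end

# Constant-time comparison so signature checks do not leak timing.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Primary side: find the secret by access key, verify, then decode.
def decode_geo_authorization(header, secrets_by_access_key)
  match = header.match(/\AGL-Geo ([^:]+):(.+)\z/) or raise ArgumentError, "malformed header"
  secret = secrets_by_access_key.fetch(match[1]) { raise ArgumentError, "unknown access key" }
  head, body, signature = match[2].split(".")
  expected = sign("#{head}.#{body}", secret)
  raise ArgumentError, "invalid signature" unless secure_compare(expected, signature.to_s)

  JSON.parse(base64url_decode(body))
end

header = geo_authorization("abc123", "s3cr3t", "31806bb23580caab78040f8c45d329f5016b0115")
payload = decode_geo_authorization(header, { "abc123" => "s3cr3t" })
```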
+
+## Using the Tracking Database
+
+Along with the main database that is replicated, a Geo **secondary**
+node has its own separate [Tracking database](#tracking-database).
+
+The tracking database contains the state of the **secondary** node.
+
+Any database migration that needs to be run as part of an upgrade
+needs to be applied to the tracking database on each **secondary** node.
+
+### Configuration
+
+The database configuration is set in [`config/database_geo.yml`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/config/database_geo.yml.postgresql).
+The directory [`ee/db/geo`](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/db/geo)
+contains the schema and migrations for this database.
+
+To write a migration for the database, use the `GeoMigrationGenerator`:
+
+```
+rails g geo_migration [args] [options]
+```
+
+To migrate the tracking database, run:
+
+```
+bundle exec rake geo:db:migrate
+```
+
+### Foreign Data Wrapper
+
+The use of [FDW](#fdw) was introduced in GitLab 10.1.
+
+This is useful for the [Geo Log Cursor](#geo-log-cursor) and improves
+the performance of some synchronization operations.
+
+While FDW is available in older versions of PostgreSQL, we needed to
+raise the minimum required version to 9.6 as this includes many
+performance improvements to the FDW implementation.
+
+#### Refreshing the Foreign Tables
+
+Whenever the database schema changes on the **primary** node, the
+**secondary** node will need to refresh its foreign tables by running
+the following:
+
+```sh
+bundle exec rake geo:db:refresh_foreign_tables
+```
+
+Failure to do this will prevent the **secondary** node from
+functioning properly. The **secondary** node will generate error
+messages, such as the following PostgreSQL error:
+
+```
+ERROR:  relation "gitlab_secondary.ci_job_artifacts" does not exist at character 323
+STATEMENT:                SELECT a.attname, format_type(a.atttypid, a.atttypmod),
+                          pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
+                     FROM pg_attribute a LEFT JOIN pg_attrdef d
+                       ON a.attrelid = d.adrelid AND a.attnum = d.adnum
+                    WHERE a.attrelid = '"gitlab_secondary"."ci_job_artifacts"'::regclass
+                      AND a.attnum > 0 AND NOT a.attisdropped
+                    ORDER BY a.attnum
+```
+
+## Finders
+
+Geo uses [Finders](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/app/finders),
+which are classes that take care of the heavy lifting of looking up
+projects/attachments/etc. in the tracking database and main database.
+
+### Finders Performance
+
+The Finders need to compare data from the main database with data in
+the tracking database. For example, counting the number of synced
+projects normally involves retrieving the project IDs from one
+database and checking their state in the other database. This is slow
+and requires a lot of memory.
+
+To overcome this, the Finders use [FDW](#fdw), or Foreign Data
+Wrappers. This allows a regular `JOIN` between the main database and
+the tracking database.
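
To see why the non-FDW approach is slow, here is a hypothetical sketch of it: IDs from both databases are loaded into Ruby and compared in memory. Arrays stand in for query results, and the row shape is assumed for illustration.

```ruby
require "set"

# Hypothetical sketch of counting synced projects *without* FDW: rows from
# the tracking database are matched against main-database IDs in Ruby.
def count_synced_without_fdw(main_project_ids, registry_rows)
  synced_ids = registry_rows.select { |row| row[:synced] }
                            .map { |row| row[:project_id] }
                            .to_set
  main_project_ids.count { |id| synced_ids.include?(id) }
end

main_project_ids = [1, 2, 3]
registry_rows = [
  { project_id: 1, synced: true },
  { project_id: 2, synced: false },
  { project_id: 4, synced: true } # project no longer in the main database
]
p count_synced_without_fdw(main_project_ids, registry_rows) # => 1
```

Memory use grows with the number of projects on both sides. With FDW, the same count can instead be expressed as one SQL query joining the registry table with the foreign tables, so only the final number leaves PostgreSQL.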
+
+## Redis
+
+Redis on the **secondary** node works the same as on the **primary**
+node. It is used for caching, storing sessions, and other persistent
+data.
+
+Redis data replication between **primary** and **secondary** node is
+not used, so sessions etc. aren't shared between nodes.
+
+## Object Storage
+
+GitLab can optionally use Object Storage to store data it would
+otherwise store on disk. This includes:
+
+- LFS Objects
+- CI Job Artifacts
+- Uploads
+
+Objects stored in object storage are not handled by Geo; Geo
+ignores them. Instead, either:
+
+- The object storage layer should take care of its own geographical
+  replication.
+- All secondary nodes should use the same storage node.
+
+## Verification
+
+### Repository verification
+
+Repositories are verified with a checksum.
+
+The **primary** node calculates a checksum on the repository. It
+basically hashes all Git refs together and stores that hash in the
+`project_repository_states` table of the database.
+
+The **secondary** node does the same to calculate the hash of its
+clone, and compares the hash with the value the **primary** node
+calculated. If the values differ, Geo marks the repository as mismatched,
+and the administrator can see this in the [Geo admin panel](https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html).
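
The idea can be sketched as follows. The exact checksum algorithm is not described in this document, so this hypothetical sketch simply hashes the sorted refs with SHA-256; it only shows why two clones with the same refs yield the same value regardless of ref order. The SHAs are short placeholders.

```ruby
require "digest"

# Hypothetical checksum over all refs: sort by ref name so ordering does
# not matter, then hash "sha name" lines together.
def refs_checksum(refs)
  lines = refs.sort.map { |name, sha| "#{sha} #{name}" }
  Digest::SHA256.hexdigest(lines.join("\n"))
end

primary   = { "refs/heads/master" => "a1b2c3", "refs/tags/v1.0" => "d4e5f6" }
secondary = { "refs/tags/v1.0" => "d4e5f6", "refs/heads/master" => "a1b2c3" }
stale     = { "refs/heads/master" => "000000", "refs/tags/v1.0" => "d4e5f6" }

puts refs_checksum(primary) == refs_checksum(secondary) # => true
puts refs_checksum(primary) == refs_checksum(stale)     # => false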
+
+## Glossary
+
+### Primary node
+
+A **primary** node is the single node in a Geo setup that has read-write
+capabilities. It's the single source of truth and the Geo
+**secondary** nodes replicate their data from there.
+
+In a Geo setup, there can only be one **primary** node. All
+**secondary** nodes connect to that **primary**.
+
+### Secondary node
+
+A **secondary** node is a read-only replica of the **primary** node
+running in a different geographical location.
+
+### Streaming replication
+
+Geo depends on the streaming replication feature of PostgreSQL. It
+completely replicates the database data and the database schema. The
+database replica is a read-only copy.
+
+Streaming replication depends on the Write Ahead Logs, or WAL. Those
+logs are copied over to the replica and replayed there.
+
+Since streaming replication also replicates the schema, database
+migrations do not need to run on the **secondary** nodes.
+
+### Tracking database
+
+A database on each Geo **secondary** node that keeps state for the node
+on which it resides. Read more in [Using the Tracking database](#using-the-tracking-database).
+
+### FDW
+
+Foreign Data Wrapper, or FDW, is a feature built into PostgreSQL. It
+allows data to be queried from different data sources. In Geo, it's
+used to query data from different PostgreSQL instances.
+
+## Geo Event Log
+
+The Geo **primary** stores events in the `geo_event_log` table. Each
+entry in the log contains a specific type of event. These types of
+events include:
+
+- Repository Deleted event
+- Repository Renamed event
+- Repositories Changed event
+- Repository Created event
+- Hashed Storage Migrated event
+- LFS Object Deleted event
+- Hashed Storage Attachments event
+- Job Artifact Deleted event
+- Upload Deleted event
+
+### Geo Log Cursor
+
+The process running on the **secondary** node that looks for new
+`Geo::EventLog` rows.
+
+## Code features
+
+### `Gitlab::Geo` utilities
+
+Small utility methods related to Geo go into the
+[`ee/lib/gitlab/geo.rb`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo.rb)
+file.
+
+Many of these methods are cached using the `RequestStore` class, to
+reduce the performance impact of using the methods throughout the
+codebase.
+
+#### Current node
+
+The class method `.current_node` returns the `GeoNode` record for the
+current node.
+
+We use the `host`, `port`, and `relative_url_root` values from
+`gitlab.yml` and search in the database to identify which node we are
+in (see `GeoNode.current_node`).
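
Here is a hypothetical sketch of that lookup, comparing the configured host, port, and relative URL root against each node's URL. `GeoNodeRecord` and the keyword arguments stand in for the `GeoNode` model and the `gitlab.yml` values; this is not the real `GeoNode.current_node` implementation.

```ruby
require "uri"

# Stand-in for a GeoNode record.
GeoNodeRecord = Struct.new(:name, :url)

# Treat "/gitlab/" and "/gitlab" (and "/" and "") as the same root.
def normalize_root(path)
  path.to_s.chomp("/")
end

def current_node(nodes, host:, port:, relative_url_root: "")
  nodes.find do |node|
    uri = URI(node.url)
    uri.host == host && uri.port == port &&
      normalize_root(uri.path) == normalize_root(relative_url_root)
  end
end

nodes = [
  GeoNodeRecord.new("primary", "https://primary.example.com/"),
  GeoNodeRecord.new("secondary", "https://secondary.example.com:8443/gitlab/")
]
node = current_node(nodes, host: "secondary.example.com", port: 8443, relative_url_root: "/gitlab")
p node.name # => "secondary"
```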
+
+#### Primary or secondary
+
+To determine whether the current node is a **primary** node or a
+**secondary** node use the `.primary?` and `.secondary?` class
+methods.
+
+It is possible for these methods to both return `false` on a node when
+the node is not enabled. See [Enablement](#enablement).
+
+#### Geo Database configured?
+
+There is also an additional gotcha when dealing with things that
+happen during initialization time. In a few places, we use the
+`Gitlab::Geo.geo_database_configured?` method to check if the node has
+the tracking database, which only exists on the **secondary**
+node. This overcomes race conditions that could happen during
+bootstrapping of a new node.
+
+#### Enablement
+
+We consider the Geo feature enabled when the user has a valid license with
+the feature included, and at least one node is defined on the Geo Nodes
+screen.
+
+See `Gitlab::Geo.enabled?` and `Gitlab::Geo.license_allows?` methods.
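
How these checks relate, including why `.primary?` and `.secondary?` can both be `false`, can be sketched with a plain struct. The `:geo` feature symbol and the name-to-role node hash are assumptions for illustration, not GitLab's data model.

```ruby
# Hypothetical stand-in for the enablement and role checks.
GeoSetup = Struct.new(:license_features, :nodes, :current_node_name, keyword_init: true) do
  def license_allows?
    license_features.include?(:geo)
  end

  # Enabled: licensed, and at least one node defined.
  def enabled?
    license_allows? && !nodes.empty?
  end

  def primary?
    enabled? && nodes[current_node_name] == :primary
  end

  def secondary?
    enabled? && nodes[current_node_name] == :secondary
  end
end

licensed = GeoSetup.new(license_features: [:geo],
                        nodes: { "eu" => :primary, "us" => :secondary },
                        current_node_name: "us")
unlicensed = GeoSetup.new(license_features: [],
                          nodes: { "eu" => :primary },
                          current_node_name: "eu")

p [licensed.primary?, licensed.secondary?]     # => [false, true]
p [unlicensed.primary?, unlicensed.secondary?] # => [false, false]
```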
+
+#### Read-only
+
+All Geo **secondary** nodes are read-only.
+
+The general principle of a [read-only database](verifying_database_capabilities.md#read-only-database)
+applies to all Geo **secondary** nodes. So the
+`Gitlab::Database.read_only?` method will always return `true` on a
+**secondary** node.
+
+When some write actions are not allowed because the node is a
+**secondary**, consider adding the `Gitlab::Database.read_only?` or
+`Gitlab::Database.read_write?` guard, instead of `Gitlab::Geo.secondary?`.
+
+The database itself will already be read-only in a replicated setup,
+so we don't need to take any extra step for that.
+
+## History of communication channel
+
+The communication channel has changed since the first iteration. This
+section lists historic decisions and why we moved to new implementations.
+
+### Custom code (GitLab 8.6 and earlier)
+
+In GitLab 8.6 and earlier, custom code was used to handle
+notifications from the **primary** node to **secondary** nodes via HTTP
+requests.
+
+### System hooks (GitLab 8.7 to 9.5)
+
+Later, it was decided to move away from custom code and begin using
+system hooks. More people were using them, so
+many would benefit from improvements made to this communication layer.
+
+There is a specific **internal** endpoint in our API code (Grape)
+that receives all requests from these system hooks:
+`/api/v4/geo/receive_events`.
+
+We switch on and filter each event by its `event_name` field.
+
+### Geo Log Cursor (GitLab 10.0 and up)
+
+Since GitLab 10.0, [System Webhooks](#system-hooks-gitlab-87-to-95) are no longer
+used and Geo Log Cursor is used instead. The Log Cursor traverses the
+`Geo::EventLog` rows to see if there are changes since the last time
+the log was checked and will handle repository updates, deletes,
+changes, and renames.
+
+The table is within the replicated database. This has two advantages over the
+old method:
+
+- Replication is synchronous and we preserve the order of events.
+- Replication of the events happens at the same time as the changes in the
+  database.
--- a/doc/development/go_guide/index.md
+++ b/doc/development/go_guide/index.md
@@ -93,7 +93,7 @@ become available, you will be able to share job templates like this

 Dependencies should be kept to the minimum. The introduction of a new
 dependency should be argued in the merge request, as per our [Approval
-Guidelines](../code_review.html#approval-guidelines). Both [License
+Guidelines](../code_review.md#approval-guidelines). Both [License
 Management](https://docs.gitlab.com/ee/user/project/merge_requests/license_management.html)
 **[ULTIMATE]** and [Dependency
 Scanning](https://docs.gitlab.com/ee/user/project/merge_requests/dependency_scanning.html)

--- a/doc/development/licensed_feature_availability.md
+++ b/doc/development/licensed_feature_availability.md
+# Licensed feature availability **[STARTER]**
+
+Since GitLab 9.4, we have supported a simplified version of licensed
+feature availability checks via `ee/app/models/license.rb`, for both
+on-premise and GitLab.com plans and features.
+
+## Restricting features scoped by namespaces or projects
+
+GitLab.com plans are persisted on user groups and namespaces. Therefore, if you're adding a
+feature such as [Related issues](https://docs.gitlab.com/ee/user/project/issues/related_issues.html) or
+[Service desk](https://docs.gitlab.com/ee/user/project/service_desk.html),
+it should be restricted at the namespace scope.
+
+1. Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in
+  `ee/app/models/license.rb`. Note in `ee/app/models/ee/namespace.rb` that _Bronze_ GitLab.com
+  features map to on-premise _EES_, _Silver_ to _EEP_, and _Gold_ to _EEU_.
+2. Check using:
+
+```ruby
+project.feature_available?(:feature_symbol)
+```
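
A simplified, hypothetical model of the namespace-scoped check follows. It assumes each tier's list includes the tier below it, uses placeholder feature symbols, and replaces GitLab's models with stub structs.

```ruby
# Placeholder tier lists; each tier is assumed to include the one below.
EES_FEATURES = %i[feature_a].freeze
EEP_FEATURES = (EES_FEATURES + %i[feature_b]).freeze
EEU_FEATURES = (EEP_FEATURES + %i[feature_c]).freeze

# GitLab.com plans mapped to on-premise tiers, as described above.
PLAN_FEATURES = {
  "bronze" => EES_FEATURES,
  "silver" => EEP_FEATURES,
  "gold" => EEU_FEATURES
}.freeze

NamespaceStub = Struct.new(:plan) do
  def feature_available?(feature)
    PLAN_FEATURES.fetch(plan, []).include?(feature)
  end
end

# A project delegates the check to the namespace that holds the plan.
ProjectStub = Struct.new(:namespace) do
  def feature_available?(feature)
    namespace.feature_available?(feature)
  end
end

project = ProjectStub.new(NamespaceStub.new("silver"))
p project.feature_available?(:feature_b) # => true
p project.feature_available?(:feature_c) # => false
```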
+
+## Restricting global features (instance)
+
+However, for features such as [Geo](https://docs.gitlab.com/ee/administration/geo/replication/index.html) and
+[Load balancing](https://docs.gitlab.com/ee/administration/database_load_balancing.html), which cannot be restricted
+to only a subset of projects or namespaces, the check is made directly against
+the instance license.
+
+1. Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in
+  `ee/app/models/license.rb`.
+2. Add the same feature symbol to `GLOBAL_FEATURES`.
+3. Check using:
+
+```ruby
+License.feature_available?(:feature_symbol)
+```
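
One way to picture the split between instance-wide and namespace-scoped checks is a guard that rejects namespace-level checks for global features. This is a hypothetical sketch with placeholder tiers and features, not the code in `ee/app/models/license.rb`.

```ruby
# Placeholder instance tiers and global feature list.
INSTANCE_TIER_FEATURES = {
  "starter" => %i[feature_a],
  "premium" => %i[feature_a feature_b instance_wide_feature]
}.freeze
GLOBAL_FEATURES = %i[instance_wide_feature].freeze

# Instance-level check: only the instance license matters.
def license_feature_available?(instance_tier, feature)
  INSTANCE_TIER_FEATURES.fetch(instance_tier, []).include?(feature)
end

# Namespace-level check: refuses global features outright.
def namespace_feature_available?(plan_features, feature)
  if GLOBAL_FEATURES.include?(feature)
    raise ArgumentError, "#{feature} is instance-wide; check the instance license instead"
  end

  plan_features.include?(feature)
end

p license_feature_available?("premium", :instance_wide_feature) # => true
p license_feature_available?("starter", :instance_wide_feature) # => false
```

Failing loudly makes it hard to accidentally gate an instance-wide feature on a namespace plan.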
--- a/doc/development/packages.md
+++ b/doc/development/packages.md
+# Packages **[PREMIUM]**
+
+This document guides you through adding support for another [package management system](https://docs.gitlab.com/ee/administration/packages.html) to GitLab.
+
+See the already supported package types in the [Packages documentation](https://docs.gitlab.com/ee/administration/packages.html).
+
+Since the GitLab Packages UI is fairly generic, it is possible to add support for a new
+package system with backend changes only. This guide is high-level and does
+not cover how the code should be written. However, you can find good examples
+in the existing merge requests that added Maven and NPM support:
+
+- [NPM registry support](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8673).
+- [Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/6607).
+- [Instance level endpoint for Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8757).
+
+## General information
+
+The existing database model requires the following:
+
+- Every package belongs to a project. 
+- Every package file belongs to a package.
+- A package can have one or more package files.
+- The package model is based on storing information about the package and its version.
+
+## API endpoints
+
+Package systems work with GitLab via API. For example, `ee/lib/api/npm_packages.rb`
+implements API endpoints to work with NPM clients. So, the first thing to do is to
+add a new `ee/lib/api/your_name_packages.rb` file with API endpoints that are
+necessary to make the package system client work. Usually that means having
+endpoints like:
+
+- GET package information.
+- GET package file content.
+- PUT upload package.
+
+Since the packages belong to a project, it's expected to have project-level endpoints
+for uploading and downloading them. For example:
+
+```
+GET https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
+PUT https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
+```
+
+Group-level and instance-level endpoints are good to have but are optional. 
+
+NOTE: **Note:**
+To avoid name conflicts for instance-level endpoints, we use
+[the package naming convention](https://docs.gitlab.com/ee/user/project/packages/npm_registry.html#package-naming-convention).
+
+## Configuration
+
+GitLab has a `packages` section in its configuration file (`gitlab.rb`).
+It applies to all package systems supported by GitLab. Usually you don't need
+to add anything there.
+
+Packages can be configured to use object storage; therefore, your code must support it.
+
+## Database
+
+The current database model allows you to store a name and a version for each package.
+Every time you upload a new package, you can either create a new record of `Package`
+or add files to an existing record. `PackageFile` should be able to store all file-related
+information like the file `name`, `size`, `sha1`, etc.
+
+If there is specific data necessary to be stored for only one package system,
+consider creating a separate metadata model. See the `packages_maven_metadata` table
+and `Packages::MavenMetadatum` model as an example of package-specific data.
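
The "create a new `Package` or add files to an existing one" flow can be sketched in memory. This is a hypothetical illustration: structs stand in for the ActiveRecord models, and a package is assumed to be identified by project, name, and version.

```ruby
require "digest"

# Stand-ins for the Package and PackageFile models.
PackageFile = Struct.new(:name, :size, :sha1)
Package = Struct.new(:project_id, :name, :version, :package_files)

class PackageStore
  def initialize
    @packages = {}
  end

  # Reuses the package with the same project/name/version, or creates a
  # new one, then attaches the uploaded file with its size and SHA1.
  def upload(project_id:, name:, version:, file_name:, content:)
    key = [project_id, name, version]
    package = (@packages[key] ||= Package.new(project_id, name, version, []))
    package.package_files << PackageFile.new(file_name, content.bytesize,
                                             Digest::SHA1.hexdigest(content))
    package
  end

  def count
    @packages.size
  end
end

store = PackageStore.new
store.upload(project_id: 1, name: "my-lib", version: "1.0.0", file_name: "my-lib-1.0.0.jar", content: "jar")
pkg = store.upload(project_id: 1, name: "my-lib", version: "1.0.0", file_name: "my-lib-1.0.0.pom", content: "pom")
p store.count            # => 1
p pkg.package_files.size # => 2
```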
--- a/doc/development/rake_tasks.md
+++ b/doc/development/rake_tasks.md
@@ -28,6 +28,24 @@ bin/rake "gitlab:seed:issues[group-path/project-path]"
 By default, this seeds an average of 2 issues per week for the last 5 weeks per
 project.

+#### Seeding issues for Insights charts **[ULTIMATE]**
+
+You can seed issues specifically for working with the
+[Insights charts](https://docs.gitlab.com/ee/user/group/insights/index.html) with the
+`gitlab:seed:insights:issues` task:
+
+```shell
+# All projects
+bin/rake gitlab:seed:insights:issues
+
+# A specific project
+bin/rake "gitlab:seed:insights:issues[group-path/project-path]"
+```
+
+By default, this seeds an average of 10 issues per week for the last 52 weeks
+per project. All issues will also be randomly labeled with team, type, severity,
+and priority.
+
 ### Automation

 If you're very sure that you want to **wipe the current database** and refill