Docs: Merge EE doc/development to CE

Commit 02074684 authored May 05, 2019 by Marcel Amirault, committed by Achilleas Pipinellis May 05, 2019
9 changed files
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -38,6 +38,7 @@ description: 'Learn how to contribute to GitLab.'
 - [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers
 - [Working with Gitaly](gitaly.md)
 - [Manage feature flags](feature_flags.md)
+- [Licensed feature availability](licensed_feature_availability.md)
 - [View sent emails or preview mailers](emails.md)
 - [Shell commands](shell_commands.md) in the GitLab codebase
 - [`Gemfile` guidelines](gemfile.md)
@@ -48,6 +49,7 @@ description: 'Learn how to contribute to GitLab.'
 - [How to dump production data to staging](db_dump.md)
 - [Working with the GitHub importer](github_importer.md)
 - [Import/Export development documentation](import_export.md)
+- [Elasticsearch integration docs](elasticsearch.md)
 - [Working with Merge Request diffs](diffs.md)
 - [Kubernetes integration guidelines](kubernetes.md)
 - [Permissions](permissions.md)
@@ -55,6 +57,7 @@ description: 'Learn how to contribute to GitLab.'
 - [Guidelines for reusing abstractions](reusing_abstractions.md)
 - [DeclarativePolicy framework](policies.md)
 - [How Git object deduplication works in GitLab](git_object_deduplication.md)
+- [Geo development](geo.md)
 ## Performance guides

--- a/doc/development/contributing/merge_request_workflow.md
+++ b/doc/development/contributing/merge_request_workflow.md
@@ -155,7 +155,7 @@ the contribution acceptance criteria below:
     restarting the failing CI job, rebasing from master to bring in updates that
     may resolve the failure, or if it has not been fixed yet, ask a developer to
     help you fix the test.
-1. The MR initially contains a a few logically organized commits.
+1. The MR initially contains a few logically organized commits.
 1. The changes can merge without problems. If not, you should rebase if you're the
   only one working on your feature branch, otherwise merge `master`.
 1. Only one specific issue is fixed or one specific feature is implemented. Do not

--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
+# Elasticsearch knowledge **[STARTER ONLY]**
+This area maintains a compendium of useful information for working with Elasticsearch.
+Information on how to enable Elasticsearch and perform the initial indexing is in the [Elasticsearch integration documentation](https://docs.gitlab.com/ee/integration/elasticsearch.html#enabling-elasticsearch).
+## Initial installation on OS X
+We recommend using the Docker image. After installing Docker, you can immediately spin up an instance with:
+```
+docker run --name elastic56 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.12
+```
+Use `docker stop elastic56` and `docker start elastic56` to stop and start it.
+### Installing on the host
+We currently only support Elasticsearch [5.6 to 6.x](https://docs.gitlab.com/ee/integration/elasticsearch.html#requirements).
+Version 5.6 is available on Homebrew and is the recommended version for testing compatibility.
+```
+brew install elasticsearch@5.6
+```
+There is no need to install any plugins.
+## New repo indexer (beta)
+If you're interested in working with the new beta repo indexer, all you need to do is:
+- `git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git`
+- `cd gitlab-elasticsearch-indexer`
+- `make`
+- `make install`
+
+This adds `gitlab-elasticsearch-indexer` to `$GOPATH/bin`; make sure that directory is in your `$PATH`. After that, GitLab will find it and you'll be able to enable it in the admin settings area.
+NOTE: **Note:**
+`make` will not recompile the executable unless you run `make clean` beforehand.
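If GitLab does not seem to find the indexer, you can confirm that it is reachable through `$PATH` with a small Ruby check like the one below. `find_executable` is a hypothetical helper written for this example, not part of GitLab; it only mimics the lookup a shell performs.

```ruby
# Hypothetical helper: walk each $PATH entry, as a shell would, and return
# the first executable file called `name`, or nil if there is none.
def find_executable(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if (location = find_executable('gitlab-elasticsearch-indexer'))
  puts "Indexer found at #{location}"
else
  puts 'Indexer not found; add $GOPATH/bin to your $PATH'
end
```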
+## Helpful rake tasks
+- `gitlab:elastic:test:index_size`: Tells you how much space the current index is using, as well as how many documents are in the index.
+- `gitlab:elastic:test:index_size_change`: Outputs index size, reindexes, and outputs index size again. Useful when testing improvements to indexing size.
+Additionally, if you need large repos or multiple forks for testing, please consider [following these instructions](https://docs.gitlab.com/ee/development/rake_tasks.html#extra-project-seed-options).
+## How does it work?
+The Elasticsearch integration depends on an external indexer. We ship a [Ruby indexer](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/bin/elastic_repo_indexer) by default, but are also working on an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task, but after that, GitLab itself triggers reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [/ee/app/models/concerns/elastic/application_search.rb](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/models/concerns/elastic/application_search.rb).
+All indexing after the initial one is done via `ElasticIndexerWorker` (Sidekiq jobs).
+Search queries are generated by the concerns found in [ee/app/models/concerns/elastic](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control and have historically been a source of security bugs, so please pay close attention to them.
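The callback-to-worker flow described above can be sketched in plain Ruby, without Rails or Sidekiq. Everything here is invented for illustration: `ApplicationSearchSketch`, `FakeIndexerQueue`, and `ToyIssue` are not GitLab classes, and the `(operation, class name, id)` job arguments are an assumption rather than the real `ElasticIndexerWorker` signature. The point is only that model callbacks enqueue a job instead of indexing inline.

```ruby
# Stand-in for a Sidekiq worker queue: it just records enqueued jobs.
class FakeIndexerQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_async(*args)
    @jobs << args
  end
end

# Sketch of the concern: each lifecycle hook enqueues an indexing job.
module ApplicationSearchSketch
  def self.queue
    @queue ||= FakeIndexerQueue.new
  end

  def after_create
    ApplicationSearchSketch.queue.perform_async(:index, self.class.name, id)
  end

  def after_update
    ApplicationSearchSketch.queue.perform_async(:update, self.class.name, id)
  end

  def after_destroy
    ApplicationSearchSketch.queue.perform_async(:delete, self.class.name, id)
  end
end

# A toy record that "inherits" the callbacks by including the concern.
class ToyIssue
  include ApplicationSearchSketch
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

issue = ToyIssue.new(42)
issue.after_create
issue.after_destroy
p ApplicationSearchSketch.queue.jobs
# => [[:index, "ToyIssue", 42], [:delete, "ToyIssue", 42]]
```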
+## Existing Analyzers/Tokenizers/Filters
+These are all defined in [`ee/lib/elasticsearch/git/model.rb`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/elasticsearch/git/model.rb).
+### Analyzers
+#### `path_analyzer`
+Used when indexing blobs' paths. Uses the `path_tokenizer` and the `lowercase` and `asciifolding` filters.
+Please see the `path_tokenizer` explanation below for an example.
+#### `sha_analyzer`
+Used in blobs and commits. Uses the `sha_tokenizer` and the `lowercase` and `asciifolding` filters.
+Please see the `sha_tokenizer` explanation below for an example.
+#### `code_analyzer`
+Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the `code`, `edgeNGram_filter`, `lowercase`, and `asciifolding` filters.
+The `whitespace` tokenizer was selected in order to have more control over how tokens are split. For example, the string `Foo::bar(4)` needs to generate tokens like `Foo` and `bar(4)` in order to be properly searched.
+Please see the `code` filter for an explanation of how tokens are split.
+#### `code_search_analyzer`
+Not directly used for indexing, but rather used to transform a search input. Uses the `whitespace` tokenizer and the `lowercase` and `asciifolding` filters.
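To show how analyzers and tokenizers like these fit together, here is a sketch of how two of them could be declared in Elasticsearch index `analysis` settings, written as a Ruby hash. The setting keys (`type: 'custom'`, `path_hierarchy` with `reverse`, `edge_ngram` with `min_gram`) are standard Elasticsearch options, but this is not GitLab's actual configuration: `max_gram: 40` and `token_chars` are assumptions, and the authoritative definitions are in `model.rb`.

```ruby
require 'json'

# Illustrative "analysis" settings; option values such as max_gram and
# token_chars are assumptions for this sketch, not GitLab's real values.
ANALYSIS_SETTINGS = {
  analysis: {
    analyzer: {
      path_analyzer: {
        type: 'custom',
        tokenizer: 'path_tokenizer',
        filter: %w[lowercase asciifolding]
      },
      sha_analyzer: {
        type: 'custom',
        tokenizer: 'sha_tokenizer',
        filter: %w[lowercase asciifolding]
      }
    },
    tokenizer: {
      # Emits the full path plus each shorter suffix of it.
      path_tokenizer: { type: 'path_hierarchy', reverse: true },
      # Emits prefixes of at least 5 characters (max_gram is assumed).
      sha_tokenizer: {
        type: 'edge_ngram',
        min_gram: 5,
        max_gram: 40,
        token_chars: %w[letter digit]
      }
    }
  }
}.freeze

# Print the JSON body you would send as part of the index settings.
puts JSON.pretty_generate(ANALYSIS_SETTINGS)
```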
+### Tokenizers
+#### `sha_tokenizer`
+This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow SHAs to be searched by any prefix of at least 5 characters.
+For example, `240c29dc7e` becomes:
+- `240c2`
+- `240c29`
+- `240c29d`
+- `240c29dc`
+- `240c29dc7`
+- `240c29dc7e`
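The expansion in the example above is just the list of prefixes between a minimum and a maximum length. A minimal Ruby sketch of that behavior follows; `edge_ngrams` is a hypothetical helper, and its `max_gram: 40` default (the length of a full SHA-1) is an assumption, since only the 5-character minimum comes from this document.

```ruby
# Emulates an edge n-gram tokenizer: every prefix of the input that is
# between min_gram and max_gram characters long.
def edge_ngrams(token, min_gram: 5, max_gram: 40)
  upper = [token.length, max_gram].min
  (min_gram..upper).map { |len| token[0, len] }
end

p edge_ngrams('240c29dc7e')
# => ["240c2", "240c29", "240c29d", "240c29dc", "240c29dc7", "240c29dc7e"]
```

Input shorter than `min_gram` yields no tokens at all, which is why a search needs at least 5 characters of a SHA to match.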
+#### `path_tokenizer`
+This is a custom tokenizer that uses the [`path_hierarchy` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pathhierarchy-tokenizer.html) with `reverse: true` in order to allow searches to find paths no matter how much or how little of the path is given as input.
+For example, `'/some/path/application.js'` becomes:
+- `'/some/path/application.js'`
+- `'some/path/application.js'`
+- `'path/application.js'`
+- `'application.js'`
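The reversed path hierarchy in the example above keeps the full path and then drops one leading component at a time. This Ruby sketch reproduces that example; `reverse_path_hierarchy` is a hypothetical helper, and edge cases such as trailing or repeated delimiters may not match Elasticsearch's exact output.

```ruby
# Emulates path_hierarchy with reverse: true for simple paths: the full
# path, then each suffix left after removing a leading component.
def reverse_path_hierarchy(path, delimiter: '/')
  tokens = [path]
  rest = path
  while (idx = rest.index(delimiter))
    rest = rest[(idx + 1)..-1]
    tokens << rest unless rest.empty?
  end
  tokens
end

p reverse_path_hierarchy('/some/path/application.js')
# => ["/some/path/application.js", "some/path/application.js", "path/application.js", "application.js"]
```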
+### Filters
+#### `code`
+Uses a [Pattern Capture token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pattern-capture-tokenfilter.html) to split tokens into more easily searched versions of themselves. 
+Patterns:
+- `"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)"`: captures the parts of CamelCased and lowerCamelCased strings as separate tokens
+- `"(\\d+)"`: extracts digits
+- `"(?=([\\p{Lu}]+[\\p{L}]+))"`: captures CamelCased strings recursively. For example: `ThisIsATest` => `[ThisIsATest, IsATest, ATest, Test]`
+- `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes
+- `"'((?:\\'|[^']|\\')*)'"`: same as above, for single quotes
+- `'\.([^.]+)(?=\.|\s|\Z)'`: separates terms with periods in between
+- `'\/?([^\/]+)(?=\/|\b)'`: separates path terms, `like/this/one`
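To see what a few of these patterns produce, here they are as Ruby regex literals (the JSON strings above double their backslashes; Ruby literals don't). `capture_tokens` is a hypothetical helper that approximates a pattern capture filter: every capture becomes a token, and the original token is kept, which matches Elasticsearch's default `preserve_original: true`. Duplicates are removed here for readability, which may differ from the real filter.

```ruby
# Three of the `code` filter patterns, as Ruby regex literals.
CAMEL_PARTS    = /(\p{Ll}+|\p{Lu}\p{Ll}+|\p{Lu}+)/
DIGITS         = /(\d+)/
CAMEL_SUFFIXES = /(?=([\p{Lu}]+[\p{L}]+))/

# Approximates a pattern capture filter: the original token plus every
# capture from every pattern (deduplicated for readability).
def capture_tokens(token, patterns)
  captures = patterns.flat_map { |re| token.scan(re).flatten }
  ([token] + captures).uniq
end

p capture_tokens('fooBarBaz', [CAMEL_PARTS])
# => ["fooBarBaz", "foo", "Bar", "Baz"]
p capture_tokens('ThisIsATest', [CAMEL_SUFFIXES])
# => ["ThisIsATest", "IsATest", "ATest", "Test"]
```

The lookahead in `CAMEL_SUFFIXES` matches with zero width at each uppercase position, so `scan` captures one suffix per position instead of consuming the string.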
+#### `edgeNGram_filter`
+Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenfilter.html) to allow inputs with only part of a token to find the token. For example, it turns `glasses` into prefixes starting with `gl` and ending with `glasses`, which allows a search for `glass` to find the original token `glasses`.
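The effect on search can be shown with a toy inverted index: at index time each token is stored under all of its prefixes, so an un-expanded query term such as `glass` lands on an indexed prefix. This is a simplified Ruby sketch with hypothetical helpers (`prefixes`, `build_index`), not how Elasticsearch stores terms; the minimum length of 2 follows the `gl` example, and no maximum is applied.

```ruby
# Index side: expand a token into its prefixes of at least min_gram chars.
def prefixes(token, min_gram: 2)
  (min_gram..token.length).map { |len| token[0, len] }
end

# Toy inverted index mapping each prefix back to the original tokens.
def build_index(tokens)
  tokens.each_with_object(Hash.new { |h, k| h[k] = [] }) do |token, index|
    prefixes(token).each { |prefix| index[prefix] << token }
  end
end

index = build_index(%w[glasses glue])
p index['glass'] # => ["glasses"]
p index['gl']    # => ["glasses", "glue"]
```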
+## Gotchas
+- Searches can have their own analyzers. Remember to check them when editing analyzers.
+- Character filters (as opposed to token filters) always replace the original character, so they're not a good choice, as they can hinder exact searches.
+## Troubleshooting
+### Getting "flood stage disk watermark [95%] exceeded"
+You might get an error such as:
+```
+[2018-10-31T15:54:19,762][WARN ][o.e.c.r.a.DiskThresholdMonitor] [pval5Ct] 
+   flood stage disk watermark [95%] exceeded on 
+   [pval5Ct7SieH90t5MykM5w][pval5Ct][/usr/local/var/lib/elasticsearch/nodes/0] free: 56.2gb[3%], 
+   all indices on this node will be marked read-only
+```
+This is because you've exceeded the disk space threshold: Elasticsearch thinks you don't have enough disk space left, based on the default 95% threshold.
+In addition, the `read_only_allow_delete` setting will be set to `true`, which blocks indexing, `forcemerge`, etc. You can check the current index settings with:
+```
+curl "http://localhost:9200/gitlab-development/_settings?pretty"
+```
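If you want to script that check, the flag sits under `settings.index.blocks` in the `_settings` response. Below is a minimal Ruby sketch that parses such a response; `read_only_indices` is a hypothetical helper, and `SAMPLE_SETTINGS` is a trimmed, made-up response used only for illustration. Elasticsearch usually reports setting values as strings, so the value is compared via `to_s`.

```ruby
require 'json'

# Returns the names of indices whose settings have the
# read_only_allow_delete block enabled.
def read_only_indices(settings_json)
  JSON.parse(settings_json).select do |_index, body|
    body.dig('settings', 'index', 'blocks', 'read_only_allow_delete').to_s == 'true'
  end.keys
end

# Trimmed, illustrative response from GET /gitlab-development/_settings.
SAMPLE_SETTINGS = <<~JSON
  {
    "gitlab-development": {
      "settings": {
        "index": {
          "blocks": { "read_only_allow_delete": "true" },
          "number_of_shards": "5"
        }
      }
    }
  }
JSON

p read_only_indices(SAMPLE_SETTINGS) # => ["gitlab-development"]
```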
+Add this to your `elasticsearch.yml` file:
+```
+# turn off the disk allocator
+cluster.routing.allocation.disk.threshold_enabled: false 
+```
+_or_
+```
+# set your own limits
+cluster.routing.allocation.disk.threshold_enabled: true 
+cluster.routing.allocation.disk.watermark.flood_stage: 5gb   # ES 6.x only
+cluster.routing.allocation.disk.watermark.low: 15gb 
+cluster.routing.allocation.disk.watermark.high: 10gb
+```
+Restart Elasticsearch, and the `read_only_allow_delete` setting will clear on its own.
+_from "Disk-based Shard Allocation | Elasticsearch Reference" [5.6](https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html#disk-allocator) and [6.x](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/disk-allocator.html)_
--- a/doc/development/fe_guide/style_guide_scss.md
+++ b/doc/development/fe_guide/style_guide_scss.md
@@ -16,10 +16,12 @@ New utility classes should be added to [`utilities.scss`](https://gitlab.com/git
 **Background color**: `.bg-variant-shade` e.g. `.bg-warning-400`  
 **Text color**: `.text-variant-shade` e.g. `.text-success-500` 
 - variant is one of 'primary', 'secondary', 'success', 'warning', 'error'
 - shade is one of the shades listed on [colors](https://design.gitlab.com/foundations/colors/)
 **Font size**: `.text-size` e.g. `.text-2`
 - **size** is number from 1-6 from our [Type scale](https://design.gitlab.com/foundations/typography)
 ### Naming

--- a/doc/development/geo.md
+++ b/doc/development/geo.md
--- a/doc/development/go_guide/index.md
+++ b/doc/development/go_guide/index.md
@@ -93,7 +93,7 @@ become available, you will be able to share job templates like this
 Dependencies should be kept to the minimum. The introduction of a new
 dependency should be argued in the merge request, as per our [Approval
-Guidelines](../code_review.html#approval-guidelines). Both [License
+Guidelines](../code_review.md#approval-guidelines). Both [License
 Management](https://docs.gitlab.com/ee/user/project/merge_requests/license_management.html)
 **[ULTIMATE]** and [Dependency
 Scanning](https://docs.gitlab.com/ee/user/project/merge_requests/dependency_scanning.html)

--- a/doc/development/licensed_feature_availability.md
+++ b/doc/development/licensed_feature_availability.md
+# Licensed feature availability **[STARTER]**
+Since GitLab 9.4, we've supported a simplified version of licensed
+feature availability checks via `ee/app/models/license.rb`, for both
+on-premise and GitLab.com plans and features.
+## Restricting features scoped by namespaces or projects
+GitLab.com plans are persisted on user groups and namespaces. Therefore, if you're adding a
+feature such as [Related issues](https://docs.gitlab.com/ee/user/project/issues/related_issues.html) or
+[Service desk](https://docs.gitlab.com/ee/user/project/service_desk.html),
+it should be restricted at the namespace scope.
+1. Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in
+   `ee/app/models/license.rb`. Note in `ee/app/models/ee/namespace.rb` that _Bronze_ GitLab.com
+   features map to on-premise _EES_, _Silver_ to _EEP_, and _Gold_ to _EEU_.
+2. Check using:
+
+   ```ruby
+   project.feature_available?(:feature_symbol)
+   ```
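The idea behind the namespace-scoped check can be modeled in a few lines of plain Ruby: a namespace's plan selects one of the feature lists, and a project asks its namespace. This is a hypothetical toy, not GitLab's code. `ToyProject`, `ToyNamespace`, `PLAN_FEATURES`, and the `:starter_feature`/`:premium_feature`/`:ultimate_feature` symbols are invented, and the assumption that each higher tier includes the lower tiers' features is a simplification for this sketch.

```ruby
# Illustrative feature lists; each tier is assumed to include the one below.
EES_FEATURES = %i[starter_feature].freeze
EEP_FEATURES = (EES_FEATURES + %i[premium_feature]).freeze
EEU_FEATURES = (EEP_FEATURES + %i[ultimate_feature]).freeze

# GitLab.com plan → on-premise feature list, as described above.
PLAN_FEATURES = {
  'bronze' => EES_FEATURES,
  'silver' => EEP_FEATURES,
  'gold' => EEU_FEATURES
}.freeze

ToyNamespace = Struct.new(:plan)

ToyProject = Struct.new(:namespace) do
  # Namespace-scoped check: the project's namespace plan decides.
  def feature_available?(feature)
    PLAN_FEATURES.fetch(namespace.plan, []).include?(feature)
  end
end

project = ToyProject.new(ToyNamespace.new('silver'))
p project.feature_available?(:premium_feature)  # => true
p project.feature_available?(:ultimate_feature) # => false
```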
+## Restricting global features (instance)
+However, for features such as [Geo](https://docs.gitlab.com/ee/administration/geo/replication/index.html) and 
+[Load balancing](https://docs.gitlab.com/ee/administration/database_load_balancing.html), which cannot be restricted 
+to only a subset of projects or namespaces, the check is made directly against
+the instance license.
+1. Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in
+   `ee/app/models/license.rb`.
+2. Add the same feature symbol to `GLOBAL_FEATURES`.
+3. Check using:
+
+   ```ruby
+   License.feature_available?(:feature_symbol)
+   ```
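An instance-level check only consults the single installed license. The plain-Ruby toy below models that; `ToyLicense`, `scoped_feature_available?`, and the feature symbols are hypothetical. It also shows one plausible use of a `GLOBAL_FEATURES` list, refusing a project- or namespace-scoped check for a global feature. That guard is an assumption made for illustration, not a description of GitLab's implementation.

```ruby
# Hypothetical toy license; feature symbols and plans are illustrative only.
class ToyLicense
  EES_FEATURES = %i[starter_feature].freeze
  EEP_FEATURES = (EES_FEATURES + %i[instance_wide_feature]).freeze
  GLOBAL_FEATURES = %i[instance_wide_feature].freeze
  PLANS = { 'starter' => EES_FEATURES, 'premium' => EEP_FEATURES }.freeze

  class << self
    # Plan of the single license installed on this instance.
    attr_accessor :current_plan

    # Instance-level check: only the instance license is consulted.
    def feature_available?(feature)
      PLANS.fetch(current_plan, []).include?(feature)
    end

    def global_feature?(feature)
      GLOBAL_FEATURES.include?(feature)
    end
  end
end

# Assumed guard: scoped checks reject global features outright.
def scoped_feature_available?(namespace_features, feature)
  if ToyLicense.global_feature?(feature)
    raise ArgumentError, "#{feature} is global; use ToyLicense.feature_available?"
  end

  namespace_features.include?(feature)
end

ToyLicense.current_plan = 'premium'
p ToyLicense.feature_available?(:instance_wide_feature) # => true
```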
--- a/doc/development/packages.md
+++ b/doc/development/packages.md
+# Packages **[PREMIUM]**
+This document guides you through adding support for another [package management system](https://docs.gitlab.com/ee/administration/packages.html) to GitLab.
+See the already supported package types in the [Packages documentation](https://docs.gitlab.com/ee/administration/packages.html).
+Since the GitLab packages UI is fairly generic, it is possible to add support for a new
+package system with backend changes only. This guide is superficial and does
+not cover how the code should be written. However, you can find good examples
+in the existing merge requests that added Maven and NPM support:
+- [NPM registry support](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8673).
+- [Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/6607).
+- [Instance level endpoint for Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8757).
+## General information
+The existing database model requires the following:
+- Every package belongs to a project. 
+- Every package file belongs to a package.
+- A package can have one or more package files.
+- The package model is based on storing information about the package and its version.
+## API endpoints
+Package systems work with GitLab via the API. For example, `ee/lib/api/npm_packages.rb`
+implements API endpoints to work with NPM clients. So, the first thing to do is to
+add a new `ee/lib/api/your_name_packages.rb` file with the API endpoints that are
+necessary for the package system client to work. Usually that means having
+endpoints like:
+- GET package information.
+- GET package file content.
+- PUT upload package.
+Since the packages belong to a project, it's expected to have project-level endpoints
+for uploading and downloading them. For example:
+```
+GET https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
+PUT https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
+```
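The three kinds of endpoints listed above (package information, file content, upload) map naturally onto a small routing table. The Ruby sketch below is only an illustration: GitLab defines its real API endpoints with Grape, and the `your_name` segment, the `/-/` file path segment, and the action names are hypothetical.

```ruby
# Hypothetical project-level routes for a new "your_name" package type.
ROUTES = [
  ['GET', %r{\A/api/v4/projects/(?<id>\d+)/packages/your_name/(?<package>[^/]+)\z}, :package_info],
  ['GET', %r{\A/api/v4/projects/(?<id>\d+)/packages/your_name/(?<package>[^/]+)/-/(?<file>[^/]+)\z}, :file_content],
  ['PUT', %r{\A/api/v4/projects/(?<id>\d+)/packages/your_name/(?<package>[^/]+)\z}, :upload]
].freeze

# Returns [action, params] for the first matching route, or nil.
def route(verb, path)
  ROUTES.each do |route_verb, pattern, action|
    next unless route_verb == verb

    match = pattern.match(path)
    return [action, match.named_captures] if match
  end
  nil
end

action, params = route('GET', '/api/v4/projects/42/packages/your_name/my-pkg')
puts "#{action} for project #{params['id']}, package #{params['package']}"
```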
+Group-level and instance-level endpoints are good to have but are optional. 
+NOTE: **Note:**
+To avoid name conflicts for instance-level endpoints, we use
+[the package naming convention](https://docs.gitlab.com/ee/user/project/packages/npm_registry.html#package-naming-convention).
+## Configuration
+GitLab has a `packages` section in its configuration file (`gitlab.rb`). 
+It applies to all package systems supported by GitLab. Usually you don't need 
+to add anything there. 
+Packages can be configured to use object storage, so your code must support it.
+## Database
+The current database model allows you to store a name and a version for each package.
+Every time you upload a new package, you can either create a new `Package` record
+or add files to an existing record. `PackageFile` should be able to store all file-related
+information like the file `name`, `size`, `sha1`, etc.
+If there is specific data that needs to be stored for only one package system,
+consider creating a separate metadata model. See the `packages_maven_metadata` table
+and the `Packages::MavenMetadatum` model as an example of package-specific data.
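The relationships described in this section (a package belongs to a project, is identified by name and version, and has one or more files) can be sketched without a database. This plain-Ruby model is hypothetical: `PackageRegistry` and the `file_name`/`size`/`file_sha1` attributes are illustrative stand-ins for the real ActiveRecord models and columns. It shows the "reuse the existing record or create a new one" behavior on upload.

```ruby
require 'digest'

# One stored file; attribute names are illustrative.
class PackageFile
  attr_reader :file_name, :size, :file_sha1

  def initialize(file_name, content)
    @file_name = file_name
    @size = content.bytesize
    @file_sha1 = Digest::SHA1.hexdigest(content)
  end
end

# A package belongs to a project and is identified by name + version.
Package = Struct.new(:project_id, :name, :version, :package_files)

class PackageRegistry
  def initialize
    @packages = {}
  end

  # Reuses the package record for this project/name/version, or creates it.
  def find_or_create(project_id, name, version)
    @packages[[project_id, name, version]] ||= Package.new(project_id, name, version, [])
  end

  # Adds a file to the matching package and returns that package.
  def upload(project_id, name, version, file_name, content)
    package = find_or_create(project_id, name, version)
    package.package_files << PackageFile.new(file_name, content)
    package
  end
end

registry = PackageRegistry.new
registry.upload(1, 'my-lib', '1.0.0', 'my-lib-1.0.0.jar', 'jar bytes')
pkg = registry.upload(1, 'my-lib', '1.0.0', 'my-lib-1.0.0.pom', '<project/>')
p pkg.package_files.map(&:file_name) # => ["my-lib-1.0.0.jar", "my-lib-1.0.0.pom"]
```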
--- a/doc/development/rake_tasks.md
+++ b/doc/development/rake_tasks.md
@@ -28,6 +28,24 @@ bin/rake "gitlab:seed:issues[group-path/project-path]"
 By default, this seeds an average of 2 issues per week for the last 5 weeks per
 project.
+#### Seeding issues for Insights charts **[ULTIMATE]**
+You can seed issues specifically for working with the
+[Insights charts](https://docs.gitlab.com/ee/user/group/insights/index.html) with the
+`gitlab:seed:insights:issues` task:
+```shell
+# All projects
+bin/rake gitlab:seed:insights:issues
+# A specific project
+bin/rake "gitlab:seed:insights:issues[group-path/project-path]"
+```
+By default, this seeds an average of 10 issues per week for the last 52 weeks
+per project. All issues will also be randomly labeled with team, type, severity,
+and priority.
 ### Automation
 If you're very sure that you want to **wipe the current database** and refill