Commit e4e45f43 authored by Micaël Bergeron

add documentation

parent d50c4078
# Pseudonymizer
## Object Storage Settings
**In Omnibus installations:**
1. Edit `/etc/gitlab/gitlab.rb` and add the following lines, replacing the
values with your own:
```ruby
gitlab_rails['pseudonymizer_enabled'] = true
gitlab_rails['pseudonymizer_manifest'] = 'lib/pseudonymizer/manifest.yml'
gitlab_rails['pseudonymizer_upload_remote_directory'] = 'gitlab-elt'
gitlab_rails['pseudonymizer_upload_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-central-1',
  'aws_access_key_id' => 'AWS_ACCESS_KEY_ID',
  'aws_secret_access_key' => 'AWS_SECRET_ACCESS_KEY'
}
```
>**Note:**
If you are using AWS IAM profiles, be sure to omit the AWS access key and secret access key/value pairs.
```ruby
gitlab_rails['pseudonymizer_upload_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-central-1',
  'use_iam_profile' => true
}
```
1. Save the file and [reconfigure GitLab][] for the changes to take effect.
---
**In installations from source:**
1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
lines:
```yaml
pseudonymizer:
  enabled: true
  manifest: lib/pseudonymizer/manifest.yml
  upload:
    remote_directory: 'gitlab-elt' # The bucket name
    connection:
      provider: AWS # Only AWS supported at the moment
      aws_access_key_id: AWS_ACCESS_KEY_ID
      aws_secret_access_key: AWS_SECRET_ACCESS_KEY
      region: eu-central-1
```
1. Save the file and [restart GitLab][] for the changes to take effect.
# Pseudonymizer
> [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/5532) in [GitLab Enterprise Edition][ee] 11.1
## Export GitLab's data for safe analytics
Because GitLab's database hosts sensitive information, using it unfiltered for analytics implies high security requirements. To help alleviate this constraint, the Pseudonymizer service exports GitLab's data in a pseudonymized way.
### Pseudonymization
> **Note:**
> This process is not impervious: if the source data is available, a user can correlate it with the pseudonymized version.

The Pseudonymizer currently uses `HMAC(SHA256)` to mutate fields that should not be exported as plain text. This should ensure that:
- End users of the data source cannot infer or revert the pseudonymized fields
- Referential integrity is maintained
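
As a minimal sketch of why a keyed hash preserves referential integrity, the Ruby snippet below applies HMAC-SHA256 to the same value in two unrelated rows. The `pseudonymize` helper, the sample rows, and the secret are hypothetical and used for illustration only; this is not the Pseudonymizer's actual code.

```ruby
require 'openssl'

# Hypothetical helper for illustration; not the Pseudonymizer's actual code.
# Returns the hex-encoded HMAC-SHA256 of `value` keyed with `secret`.
def pseudonymize(value, secret)
  return nil if value.nil? # leave missing values untouched

  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, value.to_s)
end

# Assumed example secret; a real run would use a randomly generated key.
secret = 'example-secret'

# Made-up rows from two tables that share the same email address.
users  = [{ id: 1, email: 'alice@example.com' }]
issues = [{ id: 10, author_email: 'alice@example.com' }]

pseudo_users  = users.map  { |u| u.merge(email: pseudonymize(u[:email], secret)) }
pseudo_issues = issues.map { |i| i.merge(author_email: pseudonymize(i[:author_email], secret)) }

# The same input yields the same digest, so the rows can still be joined.
puts pseudo_users.first[:email] == pseudo_issues.first[:author_email] # prints true
```

Without the secret, a reader of the export cannot recompute the digest for a guessed value. Anyone who holds both the source data and the export can still match rows, which is the limitation the note above describes.
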
### Manifest
The manifest is a file that describes which fields should be included or pseudonymized.
You may find this manifest at `lib/pseudonymizer/manifest.yml`.
### Usage
> **Note:**
> You can configure the pseudonymizer using the following environment variables:
>
> - `PSEUDONYMIZER_OUTPUT_DIR`: where to store the output CSV files (default: `/tmp`)
> - `PSEUDONYMIZER_BATCH`: the batch size when querying the DB (default: `100 000`)
> **Note:**
> Object store is required for the pseudonymizer to work properly.
```
bundle exec rake gitlab:db:pseudonymizer
```
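
To illustrate how the two environment variables and their defaults fit together, here is a small Ruby sketch. The `pseudonymizer_settings` helper is hypothetical, and the parsing logic is an assumption rather than the task's actual implementation; only the variable names and default values come from the note above, with `100 000` read as the integer 100000.

```ruby
# Hypothetical helper; the parsing shown here is an assumption, not GitLab's code.
# Reads the two documented variables from `env`, falling back to the defaults.
def pseudonymizer_settings(env = ENV)
  {
    output_dir: env.fetch('PSEUDONYMIZER_OUTPUT_DIR', '/tmp'),
    batch_size: Integer(env.fetch('PSEUDONYMIZER_BATCH', '100000'))
  }
end

# Override only the batch size; the output directory keeps its default.
settings = pseudonymizer_settings({ 'PSEUDONYMIZER_BATCH' => '50000' })
puts settings[:output_dir] # prints /tmp
puts settings[:batch_size] # prints 50000
```
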
### Output
> **Note:**
> The output CSV files might be very large. Make sure the `PSEUDONYMIZER_OUTPUT_DIR` has sufficient space. As a rule of thumb, at least 10% of the database size is recommended.
After the pseudonymizer has run, the output CSV files should be uploaded to the configured object store.
### Configuration
See [administration].
[ee]: https://about.gitlab.com/products/
[administration]: administration/pseudonymizer.md