Commit 57e93c0b authored Sep 21, 2017 by Gabriel Mazetto
Improved Geo development documentation with 10.0 and 10.1 changes.
parent 423f153d

Showing 1 changed file with 48 additions and 8 deletions

doc/development/geo.md

# GitLab Geo

The Geo feature requires that we orchestrate a lot of components together.

-For the Database we need to setup replication, writing operations that stores
-data directly to disk replicates asynchronously by sending Webhook requests
-from **Primary** to **Secondary** nodes, and _assets_ are to be replicated in
-a future release using either a shared filesystem architecture or an object
-store setup with geographical replication.
+For the Database we need to set up streaming replication. Any operation on disk
+is logged in an events table, which relies on the database replication itself
+to get from **Primary** to **Secondary** nodes. These events are processed by the
+**Geo Log Cursor** daemon (on the Secondary), and asynchronous jobs take care of
+the changes.
+
+To keep track of the state of the replication, **Secondary** nodes include an
+additional PostgreSQL database that holds metadata for all the tracked
+repositories and assets. This additional database is required because we can't
+perform any write operation on the replicated database.

## Primary and Secondary

@@ -37,12 +42,18 @@ if Gitlab::Geo.secondary?
```
end
```

-`.primary?` and `.secondary?` are not mutually excludable, so you should never
+`.primary?` and `.secondary?` are not mutually exclusive, so you should never
take for granted that when one of them returns `false`, the other will be `true`.
Both methods check if Geo is `.enabled?`, so there is a "third" state where
both will return `false` (when Geo is not enabled).
+
+There is an additional gotcha when dealing with `initializers` or with
+things that happen at initialization time. In a few places we use
+`Gitlab::Geo.geo_database_configured?` to check whether the node has the additional
+database, which is only the case on a secondary node, so that we can avoid
+race conditions that could happen while bootstrapping a new node.

## Enablement

@@ -50,9 +61,14 @@ We consider Geo feature enabled when the user has a valid license with the
feature included, and they have at least one node defined at the Geo Nodes
screen.

+See `Gitlab::Geo.enabled?` and `Gitlab::Geo.license_allows?`.
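
The enablement rule and the three possible states described under "Primary and Secondary" can be sketched in plain Ruby. This is a minimal stand-in, not GitLab's implementation: `GeoSketch`, its keyword arguments, and the `role` value are hypothetical, and only the predicate names `enabled?`, `license_allows?`, `primary?`, and `secondary?` mirror methods mentioned in the text.

```ruby
# Hypothetical stand-in for the checks described above; not GitLab's code.
class GeoSketch
  # license_ok: whether the license includes the Geo feature (assumed input)
  # nodes:      node definitions, as if read from the Geo Nodes screen (assumed)
  # role:       :primary or :secondary for the current node (assumed input)
  def initialize(license_ok:, nodes:, role:)
    @license_ok = license_ok
    @nodes = nodes
    @role = role
  end

  def license_allows?
    @license_ok
  end

  # Enabled only with a valid license and at least one defined node.
  def enabled?
    license_allows? && !@nodes.empty?
  end

  # Both role checks depend on enabled?, so both are false when Geo is off.
  def primary?
    enabled? && @role == :primary
  end

  def secondary?
    enabled? && @role == :secondary
  end
end

# No nodes defined, so Geo is not enabled and both role checks are false.
disabled = GeoSketch.new(license_ok: true, nodes: [], role: :primary)
puts disabled.primary?
puts disabled.secondary?
```

Callers that need to distinguish all three cases would therefore check `enabled?` first rather than inferring one role from the other being `false`.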

## Communication

+The communication channel has changed since the first iteration. The historic
+decisions, and why we moved to new implementations, are recorded here.

### Custom code (GitLab 8.6 and earlier)

In GitLab versions before 8.6 custom code is used to handle
@@ -66,10 +82,11 @@ improvements made to this communication layer.
There is a specific **internal** endpoint in our API code (Grape)
that receives all requests from these System Hooks:
-`/api/v3/geo/receive_events`.
+`/api/{v3,v4}/geo/receive_events`.

We switch and filter each event by the `event_name` field.
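
Switching on `event_name` could look like the sketch below. The method name, the event names, and the returned symbols are illustrative assumptions, not the list the endpoint actually handled.

```ruby
# Illustrative dispatcher; event names and return values are assumptions.
def handle_geo_event(params)
  case params['event_name']
  when 'push', 'tag_push'
    [:refresh_repository, params['project_id']]
  when 'key_create'
    [:add_key, params['id']]
  when 'key_destroy'
    [:remove_key, params['id']]
  else
    # Any other event is filtered out.
    [:ignored, nil]
  end
end

p handle_geo_event('event_name' => 'push', 'project_id' => 42)
p handle_geo_event('event_name' => 'user_create')
```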

### Geo Log Cursor (GitLab 10.0 and up)

Since GitLab 10.0, **System Webhooks** are no longer used, and Geo Log
@@ -77,6 +94,13 @@ Cursor is used instead. The Log Cursor traverses the `Geo::EventLog`
to see if there are changes since the last time the log was checked
and will handle repository updates, deletes, changes & renames.

+The table is within the replicated database. This has two advantages over the
+old method:
+
+1. Replication is synchronous and we preserve the order of events
+2. Replication of the events happens at the same time as the changes in the
+   database
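
The cursor idea can be illustrated with a small in-memory sketch: remember the id of the last handled event and process only newer rows, in id order. `EventLogCursor`, the hash-based events, and the `:type` values are hypothetical. The real daemon reads `Geo::EventLog` from the replicated database and hands work to asynchronous jobs rather than collecting results in an array.

```ruby
# In-memory stand-in for a log cursor; all names are hypothetical.
class EventLogCursor
  attr_reader :last_processed_id, :handled

  def initialize(last_processed_id = 0)
    @last_processed_id = last_processed_id
    @handled = []
  end

  # events: array of hashes with :id and :type, standing in for table rows.
  # Returns the number of events handled in this pass.
  def run(events)
    pending = events.select { |event| event[:id] > @last_processed_id }
    pending.sort_by { |event| event[:id] }.each do |event|
      @handled << [event[:id], event[:type]]
      # Advance only after the event is handled, so a crash replays it.
      @last_processed_id = event[:id]
    end
    pending.size
  end
end

log = [
  { id: 2, type: :repository_renamed },
  { id: 1, type: :repository_updated }
]
cursor = EventLogCursor.new
cursor.run(log)                              # handles id 1, then id 2
log << { id: 3, type: :repository_deleted }
cursor.run(log)                              # handles only id 3
p cursor.handled
```

Because the events travel through the same replication stream as the data they describe, a cursor like this never sees an event before the change it refers to, which is the second advantage listed above.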

## Readonly

@@ -92,6 +116,7 @@ take any extra step for that.
We do use our feature toggle `.secondary?` to coordinate Git operations and do
the correct authorization (denying writing on any secondary node).

## File Transfers

Secondary Geo Nodes need to transfer files, such as LFS objects, attachments, avatars,
@@ -101,6 +126,7 @@ that records which objects it needs to transfer.
Files are copied via HTTP(s) and initiated via the
`/api/v4/geo/transfers/:type/:id` endpoint.

### Authentication

To authenticate file transfers, each GeoNode has two fields:
@@ -127,10 +153,14 @@ include the SHA256 of the file. An example JWT payload looks like:
```
```
If the data checks out, then the Geo primary sends data via the
-[XSendfile](https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/)
+[X-Sendfile](https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/)
feature, which allows nginx to handle the file transfer without tying up Rails
or Workhorse.

+Please note that JWT requires synchronized clocks between the involved machines,
+otherwise it may fail with an encryption error.
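
To see why clock drift matters, assume the token carries time-based claims such as `iat` (issued at) and `exp` (expires at), as JWTs commonly do. The sketch below is a simplified HMAC-signed token, not a real JWT and not GitLab's code: `SECRET`, `sign`, `verify`, and the `leeway` parameter are hypothetical names chosen for illustration.

```ruby
require 'json'
require 'openssl'

SECRET = 'hypothetical-shared-secret' # assumption: stands in for a node's secret

# Builds "<json>.<hex hmac>", a simplification of a real JWT's encoding.
def sign(payload)
  body = JSON.generate(payload)
  "#{body}.#{OpenSSL::HMAC.hexdigest('SHA256', SECRET, body)}"
end

# now:    the verifier's own clock
# leeway: tolerated clock skew in seconds
def verify(token, now:, leeway: 0)
  body, _dot, signature = token.rpartition('.')
  expected = OpenSSL::HMAC.hexdigest('SHA256', SECRET, body)
  return :bad_signature unless OpenSSL.secure_compare(expected, signature)

  claims = JSON.parse(body)
  return :not_yet_valid if now.to_i < claims['iat'] - leeway
  return :expired if now.to_i > claims['exp'] + leeway

  :ok
end

issued = Time.at(1_500_000_000)
token = sign('iat' => issued.to_i, 'exp' => issued.to_i + 60)

p verify(token, now: issued + 30)               # verifier clock agrees with issuer
p verify(token, now: issued - 90)               # verifier clock runs 90 seconds behind
p verify(token, now: issued - 90, leeway: 120)  # same skew, tolerated by leeway
```

In this sketch the signature is valid in all three calls. Only the verifier's view of the current time changes the outcome, which is why keeping the machines' clocks synchronized matters.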

## Geo Tracking Database

Secondary Geo nodes track data about what has been downloaded in a second
@@ -149,3 +179,13 @@ To migrate the tracking database, run:
```
bundle exec rake geo:db:migrate
```

+In 10.1 we are introducing PostgreSQL FDW to bridge this database with the
+replicated one, so we can perform queries joining tables from both instances.
+This is useful for the Geo Log Cursor and improves the performance of some
+synchronization operations.
+
+While FDW is available in older versions of Postgres, we needed to bump the
+minimum required version to 9.6 as this includes many performance improvements
+to the FDW implementation.