Commit 22e1eeee authored by Nick Thomas

Merge branch '3145-document-how-geo-cursor-works' into 'master'

Document how Geo cursor works

Closes #3145

See merge request gitlab-org/gitlab-ee!2899
parents a0ec29aa c02c2842
@@ -33,15 +33,16 @@ and the replicated read-only ones as **secondaries**.
 
 Keep in mind that:
 
-- Secondaries talk to primary to authorize user logins (OAuth),
-  to synchronize data (database replication), and to clone/pull from
-  repositories (SSH).
-- Primary talks to secondaries to get their status (API).
+- Secondaries talk to primary to get user data for logins (API), to
+  clone/pull from repositories (SSH) and to retrieve LFS Objects and Attachments
+  (HTTPS + JWT).
+- Since GitLab Enterprise Edition Premium 10.0, the primary no longer talks to
+  secondaries to notify for changes (API).
 
 ## Use-cases
 
 - Can be used for cloning and fetching projects, in addition
-  to reading any data
+  to reading any data available in the GitLab web interface
 - Overcomes slow connection between distant offices, saving time by
   improving speed for distributed teams
 - Helps reducing the loading time for automated tasks,
@@ -53,11 +54,12 @@ The following diagram illustrates the underlying architecture of GitLab Geo:
 
 ![GitLab Geo architecture](img/geo-architecture.png)
 
-[Source diagram](https://docs.google.com/drawings/d/1VQIcj6jyE3idWKyt9MRUAaE3XXrkwx8g-Ne4pmURmwI/edit)
+[Source diagram](https://docs.google.com/drawings/d/1L44flo2Mxng928yAcHduaCJyGtKNEjk2WQkxaCU_cT8/edit)
 
 In this diagram, there is one Geo primary node and one secondary. The
 secondary clones repositories via git over SSH. Attachments, LFS objects, and
-other files are downloaded via HTTPS using a GitLab API to authenticate.
+other files are downloaded via HTTPS using the GitLab API to authenticate,
+with a special endpoint protected by JWT.
 
 Writes to the database and Git repositories can only be performed on the Geo
 primary node. The secondary node receives database updates via PostgreSQL
@@ -67,6 +69,8 @@ Note that the secondary needs two different PostgreSQL databases: a read-only
 instance that streams data from the main GitLab database and another used
 internally by the secondary node to record what data has been replicated.
 
+In the secondary nodes there is an additional daemon: Geo Log Cursor.
+
 ### LDAP
 
 We recommend that if you use LDAP on your primary that you also set up a
@@ -79,6 +83,31 @@ Check with your LDAP provider for instructions on on how to set up
 replication. For example, OpenLDAP provides [these
 instructions](https://www.openldap.org/doc/admin24/replication.html).
 
+### Geo Tracking Database
+
+We use the tracking database as metadata to control what needs to be
+updated on the disk of the local instance (for example, download new assets,
+fetch new LFS Objects or fetch changes from a repository that has recently been
+updated).
+
+Because the replicated instance is read-only, we need this additional instance
+per secondary location.
+
+### Geo Log Cursor
+
+This daemon reads a log of events replicated by the primary node to the secondary
+database and updates the Geo Tracking Database with changes that need to be
+executed.
+
+When something is marked to be updated in the tracking database, asynchronous
+jobs running on the secondary node will execute the required operations and
+update the state.
+
+This new architecture allows us to be resilient to connectivity issues between the
+nodes. It doesn't matter if it was just a few minutes or days. The secondary
+instance will be able to replay all the events in the correct order and get in
+sync again.
+
 ## Setup instructions
 
 In order to set up one or more GitLab Geo instances, follow the steps below in
......
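Reviewer note: the Geo Log Cursor flow documented in the added section (read the replicated event log, record pending work in the tracking database, let asynchronous jobs do the work) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Event` shape, the event kind strings, and the `TrackingDatabase`, `run_cursor` and `run_sync_jobs` names are hypothetical and are not GitLab's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One entry in the replicated event log (hypothetical shape)."""
    id: int        # increasing id assigned on the primary
    kind: str      # hypothetical kind, e.g. "repository_updated"
    resource: str  # identifier of the affected repository or file


@dataclass
class TrackingDatabase:
    """In-memory stand-in for the secondary's tracking database."""
    last_event_id: int = 0                       # cursor position
    pending: dict = field(default_factory=dict)  # resource -> event kind
    synced: list = field(default_factory=list)   # resources already updated


def run_cursor(event_log, tracking):
    """Mark every event not yet seen as pending work, in id order.

    Returns the number of events consumed in this pass.
    """
    unseen = sorted(
        (e for e in event_log if e.id > tracking.last_event_id),
        key=lambda e: e.id,
    )
    for event in unseen:
        tracking.pending[event.resource] = event.kind
        tracking.last_event_id = event.id
    return len(unseen)


def run_sync_jobs(tracking):
    """Stand-in for the asynchronous jobs: do the work, then update state."""
    for resource in list(tracking.pending):
        # A real job would fetch the repository or download the file here.
        tracking.synced.append(resource)
        del tracking.pending[resource]


if __name__ == "__main__":
    log = [
        Event(1, "repository_updated", "group/project"),
        Event(2, "lfs_object_added", "lfs/abc"),
    ]
    tracking = TrackingDatabase()
    run_cursor(log, tracking)
    run_sync_jobs(tracking)

    # Events that pile up during an outage are replayed on the next pass.
    log.append(Event(3, "repository_updated", "group/other"))
    print(run_cursor(log, tracking), "event(s) replayed after reconnecting")
```

Because the cursor position is stored next to the pending work, the length of an outage does not matter in this sketch: the next pass resumes from `last_event_id` and replays whatever accumulated, in order.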
doc/gitlab-geo/img/geo-architecture.png: image replaced (64.8 KB before, 59.7 KB after)