Fix merge pre-receive errors when load balancing is in use
When a user merges a merge request, the following sequence happens:

1. Sidekiq: `MergeService` locks the state of the merge request in the DB.
2. Gitaly: The UserMergeBranch RPC runs and creates a merge commit.
3. Sidekiq: `Repository#merge` and `Repository#ff_merge` update the
   `in_progress_merge_commit_sha` database column with this merge commit SHA.
4. Gitaly: The UserMergeBranch RPC runs again and applies this merge commit to
   the target branch.
5. Gitaly (gitaly-ruby): This RPC calls the pre-receive hook.
6. Rails: This hook makes an API request to `/api/v4/internal/allowed`.
7. Rails: This API check makes a SQL query for locked merge requests with a
   matching SHA.

Since steps 1 and 7 happen in different database sessions, replication lag
could erroneously cause step 7 to report no matching merge requests. To avoid
this, we have a few options:

1. Wrap step 7 in a transaction. The EE load balancer will always direct
   these queries to the primary.
2. Always force the load balancer session to use the primary for this query.
3. Use the load balancing sticking mechanism to use the primary until the
   secondaries have caught up to the right write location.

Option 1 isn't great because on GitLab.com this query can take 500 ms to run,
and long-running read transactions are not desirable.

Option 2 is simple and guarantees that we always get a consistent read.
However, none of these queries would ever be routed to a secondary, which may
cause undue load on the primary.

We go with option 3. Whenever `in_progress_merge_commit_sha` is updated, we
mark the write location of the primary. Then, in `MatchingMergeRequest#match?`,
we stick to the primary if the replica has not caught up to that location (see
the sketch below).

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/247857
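A minimal sketch of option 3, assuming the EE load balancing `Sticking` helper
exposes `mark_primary_write_location` and `unstick_or_continue_sticking` and
that we key the write location by project. The method names, the sticking key,
and the query scopes are illustrative assumptions, not the exact
implementation:

```ruby
# Illustrative sketch only: the sticking helpers and query scopes below are
# assumptions standing in for the EE load balancing API and the real query.

class MergeRequest < ApplicationRecord
  # Step 3: record the merge commit SHA and remember the primary's current
  # write location so a later read can check whether replicas have caught up.
  def update_and_mark_in_progress_merge_commit_sha(commit_sha)
    update_column(:in_progress_merge_commit_sha, commit_sha)

    ::Gitlab::Database::LoadBalancing::Sticking
      .mark_primary_write_location(:project, target_project_id)
  end
end

module Gitlab
  module Checks
    class MatchingMergeRequest
      def initialize(newrev, branch_name, project)
        @newrev = newrev
        @branch_name = branch_name
        @project = project
      end

      # Step 7: the pre-receive check. If replicas have not yet replayed the
      # write location recorded above, route this session to the primary;
      # once they catch up, clear the sticking point and use replicas again.
      def match?
        ::Gitlab::Database::LoadBalancing::Sticking
          .unstick_or_continue_sticking(:project, @project.id)

        # Locked merge requests whose in-progress merge commit matches the
        # pushed SHA for this target branch.
        @project.merge_requests
          .with_state(:locked)
          .where(in_progress_merge_commit_sha: @newrev, target_branch: @branch_name)
          .exists?
      end
    end
  end
end
```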