1. 22 Sep, 2020 1 commit
    • Stan Hu's avatar
      Fix merge pre-receive errors when load balancing in use · 821e1a83
      Stan Hu authored
      When a user merges a merge request, the following sequence happens:
      
      1. Sidekiq: `MergeService` locks the state of the merge request in the
      DB.
      2. Gitaly: UserMergeBranch RPC runs and creates a merge commit.
      3. Sidekiq: `Repository#merge` and `Repository#ff_merge` updates the
         `in_progress_merge_commit_sha` database column with this merge commit
         SHA.
      4. Gitaly: UserMergeBranch RPC runs again and applies this merge commit
      to the target branch.
      5. Gitaly (gitaly-ruby): This RPC calls the pre-receive hook.
      6. Rails: This hook makes an API request to `/api/v4/internal/allowed`.
      7. Rails: This API check makes a SQL query for locked merge requests
         with a matching SHA.
      
      Since steps 1 and 7 will happen in different database sessions,
      replication lag could erroneously cause step 7 to report no matching
      merge requests. To avoid this, we have a few options:
      
      1. Wrap step 7 in a transaction. The EE load balancer will always
         direct these queries to the primary.
      2. Always force the load balancer session to use the primary for this
         query.
      3. Use the load balancing sticking mechanism to use the primary until
         the secondaries have caught up to the right write location.
      
      Option 1 isn't great because on GitLab.com, this query can take 500 ms
      to run, and long-running, read transactions are not desirable.
      
      Option 2 is simple and guarantees that we will always have a consistent
      read. However, none of these queries will ever be routed to a secondary,
      and this may cause undo load on the primary.
      
      We go with option 3. Whenever the `in_progress_merge_commit_sha` is
      updated, we mark the write location of the primary. Then in
      `MatchingMergeRequest#match?`, we stick to the primary if the replica
      has not caught up to that location.
      
      Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/247857
      821e1a83
  2. 21 Sep, 2020 39 commits