Commit d306b0d7 authored by Stan Hu

Merge branch 'use-optimistic-locking' into 'master'

Use optimistic locking

## What does this MR do?
Removes the usage of pessimistic locking in favor of optimistic locking, which is much cheaper and doesn't block database operations.

Since this is a very simple change, it should be safe. If we receive an `ActiveRecord::StaleObjectError`, we reload the object and retry the locked operation.
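
For illustration only, the reload-and-retry behaviour boils down to the sketch below (the actual helper added by this MR is `Gitlab::OptimisticLocking.retry_lock`, shown further down in the diff, which also wraps the block in a transaction and caps the number of retries):

```ruby
# Sketch only: assumes `pipeline` is a Ci::Pipeline record whose table
# has a `lock_version` column (added by the migration in this MR).
begin
  pipeline.update_status
rescue ActiveRecord::StaleObjectError
  # Another process saved the row first and bumped lock_version;
  # reload to pick up the fresh lock_version, then try again.
  pipeline.reload
  retry
end
```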

However, I still believe that we need https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/7005, as it will reduce the load on the database and the file system.

This MR changes the behavior as follows:

### Pessimistic locking (previous behavior)

#### For updating
1. `SELECT * FOR UPDATE` (other updates wait on this; see the `with_lock` sketch below)
2. we update `ci_pipeline`
3. `latest_builds_status`
4. `enqueue` (uses `transition :created -> :pending`)
5. [state_machine] we are in state `created`, so we can go to `pending`
6. [state_machine] `ci_pipeline.status = :pending`
7. [state_machine] `ci_pipeline.save`
8. [state_machine] `after_transition` (if transitioning to `success`): schedule `PipelineSuccessWorker` on Sidekiq
9. release the DB lock

#### If no update is required
1. `SELECT * FOR UPDATE` (other updates wait on this)
2. we update `ci_pipeline`
3. `latest_builds_status`
4. we are already in `pending`; we can't transition to `pending` again, because that's forbidden
5. release the DB lock
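
For reference, the pessimistic path above corresponds to the old `Ci::Pipeline#update_status`, which wrapped everything in `with_lock`. ActiveRecord's `with_lock` opens a transaction and issues `SELECT ... FOR UPDATE` on the row, so every other writer queues up behind it (condensed from the removed lines in the diff below):

```ruby
# Previous behaviour (pessimistic locking), condensed from the diff below.
def update_status
  with_lock do                 # BEGIN; SELECT ... FOR UPDATE
    case latest_builds_status
    when 'pending' then enqueue
    when 'running' then run
    # ... remaining statuses elided in the diff
    end
  end                          # COMMIT, releasing the row lock
end
```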

### Optimistic locking (implemented by this MR)

#### For updating
1. `latest_builds_status`
2. `enqueue` (uses `transition :created -> :pending`)
3. [state_machine] we are in state `created`, so we can go to `pending`
4. [state_machine] `ci_pipeline.status = :pending`
5. [state_machine] `ci_pipeline.save`
6. [state_machine] [save] `where(lock_version: ci_pipeline.lock_version).update_all(status: :pending, updated_at: Time.now)`
7. [state_machine] [save] unless we updated a row, raise `ActiveRecord::StaleObjectError` (see the sketch below these lists)

#### If no update is required
1. we update `ci_pipeline`
2. `latest_builds_status`
3. we are already in `pending`; we can't transition to `pending` again, because that's forbidden
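
Under the hood, Rails' built-in optimistic locking kicks in because of the `lock_version` columns added by the migration in this MR: the save in steps 5–7 becomes a guarded `UPDATE`, roughly equivalent to this sketch (an approximation of ActiveRecord's behaviour, not code from the MR):

```ruby
# Approximation of what ActiveRecord does on save when the table has a
# lock_version column, i.e.:
#   UPDATE ci_commits
#      SET status = 'pending', lock_version = lock_version + 1, updated_at = ...
#    WHERE id = ? AND lock_version = ?
updated_rows = Ci::Pipeline.
  where(id: pipeline.id, lock_version: pipeline.lock_version).
  update_all(status: 'pending',
             lock_version: pipeline.lock_version + 1,
             updated_at: Time.now)

# Zero affected rows means another process already bumped lock_version,
# so this update is stale and gets retried by retry_lock.
raise ActiveRecord::StaleObjectError.new(pipeline, 'update') if updated_rows.zero?
```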

## Why was this MR needed?
We have been seeing a number of problems since we migrated pipeline/build processing to Sidekiq. In particular, we started seeing a lot of blocking queries.

We were using pessimistic locking, which doesn't seem to be required. Switching to a more efficient method of operation effectively fixes our issues with blocked queries.

## What are the relevant issue numbers?
Issues: https://gitlab.com/gitlab-com/infrastructure/issues/623 and https://gitlab.com/gitlab-com/infrastructure/issues/584, but there is also a bunch of merge requests that try to improve the behavior of scheduled jobs.

cc @pcarranza @yorickpeterse @stanhu

See merge request !7040
parents 144358e9 2822526e
@@ -91,6 +91,7 @@ Please view this file on the master branch, on stable branches it's out of date.
   - Updating verbiage on git basics to be more intuitive
   - Fix project_feature record not generated on project creation
   - Clarify documentation for Runners API (Gennady Trafimenkov)
+  - Use optimistic locking for pipelines and builds
   - The instrumentation for Banzai::Renderer has been restored
   - Change user & group landing page routing from /u/:username to /:username
   - Added documentation for .gitattributes files
...
@@ -30,23 +30,23 @@ module Ci
      end

      event :run do
-       transition any => :running
+       transition any - [:running] => :running
      end

      event :skip do
-       transition any => :skipped
+       transition any - [:skipped] => :skipped
      end

      event :drop do
-       transition any => :failed
+       transition any - [:failed] => :failed
      end

      event :succeed do
-       transition any => :success
+       transition any - [:success] => :success
      end

      event :cancel do
-       transition any => :canceled
+       transition any - [:canceled] => :canceled
      end

      # IMPORTANT
@@ -260,7 +260,7 @@ module Ci
    end

    def update_status
-     with_lock do
+     Gitlab::OptimisticLocking.retry_lock(self) do
        case latest_builds_status
        when 'pending' then enqueue
        when 'running' then run
...
@@ -73,16 +73,16 @@ class CommitStatus < ActiveRecord::Base
      transition [:created, :pending, :running] => :canceled
    end

-   after_transition created: [:pending, :running] do |commit_status|
-     commit_status.update_attributes queued_at: Time.now
+   before_transition created: [:pending, :running] do |commit_status|
+     commit_status.queued_at = Time.now
    end

-   after_transition [:created, :pending] => :running do |commit_status|
-     commit_status.update_attributes started_at: Time.now
+   before_transition [:created, :pending] => :running do |commit_status|
+     commit_status.started_at = Time.now
    end

-   after_transition any => [:success, :failed, :canceled] do |commit_status|
-     commit_status.update_attributes finished_at: Time.now
+   before_transition any => [:success, :failed, :canceled] do |commit_status|
+     commit_status.finished_at = Time.now
    end

    after_transition do |commit_status, transition|
...
@@ -10,7 +10,6 @@ module Ci
        create_builds!
      end

-     @pipeline.with_lock do
      new_builds =
        stage_indexes_of_created_builds.map do |index|
          process_stage(index)
@@ -18,10 +17,8 @@ module Ci
      @pipeline.update_status

-     # Return a flag if a when builds got enqueued
      new_builds.flatten.any?
    end
-   end

    private
@@ -32,9 +29,11 @@ module Ci
    def process_stage(index)
      current_status = status_for_prior_stages(index)

-     created_builds_in_stage(index).select do |build|
      if HasStatus::COMPLETED_STATUSES.include?(current_status)
-       process_build(build, current_status)
+       created_builds_in_stage(index).select do |build|
+         Gitlab::OptimisticLocking.retry_lock(build) do |subject|
+           process_build(subject, current_status)
+         end
        end
      end
    end
...
@@ -28,17 +28,14 @@ module Ci
      if build
        # In case when 2 runners try to assign the same build, second runner will be declined
-       # with StateMachines::InvalidTransition in run! method.
+       # with StateMachines::InvalidTransition or StaleObjectError when doing run! or save method.
-       build.with_lock do
        build.runner_id = current_runner.id
-       build.save!
        build.run!
      end
-     end

      build
-   rescue StateMachines::InvalidTransition
+   rescue StateMachines::InvalidTransition, ActiveRecord::StaleObjectError
      nil
    end
...
# See http://doc.gitlab.com/ce/development/migration_style_guide.html
# for more information on how to write migrations for GitLab.
class AddLockVersionToBuildAndPipelines < ActiveRecord::Migration
  include Gitlab::Database::MigrationHelpers

  # Set this constant to true if this migration requires downtime.
  DOWNTIME = false

  def change
    add_column :ci_builds, :lock_version, :integer
    add_column :ci_commits, :lock_version, :integer
  end
end
@@ -189,6 +189,7 @@ ActiveRecord::Schema.define(version: 20161025231710) do
    t.text "yaml_variables"
    t.datetime "queued_at"
    t.string "token"
+   t.integer "lock_version"
  end

  add_index "ci_builds", ["commit_id", "stage_idx", "created_at"], name: "index_ci_builds_on_commit_id_and_stage_idx_and_created_at", using: :btree
@@ -219,6 +220,7 @@ ActiveRecord::Schema.define(version: 20161025231710) do
    t.datetime "finished_at"
    t.integer "duration"
    t.integer "user_id"
+   t.integer "lock_version"
  end

  add_index "ci_commits", ["gl_project_id", "sha"], name: "index_ci_commits_on_gl_project_id_and_sha", using: :btree
...
module Gitlab
  module OptimisticLocking
    extend self

    def retry_lock(subject, retries = 100, &block)
      loop do
        begin
          ActiveRecord::Base.transaction do
            return block.call(subject)
          end
        rescue ActiveRecord::StaleObjectError
          retries -= 1
          raise unless retries >= 0

          subject.reload
        end
      end
    end
  end
end
@@ -178,6 +178,7 @@ Ci::Pipeline:
 - finished_at
 - duration
 - user_id
+- lock_version
 CommitStatus:
 - id
 - project_id
@@ -217,6 +218,7 @@ CommitStatus:
 - yaml_variables
 - queued_at
 - token
+- lock_version
 Ci::Variable:
 - id
 - project_id
...
require 'spec_helper'

describe Gitlab::OptimisticLocking, lib: true do
  describe '#retry_lock' do
    let!(:pipeline) { create(:ci_pipeline) }
    let!(:pipeline2) { Ci::Pipeline.find(pipeline.id) }

    it 'does not reload object if state changes' do
      expect(pipeline).not_to receive(:reload)
      expect(pipeline).to receive(:succeed).and_call_original

      described_class.retry_lock(pipeline) do |subject|
        subject.succeed
      end
    end

    it 'retries action if exception is raised' do
      pipeline.succeed

      expect(pipeline2).to receive(:reload).and_call_original
      expect(pipeline2).to receive(:drop).twice.and_call_original

      described_class.retry_lock(pipeline2) do |subject|
        subject.drop
      end
    end

    it 'raises exception when too many retries' do
      expect(pipeline).to receive(:drop).twice.and_call_original

      expect do
        described_class.retry_lock(pipeline, 1) do |subject|
          subject.lock_version = 100
          subject.drop
        end
      end.to raise_error(ActiveRecord::StaleObjectError)
    end
  end
end
@@ -138,32 +138,26 @@ describe Ci::Pipeline, models: true do
  describe 'state machine' do
    let(:current) { Time.now.change(usec: 0) }
-   let(:build) { create_build('build1', current, 10) }
-   let(:build_b) { create_build('build2', current, 20) }
-   let(:build_c) { create_build('build3', current + 50, 10) }
+   let(:build) { create_build('build1', 0) }
+   let(:build_b) { create_build('build2', 0) }
+   let(:build_c) { create_build('build3', 0) }

    describe '#duration' do
      before do
-       pipeline.update(created_at: current)
-
-       travel_to(current + 5) do
-         pipeline.run
-         pipeline.save
-       end
-
        travel_to(current + 30) do
-         build.success
+         build.run!
+         build.success!
+         build_b.run!
+         build_c.run!
        end

        travel_to(current + 40) do
-         build_b.drop
+         build_b.drop!
        end

        travel_to(current + 70) do
-         build_c.success
+         build_c.success!
        end
-
-       pipeline.drop
      end

      it 'matches sum of builds duration' do
@@ -455,7 +449,9 @@ describe Ci::Pipeline, models: true do
    context 'when all builds succeed' do
      before do
        build_a.success
-       build_b.success
+
+       # We have to reload build_b as this is in next stage and it gets triggered by PipelineProcessWorker
+       build_b.reload.success
      end

      it 'receives a success event once' do
...
@@ -277,6 +277,7 @@ describe API::API, api: true do
    context 'with regular branch' do
      before do
+       pipeline.reload
        pipeline.update(ref: 'master',
                        sha: project.commit('master').sha)
@@ -288,6 +289,7 @@ describe API::API, api: true do
    context 'with branch name containing slash' do
      before do
+       pipeline.reload
        pipeline.update(ref: 'improve/awesome',
                        sha: project.commit('improve/awesome').sha)
      end
...
@@ -101,11 +101,11 @@ module Ci
      it 'equalises number of running builds' do
        # after finishing the first build for project 1, get a second build from the same project
        expect(service.execute(shared_runner)).to eq(build1_project1)
-       build1_project1.success
+       build1_project1.reload.success
        expect(service.execute(shared_runner)).to eq(build2_project1)
        expect(service.execute(shared_runner)).to eq(build1_project2)
-       build1_project2.success
+       build1_project2.reload.success
        expect(service.execute(shared_runner)).to eq(build2_project2)
        expect(service.execute(shared_runner)).to eq(build1_project3)
        expect(service.execute(shared_runner)).to eq(build3_project1)
...
@@ -147,6 +147,7 @@ describe MergeRequests::MergeWhenBuildSucceedsService do
      expect(MergeWorker).not_to receive(:perform_async)

      build.success
+     test.reload
      test.drop
    end
@@ -154,6 +155,7 @@ describe MergeRequests::MergeWhenBuildSucceedsService do
      expect(MergeWorker).to receive(:perform_async)

      build.success
+     test.reload
      test.success
    end
  end
...