Commit 26fb603a authored by Marcia Ramos

Merge branch '343055-update-docs-for-removed-methods' into 'master'

Update BG migration docs for removed methods

See merge request gitlab-org/gitlab!77308
parents 442a9482 fdd1fda8
@@ -83,23 +83,11 @@ replacing the class name and arguments with whatever values are necessary for
your migration:

```ruby
migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
```

You can use the function `queue_background_migration_jobs_by_range_at_intervals`
to automatically split the job into batches:

```ruby
queue_background_migration_jobs_by_range_at_intervals(
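  # The hunk is truncated here; the arguments below are an illustrative
  # completion based on the scheduling migration shown later in this diff:
  # the batchable model, the migration class name, and a delay interval.
  define_batchable_model('integrations'),
  'BackgroundMigrationClassName',
  2.minutes)
```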
@@ -117,16 +105,6 @@ consuming migrations it's best to schedule a background job using an
updates. Removals in turn can be handled by simply defining foreign keys with
cascading deletes.
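For the removal case, here is a minimal sketch of such a cascading foreign key
(the `integrations`/`projects` tables and the `project_id` column are
illustrative assumptions; `add_concurrent_foreign_key` is the concurrent
helper used in GitLab migrations):

```ruby
class AddProjectFkToIntegrations < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # ON DELETE CASCADE removes child rows automatically when the parent
    # project is deleted, so no background cleanup migration is needed.
    add_concurrent_foreign_key :integrations, :projects,
                               column: :project_id, on_delete: :cascade
  end

  def down
    remove_foreign_key_if_exists :integrations, column: :project_id
  end
end
```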
### Rescheduling background migrations

If one of the background migrations contains a bug that is fixed in a patch
@@ -197,53 +175,47 @@ the new format.

## Example

To explain all this, let's use the following example: the table `integrations` has a
field called `properties` which is stored in JSON. For all rows you want to
extract the `url` key from this JSON object and store it in the `integrations.url`
column. There are millions of integrations and parsing JSON is slow, thus you can't
do this in a regular migration.

To do this using a background migration we'll start with defining our migration
class:
```ruby
class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
  class Integration < ActiveRecord::Base
    self.table_name = 'integrations'
  end

  def perform(start_id, end_id)
    Integration.where(id: start_id..end_id).each do |integration|
      json = JSON.load(integration.properties)

      integration.update(url: json['url']) if json['url']
    rescue JSON::ParserError
      # If the JSON is invalid we don't want to keep the job around forever,
      # instead we'll just leave the "url" field to whatever the default value
      # is.
      next
    end
  end
end
```
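If you want to exercise the job by hand, for instance from a Rails console,
it can be instantiated and run directly, just as the re-enqueuing migration
later in this diff does (the IDs are illustrative):

```ruby
# Processes integrations with IDs 1..1000 inline, bypassing Sidekiq entirely.
Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(1, 1000)
```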
Next we'll need to adjust our code so we schedule the above migration for newly
created and updated integrations. We can do this using something along the lines of
the following:

```ruby
class Integration < ActiveRecord::Base
  after_commit :schedule_integration_migration, on: :update
  after_commit :schedule_integration_migration, on: :create

  def schedule_integration_migration
    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
  end
end
```

Because the job's `perform` takes a range, a single row is scheduled by passing
its ID as both the start and the end of the range.
@@ -253,21 +225,20 @@ before the transaction completes as doing so can lead to race conditions where
the changes are not yet visible to the worker.

Next we'll need a post-deployment migration that schedules the migration for
existing data.
```ruby
class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  MIGRATION = 'ExtractIntegrationsUrl'
  DELAY_INTERVAL = 2.minutes

  def up
    queue_background_migration_jobs_by_range_at_intervals(
      define_batchable_model('integrations'),
      MIGRATION,
      DELAY_INTERVAL)
  end

  def down
@@ -284,18 +255,18 @@ jobs and manually run on any un-migrated rows. Such a migration would look like
this:

```ruby
class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # This must be included
    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')

    # This should be included, but can be skipped - see below
    define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
      range = batch.pluck('MIN(id)', 'MAX(id)').first

      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
    end
  end
@@ -313,9 +284,9 @@ If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then this final step
can be skipped.

This migration will then process any jobs for the ExtractIntegrationsUrl migration
and continue once all jobs have been processed. Once done you can safely remove
the `integrations.properties` column.
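As a sketch of that final cleanup (the column type is an assumption, and in
practice the column is first ignored in the model for a release before being
dropped):

```ruby
class RemoveIntegrationsPropertiesColumn < Gitlab::Database::Migration[1.0]
  def up
    # Safe only once all ExtractIntegrationsUrl jobs have been consumed.
    remove_column :integrations, :properties
  end

  def down
    # Assumes the column was plain text; restores it on rollback.
    add_column :integrations, :properties, :text
  end
end
```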
## Testing
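The body of this section falls outside the diff. As a minimal sketch, a spec
for the example job above might look like the following (the `table` helper
comes from GitLab's migration spec helpers; the row attributes are assumptions):

```ruby
RSpec.describe Gitlab::BackgroundMigration::ExtractIntegrationsUrl do
  let(:integrations) { table(:integrations) }

  it 'copies the url key out of the JSON properties' do
    integration = integrations.create!(properties: '{"url": "http://example.com"}')

    described_class.new.perform(integration.id, integration.id)

    expect(integration.reload.url).to eq('http://example.com')
  end
end
```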