Commit 26fb603a authored by Marcia Ramos

Merge branch '343055-update-docs-for-removed-methods' into 'master'

Update BG migration docs for removed methods

See merge request gitlab-org/gitlab!77308
parents 442a9482 fdd1fda8
replacing the class name and arguments with whatever values are necessary for
your migration:
```ruby
migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
```
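For instance, a call scheduling the `ExtractIntegrationsUrl` migration developed
later on this page for a single ID range might look like this, following the call
form shown above (the range is made up for illustration):

```ruby
migrate_in('ExtractIntegrationsUrl', [1, 10_000])
```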
You can use the function `queue_background_migration_jobs_by_range_at_intervals`
to automatically split the job into batches:
```ruby
queue_background_migration_jobs_by_range_at_intervals(
  define_batchable_model('table_name'),
  'BackgroundMigrationClassName',
  2.minutes)
```
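Conceptually, the helper walks the table in primary-key order and schedules one
job per batch of rows, spacing the jobs out by the given interval. A minimal
sketch of the idea, reusing the `each_batch` and `BackgroundMigrationWorker`
calls that appear elsewhere on this page (an illustration only; the helper's
real implementation and defaults differ):

```ruby
# Illustrative sketch: schedule one job per ID range, spread out over time.
def queue_jobs_by_range(model, migration_class_name, delay_interval, batch_size: 10_000)
  model.each_batch(of: batch_size) do |batch, index|
    # The lowest and highest ID of the batch become the job's arguments.
    range = batch.pluck('MIN(id)', 'MAX(id)').first

    # Each successive job gets a larger delay so Sidekiq is not flooded at once.
    BackgroundMigrationWorker.perform_in(index * delay_interval, migration_class_name, range)
  end
end
```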
For complex and time-consuming migrations it's best to schedule a background job
using an `after_create` hook so this doesn't affect response timings. The same
applies to updates. Removals in turn can be handled by simply defining foreign
keys with cascading deletes.
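For example, a cascading delete is declared when the foreign key is added; the
migration, table, and column names below are illustrative:

```ruby
class AddIntegrationsProjectIdForeignKey < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # Rows in `integrations` are removed automatically when their parent
    # project is deleted, so no background job is needed for removals.
    add_concurrent_foreign_key :integrations, :projects,
      column: :project_id, on_delete: :cascade
  end

  def down
    remove_foreign_key_if_exists :integrations, column: :project_id
  end
end
```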
### Rescheduling background migrations
If one of the background migrations contains a bug that is fixed in a patch
release, the background migration needs to be rescheduled so that it is re-run
on systems that already performed the initial migration.
## Example
To explain all this, let's use the following example: the table `integrations` has a
field called `properties` which is stored in JSON. For all rows you want to
extract the `url` key from this JSON object and store it in the `integrations.url`
column. There are millions of integrations and parsing JSON is slow, thus you can't
do this in a regular migration.
To do this using a background migration we'll start with defining our migration
class:
```ruby
class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
  class Integration < ActiveRecord::Base
    self.table_name = 'integrations'
  end

  def perform(start_id, end_id)
    Integration.where(id: start_id..end_id).each do |integration|
      json = JSON.load(integration.properties)

      integration.update(url: json['url']) if json['url']
    rescue JSON::ParserError
      # If the JSON is invalid we don't want to keep the job around forever,
      # instead we'll just leave the "url" field to whatever the default value
      # is.
      next
    end
  end
end
```
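Because `perform` takes a plain ID range, the class is easy to exercise by hand,
for instance from a Rails console on a test instance (an illustrative
invocation, matching the one used in the cleanup migration below):

```ruby
Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(1, 100)
```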
Next we'll need to adjust our code so we schedule the above migration for newly
created and updated integrations. We can do this using something along the lines of
the following:
```ruby
class Integration < ActiveRecord::Base
  after_commit :schedule_integration_migration, on: :update
  after_commit :schedule_integration_migration, on: :create

  def schedule_integration_migration
    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
  end
end
```
We're using an `after_commit` hook here to ensure the Sidekiq job is not scheduled
before the transaction completes, as doing so can lead to race conditions where
the changes are not yet visible to the worker.
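To make the race concrete: a hook that fires inside the transaction, such as
`after_save`, could hand the ID to Sidekiq before the row is committed, and the
worker might then fail to find it. A contrived counter-example (do not do this):

```ruby
class Integration < ActiveRecord::Base
  # Anti-pattern: after_save runs inside the surrounding transaction, so the
  # job may be picked up before the INSERT/UPDATE is committed.
  after_save :schedule_integration_migration
end
```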
Next we'll need a post-deployment migration that schedules the migration for
existing data.
```ruby
class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  MIGRATION = 'ExtractIntegrationsUrl'
  DELAY_INTERVAL = 2.minutes

  def up
    queue_background_migration_jobs_by_range_at_intervals(
      define_batchable_model('integrations'),
      MIGRATION,
      DELAY_INTERVAL)
  end

  def down
  end
end
```
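To get a feel for the resulting schedule: with a hypothetical batch size of
10,000 rows, a table of 5 million integrations is split into 500 batches, and at
the 2-minute `DELAY_INTERVAL` the last job is scheduled roughly 1,000 minutes
(about 17 hours) out.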
In the next release we can then add a post-deployment migration that steals any
remaining jobs and manually runs them on any un-migrated rows. Such a migration
would look like this:
```ruby
class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # This must be included
    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')

    # This should be included, but can be skipped - see below
    define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
      range = batch.pluck('MIN(id)', 'MAX(id)').first

      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
    end
  end

  def down
  end
end
```
If the application does not depend on the data being 100% migrated (for
instance, the data is advisory, and not mission-critical), then this final step
can be skipped.
This migration will then process any jobs for the ExtractIntegrationsUrl migration
and continue once all jobs have been processed. Once done you can safely remove
the `integrations.properties` column.
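That removal would itself be a post-deployment migration; a minimal sketch,
assuming the column is plain text (the migration name and column type are
illustrative):

```ruby
class RemoveIntegrationsPropertiesColumn < Gitlab::Database::Migration[1.0]
  def up
    remove_column :integrations, :properties
  end

  def down
    add_column :integrations, :properties, :text
  end
end
```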
## Testing