Merge branch 'rework-sidekiq-docs' into 'master'

Split Sidekiq development docs into separate files

See merge request gitlab-org/gitlab!78186

e3fd22d4 · Diana Logan · 364155fc · bb308906 · e3fd22d4 · e3fd22d4
Commit e3fd22d4 authored Jan 20, 2022 by Diana Logan
15 changed files
--- a/doc/administration/operations/extra_sidekiq_processes.md
+++ b/doc/administration/operations/extra_sidekiq_processes.md
@@ -296,7 +296,7 @@ Instead of a queue, a queue namespace can also be provided, to have the process
 automatically listen on all queues in that namespace without needing to
 explicitly list all the queue names. For more information about queue namespaces,
 see the relevant section in the
-[Sidekiq style guide](../../development/sidekiq_style_guide.md#queue-namespaces).
+[Sidekiq development documentation](../../development/sidekiq/index.md#queue-namespaces).

 For example, say you want to start 2 extra processes: one to process the
 `process_commit` queue, and one to process the `post_receive` queue. This can be

--- a/doc/development/code_review.md
+++ b/doc/development/code_review.md
@@ -599,7 +599,7 @@ Enterprise Edition instance. This has some implications:
      - [Background migrations](background_migrations.md) run in Sidekiq, and
        should only be done for migrations that would take an extreme amount of
        time at GitLab.com scale.
-1. **Sidekiq workers** [cannot change in a backwards-incompatible way](sidekiq_style_guide.md#sidekiq-compatibility-across-updates):
+1. **Sidekiq workers** [cannot change in a backwards-incompatible way](sidekiq/compatibility_across_updates.md):
   1. Sidekiq queues are not drained before a deploy happens, so there are
      workers in the queue from the previous version of GitLab.
   1. If you need to change a method signature, try to do so across two releases,

--- a/doc/development/emails.md
+++ b/doc/development/emails.md
@@ -10,8 +10,8 @@ info: To determine the technical writer assigned to the Stage/Group associated w

 A Sidekiq job is enqueued whenever `deliver_later` is called on an `ActionMailer`.
 If a mailer argument needs to be added or removed, it is important to ensure
-both backward and forward compatibility. Adhere to the Sidekiq Style Guide steps for
-[changing the arguments for a worker](sidekiq_style_guide.md#changing-the-arguments-for-a-worker).
+both backward and forward compatibility. Adhere to the Sidekiq steps for
+[changing the arguments for a worker](sidekiq/compatibility_across_updates.md#changing-the-arguments-for-a-worker).

 In the following example from [`NotificationService`](https://gitlab.com/gitlab-org/gitlab/-/blob/33ccb22e4fc271dbaac94b003a7a1a2915a13441/app/services/notification_service.rb#L74)
 adding or removing an argument in this mailer's definition may cause problems

--- a/doc/development/event_store.md
+++ b/doc/development/event_store.md
@@ -185,7 +185,7 @@ Changes to the schema require multiple rollouts. While the new version is being
 - Events get persisted in the Sidekiq queue as job arguments, so we could have 2 versions of the schema during deployments.

 As changing the schema ultimately impacts the Sidekiq arguments, please refer to our
-[Sidekiq style guide](sidekiq_style_guide.md#changing-the-arguments-for-a-worker) with regards to multiple rollouts.
+[Sidekiq style guide](sidekiq/compatibility_across_updates.md#changing-the-arguments-for-a-worker) with regards to multiple rollouts.

 #### Add properties

@@ -274,7 +274,7 @@ A subscription can specify a condition when to accept an event:

 ```ruby
 store.subscribe ::MergeRequests::UpdateHeadPipelineWorker,
-  to: ::Ci::PipelineCreatedEvent, 
+  to: ::Ci::PipelineCreatedEvent,
  if: -> (event) { event.data[:merge_request_id].present? }
 ```


--- a/doc/development/index.md
+++ b/doc/development/index.md
@@ -221,7 +221,7 @@ the [reviewer values](https://about.gitlab.com/handbook/engineering/workflow/rev
 - [Geo development](geo.md)
 - [Redis guidelines](redis.md)
  - [Adding a new Redis instance](redis/new_redis_instance.md)
-- [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers
+- [Sidekiq guidelines](sidekiq/index.md) for working with Sidekiq workers
 - [Working with Gitaly](gitaly.md)
 - [Elasticsearch integration docs](elasticsearch.md)
 - [Working with Merge Request diffs](diffs.md)

--- a/doc/development/multi_version_compatibility.md
+++ b/doc/development/multi_version_compatibility.md
@@ -14,14 +14,14 @@ In a sense, these scenarios are all transient states. But they can often persist

 ### When modifying a Sidekiq worker

-For example when [changing arguments](sidekiq_style_guide.md#changing-the-arguments-for-a-worker):
+For example when [changing arguments](sidekiq/compatibility_across_updates.md#changing-the-arguments-for-a-worker):

 - Is it ok if jobs are being enqueued with the old signature but executed by the new monthly release?
 - Is it ok if jobs are being enqueued with the new signature but executed by the previous monthly release?

 ### When adding a new Sidekiq worker

-Is it ok if these jobs don't get executed for several hours because [Sidekiq nodes are not yet updated](sidekiq_style_guide.md#adding-new-workers)?
+Is it ok if these jobs don't get executed for several hours because [Sidekiq nodes are not yet updated](sidekiq/compatibility_across_updates.md#adding-new-workers)?

 ### When modifying JavaScript

@@ -89,7 +89,7 @@ Many users [skip some monthly releases](../update/index.md#upgrading-to-a-new-ma

 - 13.0 => 13.12

-These users accept some downtime during the update. Unfortunately we can't ignore this case completely. For example, 13.12 may execute Sidekiq jobs from 13.0, which illustrates why [we avoid removing arguments from jobs until a major release](sidekiq_style_guide.md#deprecate-and-remove-an-argument). The main question is: Will the deployment get to a good state after the update is complete?
+These users accept some downtime during the update. Unfortunately we can't ignore this case completely. For example, 13.12 may execute Sidekiq jobs from 13.0, which illustrates why [we avoid removing arguments from jobs until a major release](sidekiq/compatibility_across_updates.md#deprecate-and-remove-an-argument). The main question is: Will the deployment get to a good state after the update is complete?

 ## What kind of components can GitLab be broken down into?

@@ -180,7 +180,7 @@ coexists in production.

 ### Changing Sidekiq worker's parameters

-This topic is explained in detail in [Sidekiq Compatibility across Updates](sidekiq_style_guide.md#sidekiq-compatibility-across-updates).
+This topic is explained in detail in [Sidekiq Compatibility across Updates](sidekiq/compatibility_across_updates.md).

 When we need to add a new parameter to a Sidekiq worker class, we can split this into the following steps:


--- a/doc/development/sidekiq/compatibility_across_updates.md
+++ b/doc/development/sidekiq/compatibility_across_updates.md
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq Compatibility across Updates
+
+The arguments for a Sidekiq job are stored in a queue while it is
+scheduled for execution. During an online update, this could lead to
+several possible situations:
+
+1. An older version of the application publishes a job, which is executed by an
+   upgraded Sidekiq node.
+1. A job is queued before an upgrade, but executed after an upgrade.
+1. A job is queued by a node running the newer version of the application, but
+   executed on a node running an older version of the application.
+
+## Adding new workers
+
+On GitLab.com, we [do not currently have a Sidekiq deployment in the
+canary stage](https://gitlab.com/gitlab-org/gitlab/-/issues/19239). This
+means that a new worker that can be scheduled from an HTTP endpoint may
+be scheduled from canary but not run on Sidekiq until the full
+production deployment is complete. This can be several hours later than
+scheduling the job. For some workers, this will not be a problem. For
+others - particularly [latency-sensitive
+jobs](worker_attributes.md#latency-sensitive-jobs) - this will result in a poor user
+experience.
+
+This only applies to new worker classes when they are first introduced.
+As we recommend [using feature flags](../feature_flags/) as a general
+development process, it's best to control the entire change (including
+scheduling of the new Sidekiq worker) with a feature flag.
+
+## Changing the arguments for a worker
+
+Jobs need to be backward and forward compatible between consecutive versions
+of the application. Adding or removing an argument may cause problems
+during deployment before all Rails and Sidekiq nodes have the updated code.
+
+### Deprecate and remove an argument
+
+**Before you remove arguments from the `perform_async` and `perform` methods, deprecate them.** The
+following example deprecates and then removes `arg2` from the `perform_async` method:
+
+1. Provide a default value (usually `nil`) and use a comment to mark the
+   argument as deprecated in the coming minor release. (Release M)
+
+    ```ruby
+    class ExampleWorker
+      # Keep arg2 parameter for backwards compatibility.
+      def perform(object_id, arg1, arg2 = nil)
+        # ...
+      end
+    end
+    ```
+
+1. One minor release later, stop using the argument in `perform_async`. (Release M+1)
+
+    ```ruby
+    ExampleWorker.perform_async(object_id, arg1)
+    ```
+
+1. At the next major release, remove the value from the worker class. (Next major release)
+
+    ```ruby
+    class ExampleWorker
+      def perform(object_id, arg1)
+        # ...
+      end
+    end
+    ```
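As an illustration outside the diff, the plain-Ruby sketch below shows why the default value in step 1 matters. It does not use Sidekiq: `ExampleWorker` and `StrictWorker` are stand-in classes, and `queued_jobs` is a hypothetical model of the argument lists that different releases could leave in the queue.

```ruby
# Stand-in for the Release M worker: arg2 is deprecated but still accepted.
class ExampleWorker
  def perform(object_id, arg1, arg2 = nil)
    "processed #{object_id} with #{arg1}"
  end
end

# Stand-in for a worker whose argument was removed too early.
class StrictWorker
  def perform(object_id, arg1)
    "processed #{object_id} with #{arg1}"
  end
end

# Hypothetical argument lists as they might sit in the queue during an update.
queued_jobs = [
  [1, 'a', 'legacy'], # enqueued by an older node that still passes arg2
  [2, 'b']            # enqueued by a node that no longer passes arg2
]

# The tolerant signature handles both shapes.
results = queued_jobs.map { |args| ExampleWorker.new.perform(*args) }

# The strict signature fails on the job enqueued by the older node.
strict_error = nil
begin
  StrictWorker.new.perform(*queued_jobs.first)
rescue ArgumentError => e
  strict_error = e.message
end

puts results
puts "StrictWorker failed: #{strict_error}"
```

With real Sidekiq, the failing job would be retried, so a signature change made too early typically shows up as repeated job failures.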
+
+### Add an argument
+
+There are two options for safely adding new arguments to Sidekiq workers:
+
+1. Set up a [multi-step deployment](#multi-step-deployment) in which the new argument is first added to the worker.
+1. Use a [parameter hash](#parameter-hash) for additional arguments. This is perhaps the most flexible option.
+
+#### Multi-step deployment
+
+This approach requires multiple releases.
+
+1. Add the argument to the worker with a default value (Release M).
+
+    ```ruby
+    class ExampleWorker
+      def perform(object_id, new_arg = nil)
+        # ...
+      end
+    end
+    ```
+
+1. Add the new argument to all the invocations of the worker (Release M+1).
+
+    ```ruby
+    ExampleWorker.perform_async(object_id, new_arg)
+    ```
+
+1. Remove the default value (Release M+2).
+
+    ```ruby
+    class ExampleWorker
+      def perform(object_id, new_arg)
+        # ...
+      end
+    end
+    ```
+
+#### Parameter hash
+
+This approach doesn't require multiple releases if an existing worker already
+uses a parameter hash.
+
+1. Use a parameter hash in the worker to allow future flexibility.
+
+    ```ruby
+    class ExampleWorker
+      def perform(object_id, params = {})
+        # ...
+      end
+    end
+    ```
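As an illustration outside the diff, the sketch below shows how a parameter hash lets older and newer jobs coexist. It is plain Ruby without Sidekiq. The `notify` key and the `round_trip` helper are hypothetical. The helper only mimics the JSON serialization that Sidekiq applies to job arguments, which is why the hash keys arrive as strings.

```ruby
require 'json'

# Stand-in worker that reads optional settings from a parameter hash.
class ExampleWorker
  def perform(object_id, params = {})
    # Keys are strings after the JSON round trip, so read them as strings.
    notify = params.fetch('notify', false)
    "object #{object_id}, notify=#{notify}"
  end
end

# Mimics the serialization a job's arguments go through in the queue.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

old_job = round_trip([42])                   # enqueued without params
new_job = round_trip([42, { notify: true }]) # enqueued with a new key

old_result = ExampleWorker.new.perform(*old_job)
new_result = ExampleWorker.new.perform(*new_job)

puts old_result
puts new_result
```

Because unknown keys are simply ignored and missing keys fall back to a default, a new key can be introduced without changing the method signature.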
+
+## Removing workers
+
+Try to avoid removing workers and their queues in minor and patch
+releases.
+
+During an online update, an instance can have pending jobs, and removing the queue can
+lead to those jobs being stuck forever. If you can't write a migration for those
+Sidekiq jobs, please consider removing the worker in a major release only.
+
+## Renaming queues
+
+For the same reasons that removing workers is dangerous, care should be taken
+when renaming queues.
+
+When renaming queues, use the `sidekiq_queue_migrate` helper migration method
+in a **post-deployment migration**:
+
+```ruby
+class MigrateTheRenamedSidekiqQueue < Gitlab::Database::Migration[1.0]
+  def up
+    sidekiq_queue_migrate 'old_queue_name', to: 'new_queue_name'
+  end
+
+  def down
+    sidekiq_queue_migrate 'new_queue_name', to: 'old_queue_name'
+  end
+end
+```
+
+You must rename the queue in a post-deployment migration, not in a normal
+migration. Otherwise, it runs too early, before all the workers that
+schedule these jobs have stopped running. See also [other examples](../post_deployment_migrations.md#use-cases).
--- a/doc/development/sidekiq/idempotent_jobs.md
+++ b/doc/development/sidekiq/idempotent_jobs.md
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq idempotent jobs
+
+It's known that a job can fail for multiple reasons. For example, network outages or bugs.
+In order to address this, Sidekiq has a built-in retry mechanism that is
+used by default by most workers within GitLab.
+
+It's expected that a job can run again after a failure without major side-effects for the
+application or users, which is why Sidekiq encourages
+jobs to be [idempotent and transactional](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional).
+
+As a general rule, a worker can be considered idempotent if:
+
+- It can safely run multiple times with the same arguments.
+- Application side-effects are expected to happen only once
+  (or side-effects of a second run do not have an effect).
+
+A good example of that would be a cache expiration worker.
+
+A job scheduled for an idempotent worker is [deduplicated](#deduplication) when
+an unstarted job with the same arguments is already in the queue.
+
+## Ensuring a worker is idempotent
+
+Make sure the worker tests pass using the following shared example:
+
+```ruby
+include_examples 'an idempotent worker' do
+  it 'marks the MR as merged' do
+    # Using subject inside this block will process the job multiple times
+    subject
+
+    expect(merge_request.state).to eq('merged')
+  end
+end
+```
+
+Use the `perform_multiple` method directly instead of `job.perform` (this
+helper method is automatically included for workers).
+
+## Declaring a worker as idempotent
+
+```ruby
+class IdempotentWorker
+  include ApplicationWorker
+
+  # Declares a worker is idempotent and can
+  # safely run multiple times.
+  idempotent!
+
+  # ...
+end
+```
+
+It's encouraged to only have the `idempotent!` call in the top-most worker class, even if
+the `perform` method is defined in another class or module.
+
+If the worker class isn't marked as idempotent, a cop fails. Consider skipping
+the cop if you're not confident your job can safely run multiple times.
+
+## Deduplication
+
+When a job for an idempotent worker is enqueued while another
+unstarted job is already in the queue, GitLab drops the second
+job. The work is skipped because the same work would be
+done by the job that was scheduled first; by the time the second
+job executed, the first job would do nothing.
+
+### Strategies
+
+GitLab supports two deduplication strategies:
+
+- `until_executing`
+- `until_executed`
+
+More [deduplication strategies have been
+suggested](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/195). If
+you are implementing a worker that could benefit from a different
+strategy, please comment in the issue.
+
+#### Until Executing
+
+This strategy takes a lock when a job is added to the queue, and removes that lock before the job starts.
+
+For example, `AuthorizedProjectsWorker` takes a user ID. When the
+worker runs, it recalculates a user's authorizations. GitLab schedules
+this job each time an action potentially changes a user's
+authorizations. If the same user is added to two projects at the
+same time, the second job can be skipped if the first job hasn't
+begun, because when the first job runs, it creates the
+authorizations for both projects.
+
+```ruby
+module AuthorizedProjectUpdate
+  class UserRefreshOverUserRangeWorker
+    include ApplicationWorker
+
+    deduplicate :until_executing
+    idempotent!
+
+    # ...
+  end
+end
+```
+
+#### Until Executed
+
+This strategy takes a lock when a job is added to the queue, and removes that lock after the job finishes.
+It can be used to prevent jobs from running simultaneously multiple times.
+
+```ruby
+module Ci
+  class BuildTraceChunkFlushWorker
+    include ApplicationWorker
+
+    deduplicate :until_executed
+    idempotent!
+
+    # ...
+  end
+end
+```
+
+You can also pass the `if_deduplicated: :reschedule_once` option to re-run a job once after
+the currently running job finishes, if deduplication happened at least once.
+This ensures that the latest result is always produced even if a race condition
+happened. See [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/342123) for more information.
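As an illustration outside the diff, the difference between the two strategies can be sketched with an in-memory model. `DedupQueue` is a hypothetical class written for this example. GitLab keeps its idempotency keys in Redis and applies them in Sidekiq middleware, which this sketch does not attempt to reproduce.

```ruby
require 'set'

# In-memory stand-in for job deduplication with two lock lifetimes.
class DedupQueue
  def initialize(strategy)
    @strategy = strategy # :until_executing or :until_executed
    @locks = Set.new
    @pending = []
  end

  # Returns false when the job is dropped as a duplicate.
  def enqueue(worker, args)
    key = [worker, args]
    return false unless @locks.add?(key)

    @pending << key
    true
  end

  # Runs the oldest pending job; the block represents the job body.
  def run_next
    key = @pending.shift
    @locks.delete(key) if @strategy == :until_executing # unlock before start
    yield if block_given?
    @locks.delete(key) if @strategy == :until_executed  # unlock after finish
  end
end

executing = DedupQueue.new(:until_executing)
first  = executing.enqueue('AuthorizedProjectsWorker', [1])
second = executing.enqueue('AuthorizedProjectsWorker', [1]) # dropped: first not started
during_executing = nil
executing.run_next do
  # The lock is already released, so a new job is accepted while this one runs.
  during_executing = executing.enqueue('AuthorizedProjectsWorker', [1])
end

executed = DedupQueue.new(:until_executed)
executed.enqueue('AuthorizedProjectsWorker', [1])
during_executed = nil
executed.run_next do
  # The lock is held until the job finishes, so this enqueue is dropped.
  during_executed = executed.enqueue('AuthorizedProjectsWorker', [1])
end

puts [first, second, during_executing, during_executed].inspect
```

The last case is the one that `if_deduplicated: :reschedule_once` addresses: a job dropped while another is running may have carried newer state.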
+
+### Scheduling jobs in the future
+
+GitLab doesn't skip jobs scheduled in the future, as we assume that
+the state has changed by the time the job is scheduled to
+execute. Deduplication of jobs scheduled in the future is possible
+for both `until_executed` and `until_executing` strategies.
+
+If you do want to deduplicate jobs scheduled in the future,
+this can be specified on the worker by passing the `including_scheduled: true` argument
+when defining the deduplication strategy:
+
+```ruby
+module AuthorizedProjectUpdate
+  class UserRefreshOverUserRangeWorker
+    include ApplicationWorker
+
+    deduplicate :until_executing, including_scheduled: true
+    idempotent!
+
+    # ...
+  end
+end
+```
+
+## Setting the deduplication time-to-live (TTL)
+
+Deduplication depends on an idempotency key that is stored in Redis. This is normally
+cleared by the configured deduplication strategy.
+
+However, the key can remain until its TTL in certain cases like:
+
+1. `until_executing` is used but the job was never enqueued or executed after the Sidekiq
+   client middleware was run.
+
+1. `until_executed` is used but the job fails to finish due to retry exhaustion, gets
+   interrupted the maximum number of times, or gets lost.
+
+The default value is 6 hours. During this time, jobs won't be enqueued even if the first
+job never executed or finished.
+
+The TTL can be configured with:
+
+```ruby
+class ProjectImportScheduleWorker
+  include ApplicationWorker
+
+  idempotent!
+  deduplicate :until_executing, ttl: 5.minutes
+end
+```
+
+Duplicate jobs can happen when the TTL is reached, so make sure you lower this only for jobs
+that can tolerate some duplication.
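As an illustration outside the diff, the effect of a key that is never cleared can be sketched with an in-memory store. `ExpiringKeys` and the key name `import:1` are hypothetical, and the clock is passed in explicitly so the example can advance time without sleeping. The real keys live in Redis.

```ruby
# In-memory stand-in for idempotency keys that expire after a TTL.
class ExpiringKeys
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @expires_at = {}
  end

  # Returns true when the key was free (the job may be enqueued) and false
  # when an unexpired key exists (the job is treated as a duplicate).
  def claim(key, now)
    expiry = @expires_at[key]
    return false if expiry && now < expiry

    @expires_at[key] = now + @ttl
    true
  end
end

six_hours = 6 * 60 * 60 # the default TTL mentioned above
keys = ExpiringKeys.new(six_hours)

start = 0
claimed_first = keys.claim('import:1', start)                 # key is free
blocked_retry = keys.claim('import:1', start + 60)            # key was never cleared
after_ttl     = keys.claim('import:1', start + six_hours + 1) # key has expired

puts [claimed_first, blocked_retry, after_ttl].inspect
```

A shorter TTL shrinks the window in which a lost job blocks new ones, at the cost of letting duplicates through sooner.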
+
+## Deduplication with load balancing
+
+> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/6763) in GitLab 14.4.
+
+Jobs that declare either `:sticky` or `:delayed` data consistency
+are eligible for database load-balancing.
+In both cases, jobs are [scheduled in the future](#scheduling-jobs-in-the-future) with a short delay (1 second).
+This minimizes the chance of replication lag after a write.
+
+If you really want to deduplicate jobs eligible for load balancing,
+specify the `including_scheduled: true` argument when defining the deduplication strategy:
+
+```ruby
+class DelayedIdempotentWorker
+  include ApplicationWorker
+  data_consistency :delayed
+
+  deduplicate :until_executing, including_scheduled: true
+  idempotent!
+
+  # ...
+end
+```
+
+### Preserve the latest WAL location for idempotent jobs
+
+> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69372) in GitLab 14.3.
+> - [Enabled on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/issues/338350) in GitLab 14.4.
+> - [Enabled on self-managed](https://gitlab.com/gitlab-org/gitlab/-/issues/338350) in GitLab 14.6.
+
+The deduplication always takes into account the latest binary replication pointer, not the first one.
+This happens because we drop the same job scheduled for the second time and the Write-Ahead Log (WAL) is lost.
+This could lead to comparing the old WAL location and reading from a stale replica.
+
+To support both deduplication and maintaining data consistency with load balancing,
+we are preserving the latest WAL location for idempotent jobs in Redis.
+This way we are always comparing the latest binary replication pointer,
+making sure that we read from the replica that is fully caught up.
+
+FLAG:
+On self-managed GitLab, by default this feature is available. To hide the feature, ask an administrator to
+[disable the feature flag](../../administration/feature_flags.md) named `preserve_latest_wal_locations_for_idempotent_jobs`.
+
+This feature flag is related to GitLab development and is not intended to be used by GitLab administrators, though.
+On GitLab.com, this feature is available.
--- a/doc/development/sidekiq/index.md
+++ b/doc/development/sidekiq/index.md
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq guides
+
+We use [Sidekiq](https://github.com/mperham/sidekiq) as our background
+job processor. These guides are for writing jobs that will work well on
+GitLab.com and be consistent with our existing worker classes. For
+information on administering GitLab, see [configuring Sidekiq](../../administration/sidekiq.md).
+
+There are pages with additional detail on the following topics:
+
+1. [Compatibility across updates](compatibility_across_updates.md)
+1. [Job idempotency and job deduplication](idempotent_jobs.md)
+1. [Limited capacity worker: continuously performing work with a specified concurrency](limited_capacity_worker.md)
+1. [Logging](logging.md)
+1. [Worker attributes](worker_attributes.md)
+    1. **Job urgency** specifies queuing and execution SLOs
+    1. **Resource boundaries** and **external dependencies** for describing the workload
+    1. **Feature categorization**
+    1. **Database load balancing**
+
+## ApplicationWorker
+
+All workers should include `ApplicationWorker` instead of `Sidekiq::Worker`,
+which adds some convenience methods and automatically sets the queue based on
+the [routing rules](../../administration/operations/extra_sidekiq_routing.md#queue-routing-rules).
+
+## Retries
+
+Sidekiq defaults to using [25
+retries](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry),
+with back-off between each retry. 25 retries means that the last retry
+would happen around three weeks after the first attempt (assuming all 24
+prior retries failed).
+
+For most workers - especially [idempotent workers](idempotent_jobs.md) -
+the default of 25 retries is more than sufficient. Many of our older
+workers declare 3 retries, which used to be the default within the
+GitLab application. 3 retries happen over the course of a couple of
+minutes, so the jobs are prone to failing completely.
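As an illustration outside the diff, the two time spans above can be checked with a short calculation. It assumes the back-off described in the Sidekiq wiki, roughly `retry_count ** 4 + 15` seconds plus a random jitter, and ignores the jitter, so the real figures are somewhat larger. The method names are made up for this sketch.

```ruby
# Delay before a given retry, ignoring the random jitter Sidekiq adds.
def retry_delay_seconds(retry_count)
  (retry_count**4) + 15
end

# Total time between the first attempt and the last retry.
def total_retry_window_seconds(max_retries)
  (0...max_retries).sum { |count| retry_delay_seconds(count) }
end

default_window = total_retry_window_seconds(25)
three_retries  = total_retry_window_seconds(3)

puts format('25 retries: about %.1f days', default_window / 86_400.0)
puts "3 retries: about #{three_retries} seconds before jitter"
```

Under this assumption, 25 retries span roughly 20 days, while 3 retries are exhausted after about a minute plus jitter.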
+
+A lower retry count may be applicable if any of the below apply:
+
+1. The worker contacts an external service and we do not provide
+   guarantees on delivery. For example, webhooks.
+1. The worker is not idempotent and running it multiple times could
+   leave the system in an inconsistent state. For example, a worker that
+   posts a system note and then performs an action: if the second step
+   fails and the worker retries, the system note will be posted again.
+1. The worker is a cronjob that runs frequently. For example, if a cron
+   job runs every hour, then we don't need to retry beyond an hour
+   because we don't need two of the same job running at once.
+
+Each retry for a worker is counted as a failure in our metrics. A worker
+which always fails 9 times and succeeds on the 10th would have a 90%
+error rate.
+
+## Sidekiq Queues
+
+Previously, each worker had its own queue, which was automatically set based on the
+worker class name. For a worker named `ProcessSomethingWorker`, the queue name
+would be `process_something`. You can now route workers to a specific queue using
+[queue routing rules](../../administration/operations/extra_sidekiq_routing.md#queue-routing-rules).
+In GDK, new workers are routed to a queue named `default`.
+
+If you're not sure what queue a worker uses,
+you can find it using `SomeWorker.queue`. There is almost never a reason to
+manually override the queue name using `sidekiq_options queue: :some_queue`.
+
+After adding a new worker, run `bin/rake
+gitlab:sidekiq:all_queues_yml:generate` to regenerate
+`app/workers/all_queues.yml` or `ee/app/workers/all_queues.yml` so that
+it can be picked up by
+[`sidekiq-cluster`](../../administration/operations/extra_sidekiq_processes.md)
+in installations that don't use routing rules. To learn more about potential changes,
+read [Use routing rules by default and deprecate queue selectors for self-managed](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/596).
+
+Additionally, run
+`bin/rake gitlab:sidekiq:sidekiq_queues_yml:generate` to regenerate
+`config/sidekiq_queues.yml`.
+
+## Queue Namespaces
+
+While different workers cannot share a queue, they can share a queue namespace.
+
+Defining a queue namespace for a worker makes it possible to start a Sidekiq
+process that automatically handles jobs for all workers in that namespace,
+without needing to explicitly list all their queue names. If, for example, all
+workers that are managed by `sidekiq-cron` use the `cronjob` queue namespace, we
+can spin up a Sidekiq process specifically for these kinds of scheduled jobs.
+If a new worker using the `cronjob` namespace is added later on, the Sidekiq
+process also picks up jobs for that worker (after having been restarted),
+without the need to change any configuration.
+
+A queue namespace can be set using the `queue_namespace` DSL class method:
+
+```ruby
+class SomeScheduledTaskWorker
+  include ApplicationWorker
+
+  queue_namespace :cronjob
+
+  # ...
+end
+```
+
+Behind the scenes, this sets `SomeScheduledTaskWorker.queue` to
+`cronjob:some_scheduled_task`. Commonly used namespaces have their own
+concern module that can easily be included into the worker class, and that may
+set other Sidekiq options besides the queue namespace. `CronjobQueue`, for
+example, sets the namespace, but also disables retries.
+
+`bundle exec sidekiq` is namespace-aware, and listens on all
+queues in a namespace (technically: all queues prefixed with the namespace name)
+when a namespace is provided instead of a simple queue name in the `--queue`
+(`-q`) option, or in the `:queues:` section in `config/sidekiq_queues.yml`.
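As an illustration outside the diff, the naming convention and the prefix matching can be sketched in plain Ruby. Both helpers are hypothetical and only mirror the behavior described in the text. In particular, the `:` separator used for matching is taken from the `cronjob:some_scheduled_task` example, and namespaced class names are not handled.

```ruby
# Derives a queue name from a worker class name, optionally namespaced.
def queue_name_for(worker_class_name, namespace: nil)
  base = worker_class_name
    .sub(/Worker\z/, '')               # drop the trailing "Worker"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2') # insert underscores at case changes
    .downcase
  namespace ? "#{namespace}:#{base}" : base
end

# Selects the queues a process would listen on when given a namespace.
def queues_in_namespace(all_queues, namespace)
  all_queues.select { |queue| queue.start_with?("#{namespace}:") }
end

queues = [
  queue_name_for('ProcessSomethingWorker'),
  queue_name_for('SomeScheduledTaskWorker', namespace: 'cronjob'),
  queue_name_for('AnotherTaskWorker', namespace: 'cronjob')
]
cron_queues = queues_in_namespace(queues, 'cronjob')

puts queues.inspect
puts cron_queues.inspect
```

A worker added to the `cronjob` namespace later would match the same prefix, which is why no configuration change is needed for the process to pick it up.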
+
+Note that adding a worker to an existing namespace should be done with care, as
+the extra jobs take resources away from jobs from workers that were already
+there, if the resources available to the Sidekiq process handling the namespace
+are not adjusted appropriately.
+
+## Versioning
+
+Version can be specified on each Sidekiq worker class.
+This is then sent along when the job is created.
+
+```ruby
+class FooWorker
+  include ApplicationWorker
+
+  version 2
+
+  def perform(*args)
+    if job_version == 2
+      foo = args.first['foo']
+    else
+      foo = args.first
+    end
+  end
+end
+```
+
+Under this schema, any worker is expected to be able to handle any job that was
+enqueued by an older version of that worker. This means that when changing the
+arguments a worker takes, you must increment the `version` (or set `version 1`
+if this is the first time a worker's arguments are changing), but also make sure
+that the worker is still able to handle jobs that were queued with any earlier
+version of the arguments. From the worker's `perform` method, you can read
+`self.job_version` if you want to specifically branch on job version, or you
+can read the number or type of provided arguments.
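As an illustration outside the diff, branching on the type of the provided arguments, instead of on `job_version`, could look like the sketch below. It is plain Ruby without Sidekiq, and this `FooWorker` is a stand-in that returns a string so the result can be inspected.

```ruby
# Stand-in worker that accepts both the old and the new argument shape.
class FooWorker
  def perform(*args)
    first = args.first
    # Newer jobs pass a hash; older jobs passed the bare value.
    foo = first.is_a?(Hash) ? first['foo'] : first
    "foo=#{foo}"
  end
end

old_style = FooWorker.new.perform('bar')
new_style = FooWorker.new.perform({ 'foo' => 'bar' })

puts old_style
puts new_style
```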
+
+## Job size
+
+GitLab stores Sidekiq jobs and their arguments in Redis. To avoid
+excessive memory usage, we compress the arguments of Sidekiq jobs
+if their original size is bigger than 100KB.
+
+After compression, if their size still exceeds 5MB, it raises an
+[`ExceedLimitError`](https://gitlab.com/gitlab-org/gitlab/-/blob/f3dd89e5e510ea04b43ffdcb58587d8f78a8d77c/lib/gitlab/sidekiq_middleware/size_limiter/exceed_limit_error.rb#L8)
+error when scheduling the job.
+
+If this happens, rely on other means of making the data
+available in Sidekiq. There are possible workarounds such as:
+
+- Rebuild the data in Sidekiq with data loaded from the database or
+  elsewhere.
+- Store the data in [object storage](../file_storage.md#object-storage)
+  before scheduling the job, and retrieve it inside the job.
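As an illustration outside the diff, the compress-then-check flow can be sketched with the Ruby standard library. `prepare_payload` and the local `ExceedLimitError` class are stand-ins, not GitLab's middleware. Treating 100KB and 5MB as 100,000 and 5,000,000 bytes is an assumption, and the thresholds are keyword arguments so that the sketch can be exercised with small values.

```ruby
require 'json'
require 'zlib'

# Local stand-in for the error raised when a job is still too large.
class ExceedLimitError < StandardError; end

# Compresses the serialized arguments when they exceed compress_over bytes
# and raises when the compressed result still exceeds limit bytes.
def prepare_payload(args, compress_over: 100_000, limit: 5_000_000)
  raw = JSON.generate(args)
  return { compressed: false, data: raw } if raw.bytesize <= compress_over

  deflated = Zlib::Deflate.deflate(raw)
  if deflated.bytesize > limit
    raise ExceedLimitError, "payload is #{deflated.bytesize} bytes after compression"
  end

  { compressed: true, data: deflated }
end

small = prepare_payload([1, 'short'])     # below the compression threshold
large = prepare_payload(['a' * 200_000])  # compressed, well below the limit

puts "small compressed? #{small[:compressed]}"
puts "large compressed? #{large[:compressed]} (#{large[:data].bytesize} bytes)"
```

Highly repetitive data compresses well, so the limit mostly affects jobs that carry large, varied payloads. These are the jobs that benefit from the workarounds listed above.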
+
+## Job weights
+
+Some jobs have a weight declared. This is only used when running Sidekiq
+in the default execution mode - using
+[`sidekiq-cluster`](../../administration/operations/extra_sidekiq_processes.md)
+does not account for weights.
+
+As we are [moving towards using `sidekiq-cluster` in
+Free](https://gitlab.com/gitlab-org/gitlab/-/issues/34396), newly-added
+workers do not need to have weights specified. They can use the
+default weight, which is 1.
+
+## Tests
+
+Each Sidekiq worker must be tested using RSpec, just like any other class. These
+tests should be placed in `spec/workers`.
--- a/doc/development/sidekiq/limited_capacity_worker.md
+++ b/doc/development/sidekiq/limited_capacity_worker.md
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq limited capacity worker
+
+It is possible to limit the number of concurrent running jobs for a worker class
+by using the `LimitedCapacity::Worker` concern.
+
+The worker must implement three methods:
+
+- `perform_work`: The concern implements the usual `perform` method and calls
+  `perform_work` if there's any available capacity.
+- `remaining_work_count`: Number of jobs that have work to perform.
+- `max_running_jobs`: Maximum number of jobs allowed to run concurrently.
+
+```ruby
+class MyDummyWorker
+  include ApplicationWorker
+  include LimitedCapacity::Worker
+
+  def perform_work(*args)
+  end
+
+  def remaining_work_count(*args)
+    5
+  end
+
+  def max_running_jobs
+    25
+  end
+end
+```
+
+In addition to the regular worker, a cron worker must be defined to
+backfill the queue with jobs. The arguments passed to `perform_with_capacity`
+are passed to the `perform_work` method.
+
+```ruby
+class ScheduleMyDummyCronWorker
+  include ApplicationWorker
+  include CronjobQueue
+
+  def perform(*args)
+    MyDummyWorker.perform_with_capacity(*args)
+  end
+end
+```
+
+## How many jobs are running?
+
+It runs `max_running_jobs` at almost all times.
+
+The cron worker checks the remaining capacity on each execution and it
+schedules at most `max_running_jobs` jobs. Those jobs on completion
+re-enqueue themselves immediately, but not on failure. The cron worker is in
+charge of replacing those failed jobs.
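As an illustration outside the diff, the top-up performed by the cron worker can be modeled as a small calculation. `jobs_to_enqueue` is a hypothetical helper, and capping the result by the remaining work is an assumption. The concern's actual implementation may differ.

```ruby
# Number of jobs a cron run would enqueue to restore full capacity.
def jobs_to_enqueue(max_running_jobs:, running_jobs:, remaining_work_count:)
  free_slots = [max_running_jobs - running_jobs, 0].max
  [free_slots, remaining_work_count].min
end

# Three of 25 jobs failed since the last cron run and were not re-enqueued.
after_failures = jobs_to_enqueue(max_running_jobs: 25, running_jobs: 22, remaining_work_count: 100)

# Nothing is running yet and only five items of work exist.
cold_start = jobs_to_enqueue(max_running_jobs: 25, running_jobs: 0, remaining_work_count: 5)

# All slots are busy, so the cron run enqueues nothing.
at_capacity = jobs_to_enqueue(max_running_jobs: 25, running_jobs: 25, remaining_work_count: 100)

puts [after_failures, cold_start, at_capacity].inspect
```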
+
+## Handling errors and idempotence
+
+This concern disables Sidekiq retries, logs the errors, and sends the job to the
+dead queue. This is done to have only one source that produces jobs and because
+the retry would occupy a slot with a job to perform in the distant future.
+
+We let the cron worker enqueue new jobs. This could be seen as our retry and
+back off mechanism because the job might fail again if executed immediately.
+This means that for every failed job, we run at a lower capacity
+until the cron worker fills the capacity again. If it is important for the
+worker not to get a backlog, exceptions must be handled in `#perform_work` and
+the job should not raise.
+
+The jobs are deduplicated using the `:none` strategy, but the worker is not
+marked as `idempotent!`.
+
+## Metrics
+
+This concern exposes three Prometheus metrics of gauge type with the worker class
+name as label:
+
+- `limited_capacity_worker_running_jobs`
+- `limited_capacity_worker_max_running_jobs`
+- `limited_capacity_worker_remaining_work_count`
--- a/doc/development/sidekiq/logging.md
+++ b/doc/development/sidekiq/logging.md
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Sidekiq logging
+
+## Worker context
+
+> [Introduced](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/9) in GitLab 12.8.
+
+To have some more information about workers in the logs, we add
+[metadata to the jobs in the form of an
+`ApplicationContext`](../logging.md#logging-context-metadata-through-rails-or-grape-requests).
+In most cases, when scheduling a job from a request, this context is already
+deduced from the request and added to the scheduled job.
+
+When a job runs, the context that was active when it was scheduled
+is restored. This causes the context to be propagated to any job
+scheduled from within the running job.
+
+All this means that in most cases, to add context to jobs, we don't
+need to do anything.
+
+There are, however, some instances when there would be no context
+present when the job is scheduled, or the context that is present is
+likely to be incorrect. For these instances, we've added RuboCop rules
+to draw attention and avoid incorrect metadata in our logs.
+
+As with most of our cops, there are perfectly valid reasons for disabling
+them. In this case it could be that the context from the request is
+correct. Or maybe you've specified a context already in a way that
+isn't picked up by the cops. In any case, leave a code comment
+pointing to which context to use when disabling the cops.
+
+When you do provide objects to the context, make sure that the
+route for namespaces and projects is pre-loaded. This can be done by using
+the `.with_route` scope defined on all `Routable`s.
+
+### Cron workers
+
+The context is automatically cleared for workers in the cronjob queue
+(`include CronjobQueue`), even when scheduling them from
+requests. We do this to avoid incorrect metadata when other jobs are
+scheduled from the cron worker.
+
+Cron workers themselves run instance wide, so they aren't scoped to
+users, namespaces, projects, or other resources that should be added to
+the context.
+
+However, they often schedule other jobs that _do_ require context.
+
+That is why there needs to be an indication of context somewhere in
+the worker. This can be done by using one of the following methods
+somewhere within the worker:
+
+1. Wrap the code that schedules jobs in the `with_context` helper:
+
+   ```ruby
+     def perform
+       deletion_cutoff = Gitlab::CurrentSettings
+                           .deletion_adjourned_period.days.ago.to_date
+       projects = Project.with_route.with_namespace
+                    .aimed_for_deletion(deletion_cutoff)
+
+       projects.find_each(batch_size: 100).with_index do |project, index|
+         delay = index * INTERVAL
+
+         with_context(project: project) do
+           AdjournedProjectDeletionWorker.perform_in(delay, project.id)
+         end
+       end
+     end
+   ```
+
+1. Use a batch scheduling method that provides context:
+
+   ```ruby
+     def schedule_projects_in_batch(projects)
+       ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+         projects,
+         arguments_proc: -> (project) { project.id },
+         context_proc: -> (project) { { project: project } }
+       )
+     end
+   ```
+
+   Or, when scheduling with delays:
+
+   ```ruby
+     diffs.each_batch(of: BATCH_SIZE) do |diffs, index|
+       DeleteDiffFilesWorker
+         .bulk_perform_in_with_contexts(index * 5.minutes,
+                                        diffs,
+                                        arguments_proc: -> (diff) { diff.id },
+                                        context_proc: -> (diff) { { project: diff.merge_request.target_project } })
+     end
+   ```
+
+### Jobs scheduled in bulk
+
+Often, when scheduling jobs in bulk, these jobs should have a separate
+context rather than the overarching context.
+
+If that is the case, `bulk_perform_async` can be replaced by the
+`bulk_perform_async_with_contexts` helper, and instead of
+`bulk_perform_in` use `bulk_perform_in_with_contexts`.
+
+For example:
+
+```ruby
+    ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+      projects,
+      arguments_proc: -> (project) { project.id },
+      context_proc: -> (project) { { project: project } }
+    )
+```
+
+Each object from the enumerable in the first argument is yielded into 2
+blocks:
+
+- The `arguments_proc` which needs to return the list of arguments the
+  job needs to be scheduled with.
+
+- The `context_proc` which needs to return a hash with the context
+  information for the job.
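+
+To illustrate how each object is passed to both procs, the following
+plain-Ruby sketch builds one job description per object instead of pushing
+anything to Sidekiq. `build_jobs_with_contexts` and `SketchProject` are
+hypothetical stand-ins, not the real helper or model.
+
+```ruby
+# Hypothetical stand-in for the bulk helper: returns the arguments and the
+# context that each job would be scheduled with.
+def build_jobs_with_contexts(objects, arguments_proc:, context_proc:)
+  objects.map do |object|
+    args = arguments_proc.call(object)
+
+    {
+      # A single value is wrapped so every job has a list of arguments.
+      args: args.is_a?(Array) ? args : [args],
+      context: context_proc.call(object)
+    }
+  end
+end
+
+# Hypothetical model used only for this sketch.
+SketchProject = Struct.new(:id, :full_path)
+
+projects = [SketchProject.new(1, 'group/a'), SketchProject.new(2, 'group/b')]
+
+jobs = build_jobs_with_contexts(
+  projects,
+  arguments_proc: ->(project) { project.id },
+  context_proc: ->(project) { { project: project.full_path } }
+)
+
+jobs.first # => {:args=>[1], :context=>{:project=>"group/a"}}
+```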
+
+## Arguments logging
+
+As of GitLab 13.6, Sidekiq job arguments are logged by default, unless [`SIDEKIQ_LOG_ARGUMENTS`](../../administration/troubleshooting/sidekiq.md#log-arguments-to-sidekiq-jobs)
+is disabled.
+
+By default, the only arguments logged are numeric arguments, because
+arguments of other types could contain sensitive information. To
+override this, use `loggable_arguments` inside a worker with the indexes
+of the arguments to be logged. (Numeric arguments do not need to be
+specified here.)
+
+For example:
+
+```ruby
+class MyWorker
+  include ApplicationWorker
+
+  loggable_arguments 1, 3
+
+  # object_id will be logged as it's numeric
+  # string_a will be logged due to the loggable_arguments call
+  # string_b will be filtered from logs
+  # string_c will be logged due to the loggable_arguments call
+  def perform(object_id, string_a, string_b, string_c)
+  end
+end
+```
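+
+The filtering rule above can be sketched in plain Ruby. The `loggable_args`
+helper and the `FILTERED` placeholder string are assumptions made for
+illustration. They are not the logger's actual implementation.
+
+```ruby
+# Assumed placeholder for arguments that are not logged.
+FILTERED = '[FILTERED]'
+
+# Hypothetical helper: keeps numeric arguments and arguments whose index is
+# listed as loggable, and replaces everything else with the placeholder.
+def loggable_args(args, loggable_indexes)
+  args.each_with_index.map do |arg, index|
+    (arg.is_a?(Numeric) || loggable_indexes.include?(index)) ? arg : FILTERED
+  end
+end
+
+loggable_args([42, 'a', 'b', 'c'], [1, 3])
+# => [42, "a", "[FILTERED]", "c"]
+```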
--- a/doc/development/sidekiq/worker_attributes.md
+++ b/doc/development/sidekiq/worker_attributes.md
--- a/doc/development/sidekiq_style_guide.md
+++ b/doc/development/sidekiq_style_guide.md
--- a/doc/development/stage_group_dashboards.md
+++ b/doc/development/stage_group_dashboards.md
@@ -64,8 +64,8 @@ component can have 2 indicators:
   [customize the request Apdex](application_slis/rails_request_apdex.md), this new Apdex
   measurement is not yet part of the error budget.

-   For Sidekiq job execution, the threshold depends on the [job
-   urgency](sidekiq_style_guide.md#job-urgency). It is
+   For Sidekiq job execution, the threshold depends on the
+   [job urgency](sidekiq/worker_attributes.md#job-urgency). It is
   [currently](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/lib/sidekiq-helpers.libsonnet#L25-38)
   **10 seconds** for high-urgency jobs and **5 minutes** for other
   jobs.

--- a/doc/development/transient/prevention-patterns.md
+++ b/doc/development/transient/prevention-patterns.md
@@ -120,7 +120,7 @@ When there are 2 jobs being worked on at the same time, it is possible that the
 In this example, `Worker B` is meant to set the updated status. But `Worker A` calls `#update_state` a little too late.

 This can be avoided by utilizing either database locks or `Gitlab::ExclusiveLease`. This way, jobs will be
-worked on one at a time. This also allows them to be marked as [idempotent](../sidekiq_style_guide.md#idempotent-jobs).
+worked on one at a time. This also allows them to be marked as [idempotent](../sidekiq/idempotent_jobs.md).

 ### Retry mechanism handling