    Respect sidekiq timeout when hard-killing workers · d31730c3
    Craig Miskell authored
    As discovered in
    https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10930,
    the 5 second timeout can be too short, because during normal shutdowns
    getppid returns "1" sooner than expected.  But even in a "real" failure
    case where the sidekiq-cluster process is terminated hard, we still need
    to respect the sidekiq timeout so that sidekiq can wait for running jobs
    to complete (or terminate them and push them back into the queue) before
    being killed off.  Otherwise we end up with orphaned jobs that are only
    picked up by the reliable fetcher cleanup, up to an hour later.
sidekiq_cluster.rb 917 Bytes
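
The behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the contents of sidekiq_cluster.rb: the method name `terminate_if_orphaned`, its keyword arguments, and the injectable `signaller`/`sleeper` lambdas are all hypothetical, and `shutdown_timeout` is assumed to be whatever timeout Sidekiq has been configured with. The idea is that when the parent PID changes, the worker first asks itself to stop with TERM, then waits for the Sidekiq timeout plus a small grace period, and only then sends KILL.

```ruby
# Hypothetical sketch: names and signature are illustrative only.
#
# original_ppid    - parent PID recorded when the worker started
# current_ppid     - parent PID observed now (e.g. from Process.ppid)
# shutdown_timeout - seconds Sidekiq is allowed for a clean shutdown (assumed
#                    to come from Sidekiq's configured timeout)
# grace            - extra seconds on top of the Sidekiq timeout (assumption)
# signaller/sleeper are injectable so the logic can be exercised without
# actually signalling or sleeping.
def terminate_if_orphaned(original_ppid:, current_ppid:, shutdown_timeout:,
                          grace: 5,
                          signaller: ->(sig) { Process.kill(sig, Process.pid) },
                          sleeper: ->(seconds) { sleep(seconds) })
  # Parent unchanged: the cluster supervisor is still alive, nothing to do.
  return false if current_ppid == original_ppid

  # Ask Sidekiq to shut down cleanly so it can finish running jobs or
  # push them back onto the queue.
  signaller.call(:TERM)

  # Respect the Sidekiq timeout (plus a grace period) before the hard kill.
  sleeper.call(shutdown_timeout + grace)

  # Anything still running after the full timeout is killed.
  signaller.call(:KILL)
  true
end
```

A worker could call this from a background thread in a polling loop, for example `terminate_if_orphaned(original_ppid: parent, current_ppid: Process.ppid, shutdown_timeout: 25)`, where 25 seconds is assumed here only because it is Sidekiq's default timeout.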