Commit 31fba5e0 authored by Patrick Steinhardt's avatar Patrick Steinhardt

checks: Convert FileSizeCheck push rule to allow for batching

The FileSizeCheck push rule loads all new blobs for a given change and
checks whether any of these blobs is bigger than a certain threshold.
This push rule is quite expensive: loading new blobs requires a revwalk
of all preexisting references and thus scales with the number of refs in
a repository. This may easily take a few dozen seconds to compute, and
repeating this computation for each change doesn't help either.

Refactor the FileSizeCheck to be implemented atop the BaseBulkChecker
such that we can fix this shortcoming in the future: instead of loading
blobs for each change, we may load them once for all changes. While this
would already amortize the costs because we have to perform the walk of
negative refs once only, we can eventually tweak this even further and
use the quarantine directory to enumerate all pushed objects directly
without doing a revwalk at all. This optimization will bring down the
time to load blobs from dozens of seconds to a few milliseconds.

Note that this commit doesn't yet change behaviour, but instead only
prepares the infrastructure to allow for batch loading. Batch loading
will require some additional changes to our infrastructure first, which
will be done in a separate patch series.
parent 885ae6e9
......@@ -33,9 +33,7 @@ module EE
# @return [Nil] returns nil unless an error is raised
# @raise [Gitlab::GitAccess::ForbiddenError] if check fails
def check_file_size!
changes_access.single_change_accesses.each do |single_change_access|
PushRules::FileSizeCheck.new(single_change_access).validate!
end
PushRules::FileSizeCheck.new(changes_access).validate!
nil
end
......
......@@ -4,7 +4,7 @@ module EE
module Gitlab
module Checks
module PushRules
class FileSizeCheck < ::Gitlab::Checks::BaseSingleChecker
class FileSizeCheck < ::Gitlab::Checks::BaseBulkChecker
LOG_MESSAGE = "Checking if any files are larger than the allowed size..."
def validate!
......@@ -12,14 +12,21 @@ module EE
logger.log_timed(LOG_MESSAGE) do
max_file_size = push_rule.max_file_size
blobs = project.repository.new_blobs(newrev, dynamic_timeout: logger.time_left)
large_blob = blobs.find do |blob|
::Gitlab::Utils.bytes_to_megabytes(blob.size) > max_file_size
end
changes_access.changes.each do |change|
newrev = change[:newrev]
next if newrev.blank? || ::Gitlab::Git.blank_ref?(newrev)
blobs = project.repository.new_blobs(newrev, dynamic_timeout: logger.time_left)
large_blob = blobs.find do |blob|
::Gitlab::Utils.bytes_to_megabytes(blob.size) > max_file_size
end
if large_blob
raise ::Gitlab::GitAccess::ForbiddenError, %Q{File "#{large_blob.path}" is larger than the allowed size of #{max_file_size} MB}
if large_blob
raise ::Gitlab::GitAccess::ForbiddenError, %Q{File "#{large_blob.path}" is larger than the allowed size of #{max_file_size} MB}
end
end
end
end
......
......@@ -5,6 +5,29 @@ require 'spec_helper'
RSpec.describe EE::Gitlab::Checks::PushRules::FileSizeCheck do
include_context 'push rules checks context'
let(:changes) do
[
# Update of existing branch
{ oldrev: oldrev, newrev: newrev, ref: ref },
# Creation of new branch
{ newrev: newrev, ref: 'refs/heads/something' },
# Deletion of branch
{ oldrev: oldrev, ref: 'refs/heads/deleteme' }
]
end
let(:changes_access) do
Gitlab::Checks::ChangesAccess.new(
changes,
project: project,
user_access: user_access,
protocol: protocol,
logger: logger
)
end
subject { described_class.new(changes_access) }
describe '#validate!' do
let(:push_rule) { create(:push_rule, max_file_size: 1) }
# SHA of the 2-mb-file branch
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment