Commit edab619a authored by Patrick Steinhardt's avatar Patrick Steinhardt

gitaly: Fix access checks with transactions and quarantine environments

In order to check whether certain operations are allowed to be executed
by a user, Gitaly POSTs to the `/internal/allowed` endpoint. The request
includes information about what change the user wants to perform, but it
also contains information about the environment the change is currently
performed in.

When a user performs a push, git will first store all pushed objects
into a quarantine environment. This is a separate temporary directory
containing all new objects such that if the push gets rejected, new
objects will not persist in the repository. The crux is that in order to
inspect these new objects, git needs to be told that such a quarantine
environment exists. This is why Gitaly sends information about this
quarantine environment to `/internal/allowed`, so that we can again
relay this information back to Gitaly when we want to inspect newly
pushed objects to determine whether they're allowed or not.

While it's a leaky abstraction, it has worked fine until now. But with
transactions, that's not true anymore: when multiple Gitaly nodes take
part in a push, then they'll all generate a randomly named quarantine
environment. But as only the primary node will inject its info into the
request, we are not able to acces quarantine environments of secondary
nodes. If we now route accessor requests to any of the secondary Gitaly
nodes with the quarantine environment of the primary, then the request
will fail as git cannot find quarantined objects.

To fix this, Gitaly has recently grown a new GRPC header which allows us
to force-route requests to the primary via 1102b0b67 (praefect:
Implement force-routing to primary for node-manager router, 2021-02-03)
and 4d877d7d5 (praefect: Implement force-routing to primary for per-repo
router, 2021-02-03). So let's set that header if we know that we're
being executed via a hook, which is the only case where a quarantine
environment may exist.
parent 41a43001
---
title: 'gitaly: Fix access checks with transactions and quarantine environments'
merge_request: 53449
author:
type: fixed
...@@ -226,6 +226,7 @@ module Gitlab ...@@ -226,6 +226,7 @@ module Gitlab
metadata['username'] = context_data['meta.user'] if context_data&.fetch('meta.user', nil) metadata['username'] = context_data['meta.user'] if context_data&.fetch('meta.user', nil)
metadata['remote_ip'] = context_data['meta.remote_ip'] if context_data&.fetch('meta.remote_ip', nil) metadata['remote_ip'] = context_data['meta.remote_ip'] if context_data&.fetch('meta.remote_ip', nil)
metadata.merge!(Feature::Gitaly.server_feature_flags) metadata.merge!(Feature::Gitaly.server_feature_flags)
metadata.merge!(route_to_primary)
deadline_info = request_deadline(timeout) deadline_info = request_deadline(timeout)
metadata.merge!(deadline_info.slice(:deadline_type)) metadata.merge!(deadline_info.slice(:deadline_type))
...@@ -233,6 +234,24 @@ module Gitlab ...@@ -233,6 +234,24 @@ module Gitlab
{ metadata: metadata, deadline: deadline_info[:deadline] } { metadata: metadata, deadline: deadline_info[:deadline] }
end end
# Gitlab::Git::HookEnv will set the :gitlab_git_env variable in case we're
# running in the context of a Gitaly hook call, which may make use of
# quarantined object directories. We thus need to pass along the path of
# the quarantined object directory to Gitaly, otherwise it won't be able to
# find these quarantined objects. Given that the quarantine directory is
# generated with a random name, they'll have different names when multiple
# Gitaly nodes take part in a single transaction. As a result, we are
# forced to route all requests to the primary node which has injected the
# quarantine object directory to us.
def self.route_to_primary
return {} unless Gitlab::SafeRequestStore.active?
return {} unless Gitlab::SafeRequestStore[:gitlab_git_env]
{ 'gitaly-route-repository-accessor-policy' => 'primary-only' }
end
private_class_method :route_to_primary
def self.request_deadline(timeout) def self.request_deadline(timeout)
# timeout being 0 means the request is allowed to run indefinitely. # timeout being 0 means the request is allowed to run indefinitely.
# We can't allow that inside a request, but this won't count towards Gitaly # We can't allow that inside a request, but this won't count towards Gitaly
......
...@@ -296,6 +296,32 @@ RSpec.describe Gitlab::GitalyClient do ...@@ -296,6 +296,32 @@ RSpec.describe Gitlab::GitalyClient do
end end
end end
context 'gitlab_git_env' do
let(:policy) { 'gitaly-route-repository-accessor-policy' }
context 'when RequestStore is disabled' do
it 'does not force-route to primary' do
expect(described_class.request_kwargs('default', timeout: 1)[:metadata][policy]).to be_nil
end
end
context 'when RequestStore is enabled without git_env', :request_store do
it 'does not force-orute to primary' do
expect(described_class.request_kwargs('default', timeout: 1)[:metadata][policy]).to be_nil
end
end
context 'when RequestStore is enabled with git_env', :request_store do
before do
Gitlab::SafeRequestStore[:gitlab_git_env] = true
end
it 'enables force-routing to primary' do
expect(described_class.request_kwargs('default', timeout: 1)[:metadata][policy]).to eq('primary-only')
end
end
end
context 'deadlines', :request_store do context 'deadlines', :request_store do
let(:request_deadline) { real_time + 10.0 } let(:request_deadline) { real_time + 10.0 }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment