Commit 08a21a2f authored by Kirill Smelkov's avatar Kirill Smelkov

X Demo how to support private raw URLs on lab.nexedi.com with pristine gitlab

parent 1fdc23cb
...@@ -22,12 +22,13 @@ try: ...@@ -22,12 +22,13 @@ try:
# Python 3 # Python 3
from urllib.error import HTTPError from urllib.error import HTTPError
from urllib.request import Request, urlopen from urllib.request import Request, urlopen
from urllib.parse import urlparse, urlunparse from urllib.parse import urlparse, urlunparse, quote, urlencode
except ImportError: except ImportError:
# Python 2 # Python 2
from urlparse import urlparse from urlparse import urlparse
from urlparse import urlunparse from urlparse import urlunparse
from urllib2 import HTTPError, Request, urlopen from urllib2 import HTTPError, Request, urlopen, quote
from urllib import urlencode
from zc.buildout.easy_install import realpath from zc.buildout.easy_install import realpath
from base64 import b64encode from base64 import b64encode
...@@ -284,7 +285,70 @@ class Download(object): ...@@ -284,7 +285,70 @@ class Download(object):
if auth: if auth:
return '{0}:{2}'.format(*auth), url return '{0}:{2}'.format(*auth), url
# _labraw_authproxy rewrites URLs to raw files on lab.nexedi.com to contain
# private_token fetched from netrc database.
#
# For example if ~/.netrc contains
#
# machine lab.nexedi.com/kirr/pygolang
# login private_token
# password XXXX
#
# then
#
# https://lab.nexedi.com/kirr/pygolang/-/raw/master/golang/libgolang.h
#
# is rewritten as
#
# https://lab.nexedi.com/api/v4/projects/kirr%2Fpygolang/repository/files/golang%2Flibgolang.h/raw/?ref=master&private_token=XXXX
#
# if the url is not of lab.nexedi.com/raw kind, or if no authentication is
# provided, then original url is returned as is.
#
# NOTE gitlab accepts both projectID and project string in /projects/:id
#
# https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28342
#
# NOTE the machine section of netrc can contain both hostname and path
# which is useful to be able to use project-specific tokens
#
# NOTE see the following links for the details of how to fetch raw file content via gitlab API
#
# https://ericsysmin.com/2024/02/13/accessing-raw-files-on-authenticated-gitlab/
# https://docs.gitlab.com/ee/api/repository_files.html#get-raw-file-from-repository
def _labraw_authproxy(self, url): # -> url'
p = urlparse(url)
if p.hostname != 'lab.nexedi.com':
  • I think for historical reasons, you could include 'lab.node.vifib.com', so older SR can work (no idea if this is useful or not, just saying)

    However I wonder if we could not get it from environ, so we don't hardcore our lab.nexedi.com:

    if p.hostname not in os.environ.get('GITLAB_COMPATIBLE_DOMAIN', 'lab.nexedi.com').split(':'):
      return url

    or something similar.

    Regards, Rafael Monnerat

    PS: Not exactly like this but with some way to configure it.

    Edited by Rafael Monnerat
  • Rafael, thanks for feedback. Sure my patch was only a demonstration and including lab.node.vifib.com and maybe also other our sites should be done. Speaking about environment - indeed the decision whether to go gitlab way or not could be made dependeing on what we see in netrc: if netrc for host/path comes with gitlab-specific private_token or similar, then gitlab mode is activated.

  • An environment variable seems not really practical, because this happens when running non-interactively in slapos node software.

    I feel that we don't even need to check the hostname, if the URL path looks like /user/repo/raw/... we can always assume it's a gitlab URL. We can do something like this:

    • if path matches /user/repo/raw/*:
      • try to download using gitlab raw API
      • if it fails, fallback to downloading original URL
  • Hi Jérôme,

    An environment variable seems not really practical, because this happens when running non-interactively in slapos node software.

    I share this feeling.

    Also, I'm afraid that some other forges may use this URL pattern, and may not use the gitlab auth system. For exemple, my gitea follows this URL pattern. There may be more.

    I feel that the fallback should be to gitlab auth, and not from gitlab auth. But in practice, for our use case, private downloads must happen almost exclusively from gitlab.

  • Thanks, I was not sure in which order we should try and fallback, I wrote gitlab API first because I was thinking that this way we don't make 2 requests when we have to use gitlab API, but that sounds silly now, the typical pattern is that we have one or two extends that are protected URL and all other that are public.

    My next idea was:

    • try to download
    • if it failed¹ and the path matches /user/repo/raw/* and we have credentials:
      • try to download using gitlab raw API

    note ¹ : in the case of gitlab, an unauthorized /raw/ request is a 302 redirect response to login page . In the current production version, the redirect is on localhost ( http://localhost:8080/users/sign_in ), but in the upcoming version it is the correct login page ( https://lab.nexedi.com/users/sign_in ), served by a 200 OK response, so it's not a download error at HTTP level. I guess what we can do here is just be more clever when detecting failures, maybe checking the URL matches /user/repo/raw/* and the downloaded content looks like login page, or adjusting the redirect handler to raise an error when redirected to login page.

    ( cc @yusei not sure you see this thread )

Please register or sign in to reply
return url
pathv = p.path.split('/') # ['', 'kirr', 'pygolang', '-', 'raw', 'master', 'golang', 'libgolang.h']
if pathv[3:5] != ['-', 'raw']:
return url
  • I think the - is still optional at this point, an URL like https://lab.nexedi.com/kirr/pygolang/raw/master/golang/libgolang.h is working and this is what we are still using.

    Maybe we need to change to something like

    if pathv[3:5] != ['-', 'raw'] or pathv[3:4] != ['raw']:

  • @jerome, thanks for feedback. Sure, this is only a draft patch illustrating the idea. It is better to support URLs without - and maybe it is also better to pass private token via headers instead of query so that credentials do not get logged somewhere accidentally. And going this way I thought that maybe there is also a way to just add a header without rewriting URL at all. This should work if unpatched gitlab supports /raw/ URLs with private cookie in the header somehow. But if not we can always stay with rewriting the URL while preserving backward compatibility for users.

    My primary message with this patch is that there is a way to do things simply while preserving backward compatibility.

  • +1 to pass private token via headers instead and thank you very much for this

  • Thanks, Jérome. You are welcome.

Please register or sign in to reply
repo = '/'.join(pathv[1:3]) # kirr/pygolang
auth = netrc.authenticators('%s/%s' % (p.hostname, repo)) # auth for lab.nexedi.com/kirr/pygolang
if auth is None:
return url
# FIXME this does not support refs like y/bstr.
# To support this we will need to do what
# https://lab.nexedi.com/nexedi/gitlab-workhorse/commit/5b8cf10e
# was doing - try to extract all variants for ref from longest to
# shortest and stop on the first variant thay yields good result.
ref = pathv[5] # master
filepath = '/'.join(pathv[6:]) # golang/libgolang.h
qrepo = quote(repo, '') # kirr%2Fpygolang
qfilepath = quote(filepath, '') # golang%2Flibgolang.h
path = '/api/v4/projects/%s/repository/files/%s/raw' % (qrepo, qfilepath)
query = urlencode({'ref': ref, auth[0]: auth[2]})
return urlunparse((p.scheme, p.netloc, path, p.params, query, p.fragment))
def urlretrieve(self, url, tmp_path): def urlretrieve(self, url, tmp_path):
url = self._labraw_authproxy(url)
auth = self._auth(url) auth = self._auth(url)
if auth: if auth:
req = Request(auth[1]) req = Request(auth[1])
......
  • This approach seems really good, I can give it a try next week, unless someone ( @alain.takoudjou ?) can also check

  • Thanks, Jérome.

  • Yes, @jerome I'll try. Thank you @kirr

  • You are welcome, Alain. Good luck with trying.

  • Thank you very much for this patch, Kirill. It is always a pleasure to work with you.

  • Thank you, @kirr !

  • Nicolas, Klaus, thanks for feedback and for kind words. You are welcome. I'm glad if this little fix could be useful.

  • @kirr can you please support username/token inside URL rather than .netrc as we had before, like https://gitlab+deploy-token-x:deploy_token_here@lab.nexedi.com/... ? This way will be quite useful especially for configuring automated test suite for the private repository.

  • @kazuhiko this was also my feeling (discussed on the forum), using netrc if found and falling back on credentials from url.

    But @kirr will not finalize the patch (unless you changed your mind and want to ;)) it will probably be @alain.takoudjou or me as part of our work on updating lab.nexedi.com

  • This way will be quite useful especially for configuring automated test suite for the private repository.

    I was about to say that the patch only affect /raw/ get request and that automated test suite use git clone protocol which supports credentials, but I think you have a project where a private slapos profile extends another private slapos profile ( @Nicolas showed me one and we looked at it but it does not seem needed, the profiles just have xmlsec that is now in public slapos ). I was thinking we keep support for credentials in URL only for backward compatibility, with the idea that it is becoming a deprecated bad practice. In any case, it's not good to have credentials in URL, because they leak easily and it's hard to change them.

    If we need to keep support, then maybe slapos node from testnode could also handle the netrc, but I hope we don't need it.

  • ( this is a public repository, if we need to discuss details we should use the forum )

  • @kazuhiko, thanks for asking. If needed credentials in the URL can be indeed supported and the amendment to hereby patch should be quite easy I think: first check netrc - if credentials are there, use those; if not - check credentials in the URL and, if present, use auth from there.

    I believe it would be good to start from having "credentials in the URL" supported, if they are present, and emit warning, so that people know they will need to move secrets out of their URLs to external storage, but things continue to work without any breakage for some time.

    As @jerome explained by default I'm not active anymore on the particular code development on hereby topic. I primarily wanted to show "there is a way" to do things with backward compatibility, but once shown and accepted - the details of this - I prefer to handover to lab people and switch back to focus on my nexedi/pygolang!21 and LTE topics.

    If lab people really need help I think I should be able to help, but personally I believe for further steps Jérome and Alain can do on their own for now. Please let me know if it is not the case and we need my further involvement.

    Kirill

    Edited by Kirill Smelkov
  • mentioned in merge request nexedi/slapos.buildout!34 (merged)

    Toggle commit list
  • for reference, this is submitted in 08a21a2f

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment