1. 02 Aug, 2016 4 commits
    • NXD blob/auth: Teach it to handle HTTP Basic Auth too · 3f8faed7
      Kirill Smelkov authored
      [ Not sent upstream.
      
        The patch was not sent upstream, because previous 2 raw blob patches
        were not accepted (see details there).
      
        OTOH it is very handy in a SlapOS environment to use CI token auth for
        raw downloading, so we just carry it with us as NXD. ]
      
      There are cases when using user:password for /raw/... access is handy:
      
      - when using query for auth (private_token) is not convenient for some
        reason (e.g. client processing software does not handle queries well
        when generating URLs)
      
      - when we do not want to organize many artificial users and use their
        tokens, but instead just use the per-project, automatically set up
      
          gitlab-ci-token : <ci-token>
      
        artificial user & "password" which are already handled by auth backend
        for `git fetch` requests.
      
      Handling is easy: if the main auth backend rejects access and there is
      user:password in the original request, we retry, asking the auth backend
      the same way `git fetch` would.
      
      Access is granted if either of the two ways to ask the auth backend
      succeeds. This way both private tokens / cookies and HTTP auth are supported.
    • NXD blob/auth: Cache auth backend reply for 30s · 19649275
      Kirill Smelkov authored
      [ Sent upstream: https://gitlab.com/gitlab-org/gitlab-workhorse/merge_requests/17
      
        This patch was sent upstream but was not accepted because of the
        "complexity" of the auth cache, even though it provides more than an
        order of magnitude speedup. We just carry it with us as NXD. ]
      
      In the previous patch we added code to serve blob content by running
      `git cat-file ...` directly, but every such request still triggers a request
      to the slow RoR-based auth backend, which is bad for performance.
      
      Let's cache the auth backend reply for a small period of time, e.g. 30 seconds,
      which changes the situation dramatically:
      
      If we have a lot of requests to the same repository, we query the auth backend
      only for every Nth request. With e.g. 100 raw blob requests/s and a 30 s cache,
      N = 100 * 30 = 3000, which means that the previous load on RoR code essentially
      goes away.
      
      On the other hand, as we still query the auth backend once in a while and
      refresh the cache, we will not miss changes in project settings. A potential
      delay of e.g. 25 seconds for a project to become public, or vice versa to become
      private, does no real harm.
      
      The cache is designed so that the read-side codepath can execute in parallel
      and is not blocked by eventual cache updates.
      
      Overall this improves performance a lot:
      
        (on an 8-CPU i7-3770S with 16GB of RAM, 2001:67c:1254:e:8b::c776 is on localhost)
      
        # request is handled by gitlab-workhorse, but without auth caching
        $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
          1 threads and 40 connections
          Thread Stats   Avg      Stdev     Max   +/- Stdev
            Latency   458.42ms   66.26ms 766.12ms   84.76%
            Req/Sec    85.38     16.59   120.00     82.00%
          Latency Distribution
             50%  459.26ms
             75%  490.09ms
             90%  523.95ms
             99%  611.33ms
          853 requests in 10.01s, 1.51MB read
        Requests/sec:     85.18
        Transfer/sec:    154.90KB
      
        # request goes to gitlab-workhorse with auth caching (this patch)
        $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
          1 threads and 40 connections
          Thread Stats   Avg      Stdev     Max   +/- Stdev
            Latency    34.52ms   19.28ms 288.63ms   74.74%
            Req/Sec     1.20k   127.21     1.39k    85.00%
          Latency Distribution
             50%   32.67ms
             75%   42.73ms
             90%   56.26ms
             99%   99.86ms
          11961 requests in 10.01s, 21.24MB read
        Requests/sec:   1194.51
        Transfer/sec:      2.12MB
      
      i.e. it is a ~14x improvement (1194.51 vs 85.18 requests/s).
    • fixup! NXD Teach gitlab-workhorse to serve requests to get raw blobs · 387a2d45
      Kirill Smelkov authored
      During 0.6.4..0.6.5 upstream reworked how a request to download an
      archive is answered. Before, the reply was JSON in the body; now it is JSON
      in headers, handled via the so-called "senddata" workhorse mechanism:
      
          https://gitlab.com/gitlab-org/gitlab-workhorse/commit/153527fb
      
      Adjust our patch accordingly where it asks whether it is OK to
      download from the repository or not.
    • NXD Teach gitlab-workhorse to serve requests to get raw blobs · 3de00474
      Kirill Smelkov authored
      [ Sent upstream: https://gitlab.com/gitlab-org/gitlab-workhorse/merge_requests/17
      
        This patch was sent upstream but was not accepted because of the
        "complexity" of the auth cache (next patch), even though it provides more
        than an order of magnitude speedup. We just carry it with us as NXD. ]
      
      Currently GitLab serves requests to get raw blobs via Ruby-on-Rails code and
      Unicorn. Because RoR/Unicorn is relatively heavyweight, in an environment where
      there are a lot of simultaneous requests to get raw blobs, this works very
      slowly and the server is constantly overloaded.
      
      On the other hand, to get raw blob content, we do not need anything from the
      RoR framework - we only need access to the project git repository on the
      filesystem, and to know whether access for getting data from there should be
      granted or not. That means it is possible to handle '.../raw/....' requests
      directly in the more lightweight and performant gitlab-workhorse.
      
      As gitlab-workhorse is written in Go, and Go has good concurrency/parallelism
      support and is generally much faster than Ruby, moving the raw blob serving
      task to it makes sense and should be a net win.
      
      In this patch we add infrastructure to process GET requests for '/raw/...':
      
      - extract project / ref and path from URL
      - query auth backend for whether download access should be granted or not
      - emit blob content via spawning external `git cat-file`
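      
      The first and last steps above can be sketched as follows. This is a
      simplified illustration, not the patch itself: `splitRawURL`,
      `refPathCandidates`, `catFileCmd` and the repository location are
      hypothetical. Since a ref name may contain "/", the sketch only enumerates
      possible ref/path splits; which candidate is right has to be decided against
      the repository.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// splitRawURL splits "/<project>/raw/<ref>/<path>" at the first "/raw/".
// The ref/path boundary is left undecided, because a ref may contain "/".
func splitRawURL(urlPath string) (project, refPath string, ok bool) {
	i := strings.Index(urlPath, "/raw/")
	if i <= 0 {
		return "", "", false
	}
	project = strings.TrimPrefix(urlPath[:i], "/")
	refPath = urlPath[i+len("/raw/"):]
	return project, refPath, project != "" && refPath != ""
}

// refPathCandidates lists every (ref, path) split of refPath, shortest ref first.
func refPathCandidates(refPath string) [][2]string {
	parts := strings.Split(refPath, "/")
	var out [][2]string
	for i := 1; i < len(parts); i++ {
		out = append(out, [2]string{
			strings.Join(parts[:i], "/"),
			strings.Join(parts[i:], "/"),
		})
	}
	return out
}

// catFileCmd prepares (but does not run) `git cat-file blob <ref>:<path>`.
func catFileCmd(gitDir, ref, path string) *exec.Cmd {
	return exec.Command("git", "--git-dir="+gitDir, "cat-file", "blob", ref+":"+path)
}

func main() {
	project, refPath, _ := splitRawURL("/nexedi/slapos/raw/master/software/wendelin/software.cfg")
	fmt.Println(project)
	for _, c := range refPathCandidates(refPath) {
		// The repository location below is a made-up example path.
		fmt.Println(catFileCmd("/srv/git/"+project+".git", c[0], c[1]).Args)
	}
}
```

      Running the prepared command and copying its stdout to the HTTP response
      would then emit the blob content.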
      
      I've tried to make the output as close as possible to the one emitted by RoR
      code, with the idea that for users the change should be transparent.
      
      As in this patch we query the auth backend for every request to get a blob, RoR
      code is still heavily loaded, so essentially there is no speedup yet:
      
        (on an 8-CPU i7-3770S with 16GB of RAM, 2001:67c:1254:e:8b::c776 is on localhost)
      
        # without patch: request eventually goes to unicorn  (9 unicorn workers)
        $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
          1 threads and 40 connections
          Thread Stats   Avg      Stdev     Max   +/- Stdev
            Latency   461.16ms   63.44ms 809.80ms   84.18%
            Req/Sec    84.84     17.02   131.00     80.00%
          Latency Distribution
             50%  460.21ms
             75%  492.83ms
             90%  524.67ms
             99%  636.49ms
          847 requests in 10.01s, 1.57MB read
        Requests/sec:     84.64
        Transfer/sec:    161.10KB
      
        # with this patch: request handled by gitlab-workhorse
        $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
          1 threads and 40 connections
          Thread Stats   Avg      Stdev     Max   +/- Stdev
            Latency   458.42ms   66.26ms 766.12ms   84.76%
            Req/Sec    85.38     16.59   120.00     82.00%
          Latency Distribution
             50%  459.26ms
             75%  490.09ms
             90%  523.95ms
             99%  611.33ms
          853 requests in 10.01s, 1.51MB read
        Requests/sec:     85.18
        Transfer/sec:    154.90KB
      
      In the next patch we'll cache requests to auth backend and that will improve
      performance dramatically.
      
      NOTE 20160228: there is internal/git/blob.go, which tries to serve raw data
          via gitlab-workhorse but still asks Unicorn about the blob->sha1 mapping
          etc. That work started in
      
              86aaa133 (Prototype blobs via workhorse, @jacobvosmaer)
      
          and was inspired by this patch. It is out of line with what
          we can do if we serve all blob data just by gitlab-workhorse (see
          next patch), so we just avoid git/blob.go, put our stuff into
          git/xblob.go and tweak routes, essentially deactivating the
          git/blob.go code.
  2. 21 Mar, 2016 1 commit
  3. 07 Mar, 2016 1 commit
  4. 04 Mar, 2016 1 commit
  5. 03 Mar, 2016 1 commit
  6. 24 Feb, 2016 2 commits
  7. 17 Feb, 2016 7 commits
  8. 12 Feb, 2016 1 commit
  9. 11 Feb, 2016 3 commits
  10. 08 Feb, 2016 3 commits
  11. 02 Feb, 2016 6 commits
  12. 01 Feb, 2016 8 commits
  13. 28 Jan, 2016 1 commit
  14. 26 Jan, 2016 1 commit