    Teach gitlab-workhorse to serve requests to get raw blobs · 3dfb8ed5
    Kirill Smelkov authored
    Currently GitLab serves requests to get raw blobs via Ruby-on-Rails code and
    Unicorn. Because RoR/Unicorn is relatively heavyweight, in an environment with
    many simultaneous requests to get raw blobs this works very slowly and the
    server is constantly overloaded.
    
    On the other hand, to get raw blob content we do not need anything from the
    RoR framework - we only need access to the project git repository on the
    filesystem, and to know whether access to the data there should be granted
    or not. That means it is possible to handle a '.../raw/....' request directly
    in the more lightweight and performant gitlab-workhorse.
    
    As gitlab-workhorse is written in Go, and Go has good concurrency/parallelism
    support and is generally much faster than Ruby, moving the raw blob serving
    task to it makes sense and should be a net win.
    
    In this patch we add the infrastructure to process GET requests for '/raw/...':
    
    - extract project / ref and path from URL
    - query auth backend for whether download access should be granted or not
    - emit blob content via spawning external `git cat-file`
    
    I've tried to keep the output as close as possible to the one emitted by the
    RoR code, with the idea that the change should be transparent to users.
    
    As in this patch we still query the auth backend for every request to get a
    blob, the RoR code remains heavily loaded, so essentially there is no speedup yet:
    
      (on an 8-CPU i7-3770S with 16GB of RAM, 2001:67c:1254:e:8b::c776 is on localhost)
    
      # without patch: request eventually goes to unicorn  (9 unicorn workers)
      $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
      Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        1 threads and 40 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency   461.16ms   63.44ms 809.80ms   84.18%
          Req/Sec    84.84     17.02   131.00     80.00%
        Latency Distribution
           50%  460.21ms
           75%  492.83ms
           90%  524.67ms
           99%  636.49ms
        847 requests in 10.01s, 1.57MB read
      Requests/sec:     84.64
      Transfer/sec:    161.10KB
    
      # with this patch: request handled by gitlab-workhorse
      $ ./wrk -c40 -d10 -t1 --latency http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
      Running 10s test @ http://[2001:67c:1254:e:8b::c776]:7777/nexedi/slapos/raw/master/software/wendelin/software.cfg
        1 threads and 40 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency   458.42ms   66.26ms 766.12ms   84.76%
          Req/Sec    85.38     16.59   120.00     82.00%
        Latency Distribution
           50%  459.26ms
           75%  490.09ms
           90%  523.95ms
           99%  611.33ms
        853 requests in 10.01s, 1.51MB read
      Requests/sec:     85.18
      Transfer/sec:    154.90KB
    
    In the next patch we'll cache requests to the auth backend, which will improve
    performance dramatically.
    
    NOTE 20160228: there is internal/git/blob.go, which tries to get raw data via
        gitlab-workhorse but still asks Unicorn about the blob->sha1 mapping
        etc. That work started in
    
            86aaa133 (Prototype blobs via workhorse, @jacobvosmaer)
    
        and was inspired by this patch. It diverges from what we can do if we
        serve all blob data by gitlab-workhorse alone (see next patch), so we
        just avoid git/blob.go, put our stuff into git/xblob.go and tweak
        routes, essentially deactivating the git/blob.go code.
main_test.go 23.8 KB