    Add a bulk processor for ES incremental updates · a65928cf
    Nick Thomas authored
    Currently, we store bookkeeping information for the Elasticsearch index
    in Sidekiq jobs. There are four types of information:
    
    * Backfill indexing for repositories
    * Backfill indexing for database records
    * Incremental indexing for repositories
    * Incremental indexing for database records
    
    The first three use Elasticsearch bulk requests when indexing. The last
    does not.
    
    This commit introduces a system that uses bulk requests when indexing
    incremental changes to database records. This is done by adding the
    bookkeeping information to a Redis ZSET, rather than enqueuing a Sidekiq
    job for each change. A Sidekiq cron worker takes batches from the ZSET
    and submits them to Elasticsearch via the bulk API.
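
To illustrate the last step, here is a minimal sketch of how a batch of pending changes could be rendered as a single bulk API request body. The bulk API takes newline-delimited JSON: one action line per record, followed by a source line for `index` actions. The function name, the tuple layout, and the index name are hypothetical and not taken from this commit.

```python
import json


def build_bulk_body(batch, index_name="hypothetical-index"):
    """Render (operation, doc_id, document) tuples as an NDJSON bulk body.

    Assumption: each queued change has already been resolved into an
    operation ("index" or "delete"), a document id, and a document hash.
    """
    lines = []
    for operation, doc_id, document in batch:
        # Action line: names the operation and the target document.
        lines.append(json.dumps({operation: {"_index": index_name, "_id": doc_id}}))
        # Only "index" actions carry a source line; "delete" has none.
        if operation == "index":
            lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", "issue_1", {"title": "First issue"}),
    ("delete", "issue_2", None),
])
```

One request built this way carries the whole batch, instead of one request per changed record.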
    
    This reduces the responsiveness of indexing slightly, since a change is
    only indexed once the cron worker next picks up its batch, but also
    reduces the cost of indexing, both in terms of the load on Elasticsearch
    and the size of the bookkeeping information.
    
    Since we're using a ZSET, we also get deduplication of work for free.
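
The deduplication follows from sorted-set semantics: adding a member that already exists only updates its score, so a record changed several times between worker runs is queued once. A small in-memory model of that behaviour is sketched here. The class, the member strings, and the use of increasing integers as scores are assumptions for illustration; the commit message does not say what members or scores are used.

```python
class InMemoryZSet:
    """Toy stand-in for a Redis sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Re-adding an existing member overwrites its score, never duplicates it.
        self._scores[member] = score

    def take_batch(self, limit):
        # Lowest scores first; ties are broken by member, as Redis does.
        ordered = sorted(self._scores.items(), key=lambda item: (item[1], item[0]))
        batch = [member for member, _ in ordered[:limit]]
        for member in batch:
            del self._scores[member]
        return batch

    def __len__(self):
        return len(self._scores)


queue = InMemoryZSet()
queue.zadd("Issue 1", 1)
queue.zadd("Note 7", 2)
queue.zadd("Issue 1", 3)  # second change to the same record

pending = len(queue)          # two entries, not three
batch = queue.take_batch(10)  # one bulk request covers both records
```

With per-change Sidekiq jobs, the same sequence would have produced three jobs and three indexing requests.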
34086-es-bulk-incremental-index-updates.yml 113 Bytes