• Liu Bo's avatar
    Btrfs: fix memory leak in reading btree blocks · de714a8a
    Liu Bo authored
    commit 2571e739 upstream.
    
    So we can read a btree block via readahead or intentional read,
    and we can end up with a memory leak when something happens as
    follows,
    1) readahead starts to read block A but does not wait for read
       completion,
    2) btree_readpage_end_io_hook finds that block A is corrupted,
       and it needs to clear all block A's pages' uptodate bit.
    3) meanwhile an intentional read kicks in and checks block A's
       pages' uptodate to decide which page needs to be read.
    4) when some pages have the uptodate bit during 3)'s check so
       3) doesn't count them for eb->io_pages, but they are later
       cleared by 2) so we has to readpage on the page, we get
       the wrong eb->io_pages which results in a memory leak of
       this block.
    
    This fixes the problem by firstly getting all pages's locking and
    then checking pages' uptodate bit.
    
       t1(readahead)                              t2(readahead endio)                                       t3(the following read)
    read_extent_buffer_pages                    end_bio_extent_readpage
      for pg in eb:                                for page 0,1,2 in eb:
          if pg is uptodate:                           btree_readpage_end_io_hook(pg)
              num_reads++                              if uptodate:
      eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
      for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages
           if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
               __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is uptodate:
                                                           clear_extent_buffer_uptodate(eb)                         num_reads++
                                                               for pg in eb:                                eb->io_pages = num_reads
                                                                   ClearPageUptodate(page)  _______________
                                                                                                            for pg in eb:
                                                                                                                if pg is NOT uptodate:
                                                                                                                    __extent_read_full_page(pg)
    
    So t3's eb->io_pages is not consistent with the number of pages it's reading,
    and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative
    number so that we're not able to free the eb.
    Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
    de714a8a
extent_io.c 127 KB