    memcg: fix mis-accounting of file mapped racy with migration · ac39cf8c
    akpm@linux-foundation.org authored
    The per-memcg FILE_MAPPED count of migrated file cache is not updated
    properly, because our hook in page_add_file_rmap() cannot know which
    memcg FILE_MAPPED should be counted against.
    
    This patch primarily fixes that bug, but it also includes some larger
    changes that clean up related problems.
    
    Currently, when a mapped file page is migrated, events happen in the
    following sequence.
    
     1. allocate a new page.
     2. get the memcg of the old page.
     3. charge for the new page before migration. But at this point the new
        page's page_cgroup is not changed and the charge is not committed.
        (IOW, the PCG_USED bit is not set.)
     4. page migration replaces the old page with the new page in the
        radix-tree.
     5. page migration remaps the new page if the old page was mapped.
     6. Here, the new page is unlocked.
     7. memcg commits the charge for the new page and marks the new page's
        page_cgroup as PCG_USED.
    
    Because "commit" happens after page-remap, we cannot count FILE_MAPPED
    at "5": we should not trust page_cgroup->mem_cgroup while the PCG_USED
    bit is unset.
    (Note: memcg's LRU removal code does trust it, but LRU-isolation logic
     helps there. When we overwrite page_cgroup->mem_cgroup, the page_cgroup
     is not on the LRU or page_cgroup->mem_cgroup is NULL.)
    
    We can lose file_mapped accounting information at 5 because FILE_MAPPED
    is updated only when mapcount changes 0->1. So we must catch that event.
    
    BTW, historically, the above implementation comes from handling
    migration failure of anonymous pages. Because we charge both the old
    page and the new page with mapcount=0, we can't catch cases such as
      - the page is really freed before remap.
      - migration fails but the page is freed before remap.
    or other corner cases.
    
    The new migration sequence with memcg is:
    
     1. allocate a new page.
     2. mark the old page with PageCgroupMigration.
     3. charge the new page against the old page's memcg. (Here, the new
        page's pc is marked as PageCgroupUsed.)
     4. page migration replaces radix-tree, page table, etc...
     5. At remapping, the new page's page_cgroup is already marked as "USED".
        We can catch the 0->1 event and FILE_MAPPED will be properly updated.
    
        We can also catch a SWAPOUT event after the page is unlocked, and
        freeing of this page by unmap() can be caught.
    
     7. Clear PageCgroupMigration of the old page.
    
    So, FILE_MAPPED will be correctly updated.
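    
    The difference between the two sequences can be sketched with a toy
    user-space model. This is only an illustration under simplifying
    assumptions, not the kernel code: every toy_* name below is hypothetical,
    "used" stands in for the PCG_USED bit, and locking, LRU handling and the
    old page are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_memcg {
    long file_mapped;             /* models the FILE_MAPPED statistic */
};

struct toy_page_cgroup {
    struct toy_memcg *memcg;      /* only trustworthy while 'used' is set */
    bool used;                    /* models the PCG_USED bit */
};

struct toy_page {
    int mapcount;
    struct toy_page_cgroup pc;
};

/* Models the hook in page_add_file_rmap(): count only on mapcount 0->1. */
static void toy_add_file_rmap(struct toy_page *page)
{
    if (page->mapcount++ != 0)
        return;                   /* not a 0->1 transition */
    if (!page->pc.used)
        return;                   /* owner unknown: the event is lost */
    page->pc.memcg->file_mapped++;
}

/* Models committing the charge: record the owner, then mark it used. */
static void toy_commit_charge(struct toy_page *page, struct toy_memcg *memcg)
{
    assert(!page->pc.used);
    page->pc.memcg = memcg;
    page->pc.used = true;
}

/* Old order: remap first, commit afterwards. Returns FILE_MAPPED. */
static long toy_migrate_old_order(void)
{
    struct toy_memcg memcg = { .file_mapped = 0 };
    struct toy_page newpage = {
        .mapcount = 0,
        .pc = { .memcg = NULL, .used = false },
    };

    toy_add_file_rmap(&newpage);          /* old step 5: remap */
    toy_commit_charge(&newpage, &memcg);  /* old step 7: commit */
    return memcg.file_mapped;
}

/* New order: the charge is committed before the remap. */
static long toy_migrate_new_order(void)
{
    struct toy_memcg memcg = { .file_mapped = 0 };
    struct toy_page newpage = {
        .mapcount = 0,
        .pc = { .memcg = NULL, .used = false },
    };

    toy_commit_charge(&newpage, &memcg);  /* new step 3: charge, USED set */
    toy_add_file_rmap(&newpage);          /* new step 5: remap */
    return memcg.file_mapped;
}
```

    In this model toy_migrate_old_order() leaves the counter at 0 although
    the page is mapped, while toy_migrate_new_order() leaves it at 1.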
    
    Then, what is the MIGRATION flag for?
      Without it, at migration failure, we may have to charge the old page
      again because it may be fully unmapped. "charge" means that we have to
      dive into memory reclaim or something complicated. So, it's better to
      avoid charging it again. Before this patch, __commit_charge() was
      working for both the old and new page and fixed up everything. But
      this technique has some race conditions around FILE_MAPPED, SWAPOUT,
      etc...
      Now, the kernel uses the MIGRATION flag and doesn't uncharge the old
      page until the end of migration.
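    
    As a rough illustration of why deferring the uncharge helps, here is a
    toy user-space model of the flag. It is a sketch under stated
    assumptions, not the patch itself: all toy_* names are hypothetical,
    "migration" stands in for the MIGRATION flag, and reclaim, locking and
    the anonymous-page case are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_memcg {
    long usage;                   /* number of pages charged */
};

struct toy_page_cgroup {
    struct toy_memcg *memcg;
    bool used;                    /* models the USED flag */
    bool migration;               /* models the MIGRATION flag */
};

static void toy_charge(struct toy_page_cgroup *pc, struct toy_memcg *memcg)
{
    pc->memcg = memcg;
    pc->used = true;
    memcg->usage++;
}

/* Returns true if the charge was dropped, false if skipped or deferred. */
static bool toy_uncharge(struct toy_page_cgroup *pc)
{
    if (!pc->used || pc->migration)
        return false;             /* under migration: keep the charge */
    pc->memcg->usage--;
    pc->used = false;
    return true;
}

static void toy_prepare_migration(struct toy_page_cgroup *oldpc,
                                  struct toy_page_cgroup *newpc)
{
    oldpc->migration = true;
    toy_charge(newpc, oldpc->memcg);
}

static void toy_end_migration(struct toy_page_cgroup *oldpc,
                              struct toy_page_cgroup *newpc, bool ok)
{
    /* On success the old page is unused; on failure the new page is. */
    struct toy_page_cgroup *unused = ok ? oldpc : newpc;

    oldpc->migration = false;
    toy_uncharge(unused);
}

/* Runs one migration and returns the memcg usage afterwards. */
static long toy_migration_usage(bool ok)
{
    struct toy_memcg memcg = { .usage = 0 };
    struct toy_page_cgroup oldpc = { .memcg = NULL, .used = false,
                                     .migration = false };
    struct toy_page_cgroup newpc = { .memcg = NULL, .used = false,
                                     .migration = false };
    bool dropped;

    toy_charge(&oldpc, &memcg);
    toy_prepare_migration(&oldpc, &newpc);

    /* The old page becomes fully unmapped in the middle of migration. */
    dropped = toy_uncharge(&oldpc);
    assert(!dropped);             /* deferred because of the flag */

    toy_end_migration(&oldpc, &newpc, ok);
    return memcg.usage;
}
```

    In this model exactly one page stays charged whether the migration
    succeeds or fails, and the failure path never has to charge the old
    page a second time.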
    
    I hope this change will make memcg's page migration much simpler.  This
    page migration has caused several troubles, so it is worth adding a flag
    for simplification.
    
    Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>