bigfile/pagemap: Fix non-leaf page iteration
Since the beginning of pagemap (45af76e6 "bigfile/pagemap: specialized {} uint64 -> void * mapping") we had a bug sitting in __pagemap_for_each_leaftab() (non-leaf iterating logic behind pagemap_for_each): After entry to stack-down was found, we did not updated tailv[l] accordingly. Thus if there are non-adjacent entries an entry could be e.g. emitted many times: l 3 __down 0x7f79da1ee000 tailv[4]: 0x7f79da1ee000 -> tailv[4] 0x7f79da1ee000 __down 0x7f79da1ed000 l 4 __down 0x7f79da1ed000 tailv[5]: 0x7f79da1ed000 h 5 l 5 leaftab: 0x7f79da1ed000 <-- lvl 5 idx 169 page 0x55aa ok 9 - pagemap_for_each(0) == 21930 l 5 __down (nil) tailv[4]: 0x7f79da1ee008 -> tailv[4] 0x7f79da1ee008 __down 0x7f79da1ed000 l 4 __down 0x7f79da1ed000 tailv[5]: 0x7f79da1ed000 h 5 l 5 leaftab: 0x7f79da1ed000 <-- lvl 5 idx 169 page 0x55aa not ok 10 - pagemap_for_each(1) == 140724106500272 And many-time-emitted entries are not only incorrect, but can also lead to not-handled segmentation faults in e.g. fileh_close(): https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.6-1-gb0b2c52/bigfile/virtmem.c#L179 /* drop all pages (dirty or not) associated with this fileh */ pagemap_for_each(page, &fileh->pagemap) { /* it's an error to close fileh to mapping of which an access is * currently being done in another thread */ BUG_ON(page->state == PAGE_LOADING); page_drop_memory(page); list_del(&page->lru); <-- HERE bzero(page, sizeof(*page)); /* just in case */ free(page); } ( because after first bzero of a page, the page is all 0 bytes including page->lru{.next,.prev} so on the second time when the same page is emitted by pagemap_for_each, list_del(&page->lru) will try to set page->lru.next = ... which will segfault. ) So fix it by properly updating tailv[l] while we scan/iterate current level. NOTE This applies only to non-leaf pagemap levels, as leaf level is scanned with separate loop in pagemap_for_each. That's why we probably did not noticed this earlier - up until now our usual workloads was to change data in adjacent batches and that means adjacent pages. Though today @Tyagov was playing with wendelin.core in some other way and it uncovered the bug.
Showing
Please register or sign in to comment