Commit ee9bcd00 authored by Kirill Smelkov's avatar Kirill Smelkov

bigfile/pagemap: Fix non-leaf page iteration

Since the beginning of pagemap (45af76e6 "bigfile/pagemap: specialized
{} uint64 -> void * mapping") we had a bug sitting in
__pagemap_for_each_leaftab() (non-leaf iterating logic behind
pagemap_for_each):

After entry to stack-down was found, we did not updated tailv[l]
accordingly. Thus if there are non-adjacent entries an entry could be
e.g. emitted many times:

     l 3  __down 0x7f79da1ee000
     tailv[4]: 0x7f79da1ee000
      -> tailv[4] 0x7f79da1ee000  __down 0x7f79da1ed000

     l 4  __down 0x7f79da1ed000
     tailv[5]: 0x7f79da1ed000
     h 5  l 5  leaftab: 0x7f79da1ed000      <--
      lvl 5  idx 169  page 0x55aa
    ok 9 - pagemap_for_each(0) == 21930

     l 5  __down (nil)
     tailv[4]: 0x7f79da1ee008
      -> tailv[4] 0x7f79da1ee008  __down 0x7f79da1ed000

     l 4  __down 0x7f79da1ed000
     tailv[5]: 0x7f79da1ed000
     h 5  l 5  leaftab: 0x7f79da1ed000      <--
      lvl 5  idx 169  page 0x55aa
    not ok 10 - pagemap_for_each(1) == 140724106500272

And many-time-emitted entries are not only incorrect, but can also lead
to not-handled segmentation faults in e.g. fileh_close():

    https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.6-1-gb0b2c52/bigfile/virtmem.c#L179

    /* drop all pages (dirty or not) associated with this fileh */
    pagemap_for_each(page, &fileh->pagemap) {
        /* it's an error to close fileh to mapping of which an access is
         * currently being done in another thread */
        BUG_ON(page->state == PAGE_LOADING);
        page_drop_memory(page);
        list_del(&page->lru);                           <-- HERE
        bzero(page, sizeof(*page)); /* just in case */
        free(page);
    }

( because after first bzero of a page, the page is all 0 bytes including
  page->lru{.next,.prev} so on the second time when the same page is
  emitted by pagemap_for_each, list_del(&page->lru) will try to set
  page->lru.next = ... which will segfault. )

So fix it by properly updating tailv[l] while we scan/iterate current level.

NOTE

This applies only to non-leaf pagemap levels, as leaf level is scanned
with separate loop in pagemap_for_each. That's why we probably did not
noticed this earlier - up until now our usual workloads was to change
data in adjacent batches and that means adjacent pages.

Though today @Tyagov was playing with wendelin.core in some other way and
it uncovered the bug.
parent b0b2c52e
......@@ -214,6 +214,9 @@ int main()
{I2-1, 199},
{I2, 200},
{I2+1, 201},
/* NOTE on not adjacent page to I2 */
{2*I2+100, 777},
};
......
......@@ -193,17 +193,16 @@ void pagemap_clear(PageMap *pmap);
if (l < __height) { \
\
/* loop over entries in tab - continuing from where we stopped */ \
for (void ***__tail = tailv[l], \
**__tab_prev = PTR_POINTER(*tailv[l-1]), \
for ( void **__tab_prev = PTR_POINTER(*tailv[l-1]), \
**__tab; \
\
(void **)__tail - __tab_prev < PAGEMAP_LEVEL_ENTRIES; \
(void **)tailv[l] - __tab_prev < PAGEMAP_LEVEL_ENTRIES; \
\
++__tail \
++tailv[l] \
) \
\
/* load entry; no entry -> next entry */ \
if (!(__tab = *__tail)) \
if (!(__tab = *tailv[l])) \
continue; \
\
/* go down 1 level with tab */ \
......
  • @Tyagov, I've applied this to your instance and it fixed zope segfault I was seeing there. At least now I can invoke the activity and Zope does not crash. Please let me know how it goes for you. If ok, I think we should try to upgrade all our instances with this fix...

    /cc @klaus

  • @kirr , thanks. I will check today and we see.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment