Commits · d54aa63d0d234621966655bbac4761e47f81ce5b · Levin Zimmermann / wendelin.core

04 Nov, 2024 2 commits

wcfs: Fix false alarm about faulty client whereas client just restarted · d54aa63d
Levin Zimmermann authored Nov 04, 2024
```
This patch fixes the false alarm issue for which tests were added in the
previous patch 775adf73.
```

wcfs: Add tests to show wcfs kills dead clients · 775adf73

Levin Zimmermann authored Nov 03, 2024

When a WCFS client doesn't respond in time to a pin request from
WCFS, the server attempts to kill this client [1]. However, it is
possible that, after the server sends a pin request, the client stops
for other reasons (for instance, it is restarted by another program).
In that case the server shouldn't attempt to kill the client, as doing
so emits false alarm logs about a faulty client, whereas the client isn't
faulty but was just restarted at pin time.

[1] See nexedi/wendelin.core@c559ec1a

19 Sep, 2024 2 commits

wendelin.core v2.0.alpha4 · db6fea3d
Levin Zimmermann authored Sep 19, 2024
```
/reviewed-by @kirr
/reviewed-on !32
```

wcfs: v↑ NEO/go dependency · ed6a71c1

Levin Zimmermann authored Sep 19, 2024

This patch updates NEO/go to include a patch that fixes
non-deterministic crashes for 'wendelin.core'. The tradeoff, however, is a
moderate memory leak. This leak is going to be fixed later, when a new
golang version is released. See more details about this here:
neo@ee23551d

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!32

18 Sep, 2024 1 commit

wcfs: Clarify error context when WatchLink.sendReq is waiting for reply · 39d53cbb

Kirill Smelkov authored Sep 18, 2024

sendReq has two phases: a) send request, and b) read reply. When there
is an error on the first phase, e.g. client does not read what wcfs is
trying to send, it returns an error like

    pin #2 @03fb63abd6d65b33: sendReq: send .2: context deadline exceeded

however when there is an error on the second phase, e.g. client does not
reply to wcfs request, it currently returns an error like

    pin #2 @03fb63abd6d65b33: sendReq: context deadline exceeded

which makes it unclear which of the two phases failed.

After this patch the error for the second case becomes

    pin #2 @03fb63abd6d65b33: sendReq: waiting for reply: context deadline exceeded

which is easier to interpret.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!31

17 Sep, 2024 28 commits

wcfs: v↑ go dependencies · 764d4da8

Levin Zimmermann authored Sep 17, 2024

This patch updates:

- github.com/golang/glog: we already wanted to do so in
    nexedi/wendelin.core!23,
    but we deferred it to keep go 1.18 support. However in recent patches
    we already dropped go 1.18 support and we can therefore update glog now.
- lab.nexedi.com/kirr/neo/go: include a fix in the handshake; see here for more information:
    neo@d75f4ac2 and
    neo@03db1d8a

This patch doesn't update:

- github.com/hanwen/go-fuse: This was updated upstream and Kirill has already
    reviewed and integrated the patches into a custom branch. However, when updating
    go-fuse to v2.4.3-0.20240904154523-9546fc238dc6 (this is
    go-fuse@9546fc23),
    WCFS tests fail on my machine [1] => let's defer the update
- github.com/kisielk/og-rek: there are new patches that will be needed
    in the future, but NEO/go's og-rek dependency has not been updated yet,
    so let's defer the update in wendelin.core until og-rek is updated
    in NEO/go
- github.com/johncgriffin/overflow: no upstream updates
- github.com/pkg/errors: no upstream updates
- github.com/stretchr/testify: This was already updated with
    nexedi/wendelin.core@c559ec1a.
    'testify' seems to have a major release planned, which may break
    some of our test code, but for now major version 1 is still the
    stable release.

----
kirr: I confirm that
go-fuse@9546fc23 brings in a
regression to WCFS tests. It seems I missed some error in that go-fuse
update, and it will need to be bisected and debugged.

---

[1] Test failure log:

========================================== FAILURES ==========================================
______________________________________ test_wcfs_basic _______________________________________

    @func
    def test_wcfs_basic():
        t = tDB(); zf = t.zfile
        defer(t.close)

        # >>> lookup non-BigFile -> must be rejected
        with raises(OSError) as exc:
            t.wc._stat("head/bigfile/%s" % h(t.nonzfile._p_oid))
        assert exc.value.errno == EINVAL

        # >>> file initially empty
        f = t.open(zf)
        f.assertCache([])
        f.assertData ([], mtime=t.at0)

        # >>> (@at1) commit data -> we can see it on wcfs
        at1 = t.commit(zf, {2:'c1'})

        f.assertCache([0,0,0])  # initially not cached
        f.assertData (['','','c1'], mtime=t.head)

        # >>> (@at2) commit again -> we can see both latest and snapshotted states
        # NOTE blocks e(4) and f(5) will be accessed only in the end
        at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})

        # f @head
>       f.assertCache([1,1,0,0,0,0])

wcfs/wcfs_test.py:1341:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

t = <wcfs.wcfs_test.tFile instance at 0x7ff61457b960>, incorev = [1, 1, 0, 0, 0, 0]

    def assertCache(t, incorev):
>       assert t.cached() == incorev
E       assert [0, 0, 0, 0, 0, 0] == [1, 1, 0, 0, 0, 0]
E         At index 0 diff: 0 != 1
E         Use -v to get the full diff

wcfs/wcfs_test.py:791: AssertionError
------------------------------------ Captured stdout call ------------------------------------

M: commit -> @at0 (03fb5dfbe3c1cd55)

M: commit -> @at1 (03fb5dfbe4936a66)
M:      f<0000000000000002>     [2]

M: commit -> @at2 (03fb5dfbe4d01166)
M:      f<0000000000000002>     [2, 3, 5]
>>> Change history by file:

f<0000000000000002>:
                                0 1 2 3 4 5 6 7
                                a b c d e f g h
        @at0 (03fb5dfbe3c1cd55)
        @at1 (03fb5dfbe4936a66)     2
        @at2 (03fb5dfbe4d01166)     2 3   5

------------------------------------ Captured stderr call ------------------------------------
I0917 12:43:53.392222  124283 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
I0917 12:43:53.392282  124283 wcfs.go:2758] (built with go1.21.13)
W0917 12:43:53.392404  124283 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
W0917 12:43:53.567807  124283 wcfs.go:2331] /head/bigfile: lookup "0000000000000001": bigfopen 0000000000000001 @03fb5dfbe3c1cd55: invalid argument: ZODB.Broken("persistent.Persistent") is not a ZBigFile
I0917 12:43:53.710208  124283 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
------------------------------------- Captured log call --------------------------------------
WARNING  ZODB.FileStorage:FileStorage.py:412 Ignoring index for /tmp/testdb_fs.z5ZoMH/1.fs
_________________________________ test_wcfs_watch_vs_access __________________________________

    @func
    def test_wcfs_watch_vs_access():
        t = tDB(); zf = t.zfile; at0=t.at0
        defer(t.close)

        f = t.open(zf)
        at1 = t.commit(zf, {2:'c1'})
        at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})
        at3 = t.commit(zf, {0:'a3', 2:'c3', 5:'f3'})

        f.assertData(['a3','','c3','d2','x','x'])
        f.assertCache([1,1,1,1,0,0])

        # watched + commit -> read -> receive pin messages.
        # read vs pin ordering is checked by assertBlk.
        #
        # f(5) is kept not accessed to check later how wcfs.go handles δFtail
        # rebuild after it sees not yet accessed ZBlk that has change history.
        wl3  = t.openwatch();  w3 = wl3.watch(zf, at3);  assert at3 == t.head
        assert w3.at     == at3
        assert w3.pinned == {}

        wl3_ = t.openwatch();  w3_ = wl3_.watch(zf, at3)
        assert w3_.at     == at3
        assert w3_.pinned == {}

        wl2  = t.openwatch();  w2 = wl2.watch(zf, at2)
        assert w2.at     == at2
        assert w2.pinned == {0:at0, 2:at2}

        # w_assertPin asserts on state of .pinned for {w3,w3_,w2}
        def w_assertPin(pinw3, pinw3_, pinw2):
            assert w3.pinned   == pinw3
            assert w3_.pinned  == pinw3_
            assert w2.pinned   == pinw2

        f.assertCache([1,1,1,1,0,0])
        at4 = t.commit(zf, {1:'b4', 2:'c4', 5:'f4', 6:'g4'})
>       f.assertCache([1,0,0,1,0,0,0])

wcfs/wcfs_test.py:1702:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

t = <wcfs.wcfs_test.tFile instance at 0x7ff614512050>, incorev = [1, 0, 0, 1, 0, 0, ...]

    def assertCache(t, incorev):
>       assert t.cached() == incorev
E       assert [0, 0, 0, 0, 0, 0, ...] == [1, 0, 0, 1, 0, 0, ...]
E         At index 0 diff: 0 != 1
E         Use -v to get the full diff

wcfs/wcfs_test.py:791: AssertionError
------------------------------------ Captured stdout call ------------------------------------

M: commit -> @at0 (03fb5dfc0fd82300)

M: commit -> @at1 (03fb5dfc10b92ecc)
M:      f<0000000000000049>     [2]

M: commit -> @at2 (03fb5dfc10cee9dd)
M:      f<0000000000000049>     [2, 3, 5]

M: commit -> @at3 (03fb5dfc1100c999)
M:      f<0000000000000049>     [0, 2, 5]

C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)

C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)

C: setup watch f<0000000000000049> @at2 (03fb5dfc10cee9dd)

M: commit -> @at4 (03fb5dfc120ed611)
M:      f<0000000000000049>     [1, 2, 5, 6]
>>> Change history by file:

f<0000000000000049>:
                                0 1 2 3 4 5 6 7
                                a b c d e f g h
        @at0 (03fb5dfc0fd82300)
        @at1 (03fb5dfc10b92ecc)     2
        @at2 (03fb5dfc10cee9dd)     2 3   5
        @at3 (03fb5dfc1100c999) 0   2     5
        @at4 (03fb5dfc120ed611)   1 2     5 6

------------------------------------ Captured stderr call ------------------------------------
I0917 12:44:03.733037  125217 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
I0917 12:44:03.733126  125217 wcfs.go:2758] (built with go1.21.13)
W0917 12:44:03.733418  125217 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
I0917 12:44:04.475273  125217 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
============================ 2 failed, 42 passed in 55.81 seconds ============================
I0917 12:44:17.882140  125540 wcfs.go:2933] stop "/dev/shm/wcfs/c4d833a0bdea4c51decf5425b8ad2cc4d017280f" "file:///tmp/testdb_fs.bvHBy9/1.fs"
make: *** [Makefile:174: test.wcfs] Error 1

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!30

wcfs: Implement protection against faulty client + related fixes and improvements · 89d653c0

Kirill Smelkov authored Sep 17, 2024

The WCFS documentation specifies [1]:

- - - 8> - - - 8> - - -

If a client, on purpose or due to a bug or being stopped, is slow to respond
with ack to file invalidation notification, it creates a problem because the
server will become blocked waiting for pin acknowledgments, and thus all
other clients, that try to work with the same file, will get stuck.

[...]

Lacking OS primitives to change address space of another process and not
being able to work it around with ptrace in userspace, wcfs takes approach
to kill a slow client on 30 seconds timeout by default.

- - - <8 - - - <8 - - -

Before, however, this protection was not implemented: a single
faulty client could therefore freeze the whole system. With this work
the protection is now implemented: faulty clients are killed after the
timeout or on any other misbehaviour in their pin handlers.

Working on this topic also resulted in several fixes and improvements
around isolation protocol implementation on the server side.

See individual patches for details.

[1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>

/reviewed-on nexedi/wendelin.core!18

wcfs: Require Go 1.19 + go mod tidy · 1fcef9c9

Levin Zimmermann authored Sep 16, 2024

I would only suggest one very tiny change. In go.mod we have:

    module lab.nexedi.com/nexedi/wendelin.core/wcfs

    go 1.14

I think this needs to be updated to go 1.19 due to atomic.Int64.
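For illustration, a hypothetical pin counter using atomic.Int64. This typed atomic was added in Go 1.19 (together with automatic 64-bit alignment, even on 32-bit platforms), which is what motivates raising the go directive:

```go
package main

import "sync/atomic"

// pinStats is a hypothetical counter holder. atomic.Int64 exists only since
// Go 1.19 and is automatically 64-bit aligned, even on 32-bit platforms.
type pinStats struct {
	pins atomic.Int64
}

// notePin records one pin event and returns the new total.
func (s *pinStats) notePin() int64 { return s.pins.Add(1) }
```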

And maybe we also need a general 'go mod tidy' update.

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!18

wcfs: Implement protection against faulty client · c559ec1a

Kirill Smelkov authored Sep 16, 2024

The WCFS documentation specifies [1]:

- - - 8> - - - 8> - - -

If a client, on purpose or due to a bug or being stopped, is slow to respond
with ack to file invalidation notification, it creates a problem because the
server will become blocked waiting for pin acknowledgments, and thus all
other clients, that try to work with the same file, will get stuck.

[...]

Lacking OS primitives to change address space of another process and not
being able to work it around with ptrace in userspace, wcfs takes approach
to kill a slow client on 30 seconds timeout by default.

- - - <8 - - - <8 - - -

But before this patch, this protection was not implemented: a single
faulty client could therefore freeze the whole system. With this patch
the protection is now implemented: faulty clients are killed after the
timeout or on any other misbehaviour in their pin handlers.

[1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208

Preliminary history:

    levin.zimmermann/wendelin.core@24904e82
    levin.zimmermann/wendelin.core@b02dcadc

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>

/discussed-on nexedi/wendelin.core!18

wcfs: Shutdown WatchLink on any pin error · 007d53db

Kirill Smelkov authored Sep 16, 2024

If a pin misbehaves, or there is an IO error or anything else, we want to
stop all communication on the watchlink, cancel in-flight pin
handlers, and (TODO) kill the client with SIGBUS.

This patch organizes WatchLink shutdown on any pin error.
This functionality is indirectly tested by test_Wcfs_watch_robust and
will also be indirectly tested by the faultyprotection tests.

It would be good to have dedicated tests probably, but that is,
hopefully, TODO.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Do not cancel pin handler by a READ interrupt · 7a79bdd6

Kirill Smelkov authored Sep 16, 2024

Pinning is a critical operation whose failure will soon lead to the client
being killed with SIGBUS. WCFS correctness also depends fundamentally on
a pin operation, once started, being handled by the client.

-> rework the READ handler not to cancel pin if a READ interrupt comes
   in from the OS client.

Do this by introducing WatchLink.serveCtx and running pins under this
context instead of under READ context. Later we will adjust pins to also
cancel this context on any error.

Test is, hopefully, TODO.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Fix potential stuck in WatchLink.serve exit codepath · a6dd7806

Kirill Smelkov authored Sep 16, 2024

When serve is completing and going to exit, it sends an error message to
the client without any timeout. So if the client is not reading from the
channel, wcfs will get stuck waiting for the message to be consumed.

-> Fix that by trying to send that last error for at most 1 second and
   ignoring errors if any

Test is, hopefully, TODO.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Rework WatchLink.serve exit codepath for better clarity · 08b011f5

Kirill Smelkov authored Sep 16, 2024

Bring in more structure:

- final watchlink cleanup is done in its own block
- cancelling spawned handlers is done in another block
- add more comments explaining things

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Rework WatchLink.serve to rely on context cancellation to stop reading · c7c3b82a

Kirill Smelkov authored Sep 15, 2024

Previously we were using .sk.CloseRead() to interrupt sk.Read(), but
that is not necessary since .sk, relying on xio.Pipe, implements
xio.Reader natively with full support for cancellation.

The original code to cancel via CloseRead comes from mid 2019 and predates

go123@7ad867a3
go123@0e368363
go123@0bdac628
go123@9db4dfac
go123@d2dc6c09

And in b17aeb8c and
6f0cdaff (wcfs: Provide isolation to clients), it seems, I forgot to
update the WatchLink.serve code accordingly.

Do that now because it simplifies code flow organization a bit.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Extend faulty protection tests with more kinds of faulty clients · c91fb14e

Kirill Smelkov authored Sep 16, 2024

So far we were testing only against a faulty client that reads the pin
notification fine but does not reply to it. But there could
be more problems:

1) a client does not read pin notification at all
2) a client closes watchlink abruptly after reading pin notification
3) a client replies to pin notification but the reply is not "ack"

The first problem, if not handled, leads to the whole set of clients
becoming stuck on reading the same block as the faulty client. The other
problems also indicate that the client broke the isolation protocol
from its side, and that wcfs can no longer be sure that it provides good,
uncorrupted data to the client.

In the first case, similarly to the "no reply" situation, we need to kill the
client to make progress while maintaining safety as well. In cases 2
and 3 we cannot maintain safety if the faulty client remains in the set
of live and served clients, so it is also logical to send SIGBUS/SIGKILL
to it.

Killing a client with SIGBUS is similar to how OS kernel sends SIGBUS when
a memory-mapped file is accessed and loading file data results in EIO. It is
also similar to wendelin.core 1 where SIGBUS is raised if loading file block
results in an error.

Extend tests to cover all explained scenarios.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Add test to exercise faulty client that does not reply to pin... · 0c35ae45

Kirill Smelkov authored Sep 16, 2024

wcfs: tests: Add test to exercise faulty client that does not reply to pin triggered by readPinWatchers

Levin writes:

    This patch extends the test scope of 'test_wcfs_pintimeout_kill'. Before
    this patch, the test only ensured that a client that does not
    respond to pin requests during the initial watch request [1] is
    killed. Now it also tests that a faulty client is killed when a block
    is invalidated. Since there are no other situations where the WCFS
    server sends pin requests to a client, the tests now cover all situations
    where a faulty client might not respond. This patch therefore aims to
    increase confidence that WCFS is not blocked by a faulty client.

    [1] See nexedi/wendelin.core!18

Preliminary history:

    levin.zimmermann/wendelin.core@9d42efff

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>

/discussed-on nexedi/wendelin.core!18

wcfs: tests: Factor assertion that process should be killed into assertKilled · 008211fb

Kirill Smelkov authored Sep 16, 2024

We will need to use this utility from several places in the next patch.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Allow to adjust tDB.assertBlk timeout · 82aa5949

Kirill Smelkov authored Sep 16, 2024

Currently assertBlk uses the default timeout() to wait for a READ operation to
complete. That works well everywhere except in faulty
protection tests, where the wcfs server will first need to wait for its own
pintimeout time to kill the faulty client and only then return the read
result to all non-faulty clients.

As a result, the corresponding test, where one client fails to properly
handle a pin notification triggered by READ operations, will need to use
a longer timeout for the good client when doing assertBlk.

Adjust assertBlk to allow specifying custom timeout as preparatory step
for that.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Extend faultyprotect test with good client · 001e2e7e

Kirill Smelkov authored Sep 16, 2024

And make sure that the good client can set up its watch fine even
though there is simultaneously a faulty client that should get killed.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Move client to be pinkill'ed into separate process · 33ea7769

Kirill Smelkov authored Sep 16, 2024

If we don't, the whole testing process will be killed once wcfs
learns to kill clients that do not handle pin notifications
well.

Use multiprocessing to do so and to be able to interoperate with spawned
test process by sending/receiving objects to/from it.

Preliminary history:

    levin.zimmermann/wendelin.core@aef0f0e1

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>

/discussed-on nexedi/wendelin.core!18

wcfs: tests: Fix thinko in "sleep > wcfs pin timeout - wcfs must kill us" · 1303799e

Kirill Smelkov authored Sep 16, 2024

If wcfs kills a client that did not respond to a pin notification within
pintimeout time, we need to wait strictly _more_ than that time to detect
whether client was killed or not. And in practice, due to noise in
operating system load and other factors, that waiting time should be
significantly greater to detect lack of expected event. However we were
waiting for exactly 1·pintimeout time and were claiming that there was
no pinkill event right after that.

-> Wait for 2·pintimeout instead of 1·pintimeout to make pinkill detection robust.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Use small "pin timeout" for faulty protection tests · e8a3f34a

Kirill Smelkov authored Sep 16, 2024

The default "pin timeout" is 30s and we are going to add many tests that
exercise pinkilling functionality soon. If every such test takes
2·pintimeout time = 60s, it will significantly increase the time
needed to run WCFS tests. Avoid that by reducing the pin timeout by
one order of magnitude, to pintimeout=3s, during faulty protection
tests.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Add context to tWCFS · 869e597d

Kirill Smelkov authored Sep 16, 2024

This testing helper limits the whole test time to detect FUSE-related
deadlocks by aborting the FUSE connection on timeout. It has worked well so
far. But soon we will need pinkill-related tests, where timeouts will
need to be detected independently of the FUSE connection. Expose tWCFS.ctx
for tests to be able to use this context and do things limited in time.
Adjust FUSE aborting to correlate exactly with this context's
cancellation.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: tests: Move test for verifying protection against faulty/slow clients to dedicated file · ab38f971

Kirill Smelkov authored Sep 16, 2024

We are going to add more tests on this topic + supporting infrastructure.
It makes sense to first move everything related to a dedicated test file
as a preparatory step, because wcfs_test.py already feels overloaded.

Plain code movement.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Fix setupWatch vs setupWatch race on the same file · 64468d47

Kirill Smelkov authored Sep 15, 2024

WCFS allows issuing simultaneous watch requests and when two watch
requests are simultaneously issued for the same file there was a race in
their handling: the code was relying on w.atMu.W to protect setupWatch
from concurrent readPinWatcher, and also, seemingly from another
setupWatch running on the same file.

But there is a bug in that: lacking an atomic primitive to downgrade
RWMutex from wlock to rlock, atMu.W was first fully unlocked and then
rlocked again. The code was prepared for readPinWatcher starting to run in
that unlock->rlock time window, but it was not prepared for another
setupWatch starting to run on the same file in that window.

-> Fix that by using a dedicated Watch.setupMu lock that protects
   setupWatch from setupWatch.

Test is, hopefully, TODO.

My mistake from 6f0cdaff (wcfs: Provide isolation to clients)

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Fix readPinWatchers error path · 7bbd6177

Kirill Smelkov authored Sep 15, 2024

Inside readPinWatchers:

    https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1536-1591

if δFtail.BlkRevAt returned an error, then f.watchMu was not
RUnlocked back, and wg.Wait was not called at all.

-> Fix that by scheduling unlock and wg wait right after f.watchMu is
   rlocked and workgroup is created.

Test is, hopefully, TODO.

My mistake from 6f0cdaff (wcfs: Provide isolation to clients)

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Cleanup wlinkTab entry when client drops opened head/watch handle · b20a26cb

Kirill Smelkov authored Sep 15, 2024

The code was already behaving like that, but there was an XXX note to do it.
Add a test to verify it is actually done.

An opened WatchLink handle is released on RELEASE because the
read in WatchLink.serve returns EOF after RELEASE, and then the code
inside WCFS does all necessary WatchLink-related cleanup:

https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1828-1872

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Cleanup zheadSockTab entry when client drops opened .wcfs/zhead handle · 87818b0d

Kirill Smelkov authored Sep 15, 2024

This was marked as TODO in server code and not implemented.
Without this cleanup zheadSockTab was growing indefinitely after every
open/close and leaking memory.

-> Fix it by registering a RELEASE handler with FUSE and removing
corresponding zheadSockTab entry from there.
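A minimal model of the fix, assuming a hypothetical handleTab keyed by handle ID and guarded by its own mutex (as 82359abe does for zheadSockTab): entries are added on open and removed on RELEASE.

```go
package main

import "sync"

// handleTab records opened handles under its own mutex. Entries are added
// on open and removed on RELEASE; without the removal the map would grow by
// one entry per open/close cycle.
type handleTab struct {
	mu  sync.Mutex
	tab map[uint64]string // handle ID -> description
}

func newHandleTab() *handleTab {
	return &handleTab{tab: make(map[uint64]string)}
}

func (t *handleTab) onOpen(id uint64, what string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tab[id] = what
}

// onRelease is the hook that was missing: it drops the entry for id.
func (t *handleTab) onRelease(id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.tab, id)
}

func (t *handleTab) size() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.tab)
}
```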

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Add .wcfs/stats file with basic usage statistics · 8abfd27d

Kirill Smelkov authored Sep 15, 2024

Report there the number of WCFS-internal instances, e.g. the number of tracked
BigFiles, WatchLinks etc., and also the number of counted events, for example
how many times a pin event happened.

Soon we will need these statistics to implement tests, e.g. for pinkilling
and other functionality, and they might also be useful to have in general.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Fix wlinkTab locking · 96b216f6

Kirill Smelkov authored Sep 15, 2024

ZWatcher says it does not need to lock wlinkMu because it is already
holding zheadMu and setupWatch runs with zheadMu locked. That is indeed
true, but the mistake here is that it is not only setupWatch that
accesses wlinkTab. For example, WatchNode.Open registers new entries
there only under wlinkMu:

https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1819-1822

-> Fix it by always using wlinkMu when accessing wlinkTab.

My mistake from 6f0cdaff (wcfs: Provide isolation to clients)

Test is, hopefully, TODO.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Switch debug.zheadSockTab to fine-grained locking · 82359abe

Kirill Smelkov authored Sep 15, 2024

Previously we were protecting access to zheadSockTab with zheadMu
because this table was accessed from only two places: when opening
.wcfs/zhead and in zwatcher. Soon we are going to add another place that
will access this table, and still using the big zheadMu seems less and less
logical.

-> Switch to using dedicated lock to protect table of .wcfs/zhead opens
   as preparatory step for that.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Switch filesystem to EIO mode on zwatcher failure · a36b5562

Kirill Smelkov authored Sep 15, 2024

Currently zwatcher failure leads to wcfs starting to provide stale data
instead of up-to-date data. Fix that by detecting zwatcher failures and
explicitly switching the filesystem to a mode where any access to
anything returns "input/output error".

Zwatcher can fail e.g. when retrieving transactions from ZODB storage
fails, or due to any other error. With this patch we make sure this does not
go unnoticed.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

wcfs: Remove TODO to teach go-fuse about Init.MaxPages · 6dfcb69e

Kirill Smelkov authored Sep 15, 2024

go-fuse added functionality to handle Init.MaxPages in
https://github.com/hanwen/go-fuse/commit/265a39266958.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!18

23 Jul, 2024 3 commits

lib/zodb: Drop client-only parameters from normalized NEO URI · 79e6f7b9

Levin Zimmermann authored Jul 19, 2024

We need to drop client-specific options so that NEO URIs that differ only
in client options, while actually pointing to the same NEO server,
are equal after normalization.
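A hypothetical sketch of such normalization using Go's net/url (the actual change is in lib/zodb). The list of client-only parameter names below is invented for illustration and is not NEO's actual set.

```go
package main

import "net/url"

// clientOnlyParams is HYPOTHETICAL: these names only illustrate the idea and
// are not NEO's actual list of client-side options.
var clientOnlyParams = []string{"compress", "cache-size", "logfile"}

// dropClientParams removes client-only query parameters, so that URIs that
// differ only in such options normalize to the same string. url.Values.Encode
// sorts keys, which also makes the result independent of parameter order.
func dropClientParams(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, k := range clientOnlyParams {
		q.Del(k)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```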

--------
kirr: See nexedi/neoppod!18 for
the discussion on this subject.

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!28

lib/zodb: Update NEO URI format to be in sync with upstream NEO · 2c0968e4

Levin Zimmermann authored Jul 19, 2024

NEO/go and NEO/py URI format diverged over time:

- neo@8c974485

However, with nexedi/neoppod!21 a
common solution was found. With neo!7, NEO/go and NEO/py
URI formats are in sync again. We therefore now need to update
'wendelin.core' to support the finally agreed-upon URI format.

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!28

wcfs: Update NEO/go to sync URI format · 921ad362

Levin Zimmermann authored Jul 22, 2024

With neo@95572d6a we synchronized
NEO/go URI format with NEO/py URI format. We need this new
NEO/go version to apply this synchronization to 'wendelin.core'
ZODB tools (which we do in the next patches).

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!28

22 Jul, 2024 1 commit

bigfile/zodb: Apply auto format as default only in WCFS mode · 34309058

Kirill Smelkov authored Jul 22, 2024

This semantically reverts 99f262dd (bigfile/zodb: Make auto format the
default) for wendelin.core-1 mode because in non-WCFS mode there are
known problems with data corruption on BTree topology changes(*) and
auto mode actually does change those topologies with first setting
ZBigFile[blk] -> ZBlk1 and then updating the same block to point to
ZBlk0 object.

Avoid triggering those problems and use auto as default only in WCFS
mode, which should handle invalidations with all those BTree topology
changes well.

The patch is based on suggestion by Levin Zimmermann: nexedi/wendelin.core!20 (comment 212405)

We have to move _default_use_wcfs because now it is invoked at module
import time and needs to be already defined at the time of the call.

(*) see nexedi/wendelin.core@8c32c9f6 for details.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/wendelin.core!29

25 Jun, 2024 3 commits

wcfs: _mntpt_4zurl: Fix it to accept strings. · 07087ec8

Carlos Ramos Carreño authored Jun 24, 2024

On Python 3, strings cannot be hashed by hashlib without encoding them
first; otherwise an error is raised:

```python
______________________________ test_zsync_resync _______________________________

    @func
    def test_zsync_resync():
        zstor = testdb.getZODBStorage()
        defer(zstor.close)

>       db, zconn, wconn = _zsync_setup(zstor)

wcfs/client/_wczsync_test.py:112:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../venvs/wendelin.core/lib/python3.9/site-packages/decorator.py:232: in fun
    return caller(func, *(extras + args), **kw)
../pygolang/golang/__init__.py:125: in _
    return f(*argv, **kw)
wcfs/client/_wczsync_test.py:53: in _zsync_setup
    wc = wcfs.join(zurl)
wcfs/__init__.py:201: in join
    mntpt = _mntpt_4zurl(zurl)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

zurl = 'file:///srv/slapgrid/slappart66/tmp/testdb_fs.xstpbg49/1.fs'

    def _mntpt_4zurl(zurl):
        # normalize zurl so that even if we have e.g. two neos:// urls coming
        # with different paths to ssl keys, or with different order in the list of
        # masters, we still have them associated with the same wcfs mountpoint.
        zurl = zurl_normalize_main(zurl)

        m = hashlib.sha1()
>       m.update(zurl)
E       TypeError: Strings must be encoded before hashing
```

We fix this error by encoding the string as UTF-8 before hashing it.
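
A minimal Python 3 sketch of the failure and of the encode-before-hash approach. `zurl_digest` is a hypothetical name; it skips `zurl_normalize_main` and the mountpoint path construction that the real `_mntpt_4zurl` performs:

```python
import hashlib


def zurl_digest(zurl):
    """SHA-1 hex digest of a (normalized) zurl given as str or bytes."""
    if isinstance(zurl, str):
        zurl = zurl.encode("utf-8")   # hashlib accepts only bytes-like data
    m = hashlib.sha1()
    m.update(zurl)
    return m.hexdigest()


zurl = "file:///tmp/1.fs"
try:
    hashlib.sha1().update(zurl)       # reproduces the original failure
except TypeError as e:
    print("TypeError:", e)

# text and its UTF-8 bytes map to the same digest, hence the same mountpoint
assert zurl_digest(zurl) == zurl_digest(zurl.encode("utf-8"))
```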

--------
kirr:

Use b instead of doing

    if isinstance(zurl, six.text_type):
      zurl = zurl.encode("utf-8")

wcfs already takes this approach of using b in other places - for
example in tDB.change:

    # change schedules zf to be changed according to changeDelta at commit.
    #
    # changeDelta: {} blk -> data.
    # data can be both bytes and unicode.              <-- NOTE
    def change(t, zf, changeDelta):
        assert isinstance(zf, ZBigFile)
        zfDelta = t._changed.setdefault(zf, {})
        for blk, data in six.iteritems(changeDelta):
            data = b(data)                             <-- NOTE
            ...
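
For illustration only, a hypothetical stand-in in the spirit of pygolang's `b`, covering just the two cases used here (text is encoded as UTF-8, bytes pass through unchanged); the real `b` is not reproduced. It is applied to a changeDelta-style `{blk: data}` dict, as in tDB.change:

```python
def b_sketch(s):
    """Hypothetical stand-in for b: return s as bytes (UTF-8 for text)."""
    if isinstance(s, bytes):
        return s
    if isinstance(s, str):
        return s.encode("utf-8")
    raise TypeError("b_sketch: expected bytes or str, got %s" % type(s).__name__)


# changeDelta-style input: blk -> data, where data may be bytes or text
changeDelta = {0: b"alpha", 1: u"beta", 2: u"\u00e9"}
zfDelta = {blk: b_sketch(data) for blk, data in changeDelta.items()}

# every value is now bytes; text was UTF-8 encoded
assert zfDelta == {0: b"alpha", 1: b"beta", 2: b"\xc3\xa9"}
```

Routing every input through one helper keeps call sites free of per-site isinstance checks.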

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27

wcfs: tests: Adapt changed modules/methods to Python 3. · 594ff3fa

Carlos Ramos Carreño authored Jun 24, 2024

Some modules and methods have changed names in Python 3.
The `thread` module has been renamed to `_thread`, and importing it
under the old name fails on Python 3:

```python
Traceback:
/opt/slapgrid/b0df76c24a1d2728ccf3e276f07c1790/parts/python3/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
wcfs/client/client_test.py:32: in <module>
    from wendelin.wcfs.wcfs_test import tDB, tAt, timeout, eprint
wcfs/wcfs_test.py:44: in <module>
    from thread import get_ident as gettid
E   ModuleNotFoundError: No module named 'thread'
```

Similarly, the `iteritems` method of dictionaries no longer exists in
Python 3, where `items` plays the same role.

We use the `six` module to paper over these differences.
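
The commit relies on `six` (e.g. `six.iteritems`). For comparison, a stdlib-only sketch of the same two shims, which swaps in try/except imports and a `getattr` check in place of `six`:

```python
# Stdlib-only sketch of the two compatibility shims; the commit itself uses six.
try:
    from thread import get_ident as gettid      # Python 2 module name
except ImportError:
    from _thread import get_ident as gettid     # renamed in Python 3


def iteritems(d):
    """Iterate over d's (key, value) pairs on both Python 2 and 3."""
    # dict.iteritems exists only on Python 2; on Python 3, dict.items()
    # returns a lightweight view that serves the same purpose.
    meth = getattr(d, "iteritems", None)
    return meth() if meth is not None else iter(d.items())


print(gettid())                                  # id of the current thread
print(sorted(iteritems({2: "b", 1: "a"})))       # [(1, 'a'), (2, 'b')]
```

Using `six` instead keeps these shims in one well-known library rather than repeating them in each test module.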

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27

wcfs: tests: xbtree.py: Execute `zip` eagerly when we need list. · d014045b

Carlos Ramos Carreño authored Jun 24, 2024

The builtin `zip` in Python 3 returns an iterator, not a list.
Thus, one cannot call `len` directly on the object returned by `zip`,
or we get errors like the following one:

```python
Traceback (most recent call last):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 617, in <module>
    main()
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 613, in main
    cmd(argv)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 589, in cmd_trees
    TreesSrv(zstor, r)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 234, in TreesSrv
    treetxtPrev = zctx.ztreetxt(ztree)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 536, in ztreetxt
    return zctx.TopoEncode(xbtree.StructureOf(ztree))
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 542, in TopoEncode
    return xbtree.TopoEncode(tree, zctx.vencode)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 797, in TopoEncode
    for nodev in _walkBFS(tree):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 701, in _walkBFS
    for level in __walkBFS(tree):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 724, in __walkBFS
    assert len(rv) == len(rn.node.children)
TypeError: object of type 'zip' has no len()
```

Thus, we have to create a list from the result of `zip` before calling
`len` on it.
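
A short sketch of the difference, using hypothetical key boundaries `v` to build the same adjacent-pair list `(klo,k1), (k1,k2), ..., (kN,khi)` as in xbtree.py:

```python
v = [0, 10, 20, 30]  # hypothetical key boundaries: klo, k1, k2, khi

pairs = zip(v[:-1], v[1:])       # Python 3: a lazy iterator, not a list
try:
    len(pairs)
except TypeError as e:
    print("TypeError:", e)       # object of type 'zip' has no len()

rv = list(zip(v[:-1], v[1:]))    # materialize the pairs eagerly
print(rv)                        # [(0, 10), (10, 20), (20, 30)]
assert len(rv) == len(v) - 1

# Plain iteration needs no list(): the iterator is consumed once, lazily.
for klo, khi in zip(v[:-1], v[1:]):
    assert klo < khi
```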

--------
kirr:

There were only two places where zip was used to build a list. All other
places where zip is used - both in wcfs/xbtree and in other packages -
only iterate over the zip result:

    (py39.venv) kirr@deca:~/src/wendelin/wendelin.core$ git grep -w zip
    bigarray/__init__.py:        for n, s in zip(self.shape, self.stridev):
    bigarray/__init__.py:        for n, s in zip(a.shape, a.strides):
    bigarray/array_zodb.py:BigArray_defaults = dict(zip(reversed(_.args), reversed(_.defaults)))
    wcfs/internal/xbtree.py:            for i, (klo, khi) in enumerate(zip(v[:-1], v[1:])): # (klo, khi) = [] of (k_i, k_{i+1})
    wcfs/internal/xbtree.py:                kvv = ['%s:%s' % (k,v) for (k,v) in zip(b.keyv, b.valuev)]
    wcfs/internal/xbtree.py:        for (j,i) in zip(jv, iv):
    wcfs/internal/xbtree.py:                    for (child, k) in zip(node.children[1:], node.keyv):
    wcfs/internal/xbtree.py:                    for (k,v) in zip(node.keyv, node.valuev):
    wcfs/internal/xbtree.py:            for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
    wcfs/internal/xbtree.py:            for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
    wcfs/internal/xbtree.py:                                    for (k,vtxt) in zip(node.keyv, vtxtv)])
    wcfs/internal/xbtree/xbtreetest/treegen.py:                    for (k,v) in zip(node.keyv, node.valuev):
    wcfs/internal/xbtree_test.py:    for (child, childOK) in zip(kids, children):
    wcfs/internal/xbtree_test.py:        for (i,(k,v)) in enumerate(zip(keys, values)):

    # handled in hereby patch
    wcfs/internal/xbtree.py:                rv = list(zip(v[:-1], v[1:]))  # (klo,k1), (k1,k2), ..., (kN,khi)
    wcfs/internal/xbtree.py:                rv = list(zip(v[:-1], v[1:]))  # (klo,k1), (k1,k2), ..., (kN,khi)

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27
