Commits · master · Kirill Smelkov / neo

20 Feb, 2025 1 commit

go: Proper support for py3-generated data and py3 · 48f8faef

Kirill Smelkov authored Feb 20, 2025

- Add robust support for data generated by python3 based on ogórek [StrictUnicode mode](https://github.com/kisielk/og-rek/commit/b28613c2).
- Test handling of all ZODB kinds of data we care about: py2/pickle{1,2,3} and py3/pickle3, fix issues discovered along the way.
- Fix interoperability with py3 programs: for example previously index, that FileStorage/go saves, was not readable on py3 side.
- Switch encoding protocol for pickles, that ZODB/go and other go places emit, from 2 to 3.
- Switch unpickling to work in [PyDict=y](https://github.com/kisielk/og-rek/pull/75) mode.
- Add tests for zodb.Map and zodb.List .
- minor fixes.

Robust support for py3-generated data is needed for Wendelin.core .

This work depends on recent zodbtools update from nexedi/zodbtools!29.

/cc @jerome, @levin.zimmermann, @vnmabus
/proposed-for-review-on !8

48f8faef

24 Sep, 2024 21 commits

go/zodb: Fix _TestPersistentMapListLoad on py3_pickle3 · 0a58eb7a

Kirill Smelkov authored Aug 20, 2024

plist is generated to come with str in [0]. On py2 that is bytestring,
while on py3 it is unicode. However were trying to decode it as
bytestring only and so it was failing as

    --- FAIL: TestPersistentMapListLoad/py3_pickle3 (0.73s)
        persistent_x_test.go:82: plist[0]: got string  ; expect str

-> Fix it by using ogórek.AsString helper as suggested by
StrictUnicode=y mode documentation in https://github.com/kisielk/og-rek/commit/b28613c2.

0a58eb7a

go/zodbpickle: Switch to unpickle with PyDict=y mode · 1e87fe5b

Kirill Smelkov authored Aug 20, 2024

Previously we were unpickling dicts into map[any]any and it was making a
difference from which Python version the dict was saved: if it was saved
from py2 with str keys, then the dict needed to be accessed with ByteString
keys, while if it was saved from py3 with str keys, the dict needed to
be accessed with string keys. This peculiarity is one of the reasons why
Map test currently fails on py3_pickle3 - because there - both in the
test and inside Map.PySetState, dictionaries are accessed via ByteString
keys which works only wrt py2-saved data.

-> Fix it by switching the unpickler to work in PyDict=y mode(*) which
results in dicts unpickled as ogórek.Dict instead of map[any]any
with the ogórek.Dict considering ByteString to be equal to both string and Bytes
with the same underlying content. This allows programs to access Dict via
string/bytes keys following Python3 model, while still being able to handle
dictionaries generated from under Python2.

For ZODB/go this is change in behaviour exposed to outside. However
there is currently only one known ZODB/go user - WCFS in Wendelin.core -
and that user will be updated correspondingly as well.

The change fixes the following failure in go/zodb tests

    --- FAIL: TestPersistentMapListLoad/py3_pickle3 (0.70s)
        persistent_x_test.go:57: persistent.mapping.PersistentMapping(0000000000000000): activate: pysetstate: noone of ["data" "_container"] is present in state dict

but one more remains there which will be addressed in the next patch.

(*) see https://github.com/kisielk/og-rek/pull/75 for details.

1e87fe5b

go/go.mode: v↑ github.com/kisielk/og-rek to pre-release with PyDict patches · 90799e25

Kirill Smelkov authored Aug 20, 2024

We will need PyDict mode in the next patch to fix Map test on
py3_pickle3 and so that working with unpickled dictionaries is easy
independently from which Python version - py2 or py3 - they come.

PyDict mode patch: https://github.com/kisielk/og-rek/commit/3bf6c92d
		  (https://github.com/kisielk/og-rek/pull/75)

Whole set of brought changes:

	https://github.com/kisielk/og-rek/compare/b1bd3f2e3d5e...e691997e3596

90799e25

go/zodb/btree: go/zodb/fs1tools: Test it on all py2/py3 ZODB kinds of data we care about · 6841930a

Kirill Smelkov authored Jul 18, 2024

Previously, like with FileStorage, we were testing btree/go only with
data generated by python2 and pickle protocol=2. However even on py2
there are more pickle protocols that are in use, and also there is python3.

-> Modernize testdata/gen-testdata to use run_with_all_zodb_pickle_kinds
   that was recently added as part of nexedi/zodbtools@f9d36ba7
   and generate test data with both python2 and python3. It is handy to
   use py2py3-venv(*) to prepare python environment to do that.

   Adjust tests on Go side to verify how btree handles all generated zkinds.

All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.

(*) see nexedi/zodbtools@fac2f190

6841930a

go/zodb/fs1: Remove old testdata files · 0dd81a41

Kirill Smelkov authored Jul 18, 2024

We recently modrnized FileStorage/go tests to have all ZODB kinds of
data which live in testdata/<zkind>/ for example testdata/py2_pickle3/ -
see recent patch "go/zodb/fs1: Test FileStorage on all py2/py3 ZODB
kinds of data we care about".

We also updated all other packages to use that new FileStorage testdata.

-> Now is the time for old testdata to go.

0dd81a41

go/internal/xtesting: Switch to updated FileStorage testdata to validate LoadDBHistory · bfd9eeb4

Kirill Smelkov authored Jul 18, 2024

Old fs1/testdata/1.fs will be going away -> we need to switch to using
newly generated multi-zkind testdata. However for this test using only
one zkind should be enough. Let's pick what we currently primarily use:
py2_pickle3.

bfd9eeb4

go/zodbpickle: Switch to pickling with protocol=3 · 1bb0eb24

Kirill Smelkov authored Jul 18, 2024

Pickle protocol 3 allows to natively support all py bytes and string
types and is supported by zodbpickle/py on both py2 and py3 out of the
box. ZODB/py3 is using protocol=3 since 2013 and ZODB/py2 since 2018. In
other words nowdays protocol=3 is the default protocol with which
ZODB emits pickles, and it can be read practically everywhere.

The difference in between protocol=2 and protocol=3 is addition of
BINBYTES and SHORT_BINBYTES opcodes to represent bytes. Without those
opcodes bytes are encoded as

    `_codecs.encode(byt.decode('latin1'), 'latin1')`

which, when unpickled, results in bytes on py3 and str on py2.

Compared to using `BINBYTES byt` this form is much bigger in size, but
what is more important might turn bytes back into str when decoded and
reencoded on py2.

This form of bytes encoding is also not accepted by ZEO/py which rejects
it with

    2024-07-18T20:44:39 ERROR ZEO.asyncio.server Can't deserialize message
    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/server.py", line 100, in message_received
        message_id, async_, name, args = self.decode(message)
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 119, in pickle_server_decode
        return unpickler.load()  # msgid, flags, name, args
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 176, in server_find_global
        raise ImportError("Module not allowed: %s" % (module,))
    ImportError: Module not allowed: _codecs

All in all using pickle protocol=3 is ok from both py2 and py3 point of
view, brings size optimization and correctness, and fixes ZEO and
probably other issues.

So let the pickles ZODB/go saves and other Go places emit be encoded with protocol=3 now as well.

For the reference, the

    // TODO 2 -> 3 since ZODB5 switched to it and uses zodbpickle.

is from 2019 added in a16c9e06 (go/zodb: Teach Persistent to serialize
itself).

1bb0eb24

go/zodb/zeo: Fix ZRPC/pickle binary and string encoding/decoding wrt py3 · 9f402fcc

Kirill Smelkov authored Jul 18, 2024

ZEO/go client was asserting "method" field to be bytestring. As the
result interoperation with ZEO/py3 server was failing as e.g.

    2024-07-18T20:41:14 INFO ZEO.asyncio.server received handshake 'Z5'
    2024/07/18 20:41:14 zlink @ - /tmp/zeo2281880331/1.fs.zeosock: serveRecv1: decode: .0: method: got string; expected str
        zeo_test.go:221: openzeo /tmp/zeo2281880331/1.fs.zeosock: zeo: open /tmp/zeo2281880331/1.fs.zeosock: zlink @ - /tmp/zeo2281880331/1.fs.zeosock: call register: zlink is closed

because ZEO/py3 sends "method" as unicode.

-> Fix it by using pickle.AsString when decoding "method" field.

Similarly ZEO/go client was sending strings as bytestr, which on py3 decode to
unicode ok only if they are ASCII. Adjust string encoding to be emitted
as unicode always which works wrt both ZEO/py3 and ZEO/py2.

--------

ZEO/go client was sending binary ZRPC arguments as bytestr as well,
which was resulting in the following exception on ZEO/py3 side:

    2024-07-18T20:42:31 ERROR ZEO.asyncio.server Can't deserialize message
    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/z/ZEO/src/ZEO/asyncio/server.py", line 102, in message_received
        message_id, async_, name, args = self.decode(message)
      File "/home/kirr/src/wendelin/z/ZEO/src/ZEO/asyncio/marshal.py", line 122, in pickle_server_decode
        return unpickler.load()  # msgid, flags, name, args
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 1: ordinal not in range(128)
    2024/07/18 20:42:31 zlink @ - /tmp/zeo627526011/1.fs.zeosock: recvPkt: EOF
        xtesting.go:361: load 0285cbac70a3d733:0000000000000001: returned err unexpected:
            have: /tmp/zeo627526011/1.fs.zeosock: load 0285cbac70a3d733:0000000000000001: zlink @ - /tmp/zeo627526011/1.fs.zeosock: call loadBefore: zlink is closed
            want: nil

-> Fix it by using pickle.Bytes which should be encoded as bytes in the
   pickle stream.

This fix is ok but depends on the next patch to upgrade ZODB
serialization protocol to 3, because on pickle protocol 2 bytes are
serialized as `_codecs.encode(byt.decode('latin1'), 'latin1')` and
ZEO/py forbids import of _codecs:

    2024-07-18T20:44:39 ERROR ZEO.asyncio.server Can't deserialize message
    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/server.py", line 100, in message_received
        message_id, async_, name, args = self.decode(message)
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 119, in pickle_server_decode
        return unpickler.load()  # msgid, flags, name, args
      File "/home/kirr/src/wendelin/z/ZEO5/src/ZEO/asyncio/marshal.py", line 176, in server_find_global
        raise ImportError("Module not allowed: %s" % (module,))
    ImportError: Module not allowed: _codecs

We will address this in the next patch.

9f402fcc

go/zodb/zeo: Test it on all py2/py3 ZODB kinds of data we care about and wrt... · df8d5cc3

Kirill Smelkov authored Jul 18, 2024

go/zodb/zeo: Test it on all py2/py3 ZODB kinds of data we care about and wrt both ZEO/py2 and ZEO/py3

Similarly to FileStorage, fs1tools and other Go packages previously we
were testing ZEO/go client only with old FileStorage testdata generated
by python2 and pickle protocol=2. However even on py2 there are more
pickle protocols that are in use, and also there is python3.

We were also testing our ZEO/go client only wrt ZEO/py2 but there is
also ZEO/py3.

-> Adjust ZEO/go testing to automatically load and test against all ZODB
kinds from recently updated FileStorage testdata and wrt both ZEO/py2
and ZEO/py3.

All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.
However only ZEO/py2 succeeds: tests wrt ZEO/py3 server currently fail
and so are marked with "xfail".

We will fix tests for ZEO/py3 in the next patch.

df8d5cc3

go/zodb/demo: Test it on all py2/py3 ZODB kinds of data we care about · e76c9501

Kirill Smelkov authored Jul 18, 2024

Similarly to FileStorage, fs1tools and zodbtools previously we were testing
demo only with old FileStorage testdata generated by python2 and pickle
protocol=2. However even on py2 there are more pickle protocols that
are in use, and also there is python3.

-> Adjust demo testing to automatically load and test agains all ZODB
kinds from recently updated FileStorage testdata.

All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.

e76c9501

go/zodb/zodbtools: Test it on all py2/py3 ZODB kinds of data we care about · c67e2fec

Kirill Smelkov authored Jul 18, 2024

Like with FileStorage and fs1tools previously we were testing zodbtools/go only
with data generated by python2 and pickle protocol=2. However even on py2 there
are more pickle protocols that are in use, and also there is python3.

-> Modernize testdata generation to automatically pick up all ZODB kinds
   from recently-updated FileStorage/go testdata.

   Adjust tests on Go side to verify how zodbtools/go handle all generated zkinds.

All py2_pickle1, py2_pickle2, py2_pickle3 and py3_pickle3 are handled well out of the box.

c67e2fec

go/zodb/fs1tools: test: Normalize dumpOk from py3 to py2 syntax · 93e2f065

Kirill Smelkov authored Jul 18, 2024

Because else, py3_pickle3 tests fail e.g. as

        --- FAIL: TestFsDump/py3_pickle3/db=1 (0.00s)
            dump_test.go:70: fsdump: dump different:
                 Trans #00000 tid=0285cbac70a3d733 size=211 time=1979-01-03 21:00:26.400001 offset=117
                -    status='p' user=b'user0.12' description=b'step 0.12'
                +    status='p' user='user0.12' description='step 0.12'

        --- FAIL: TestFsDumpv/py3_pickle3/db=1 (0.00s)
            dump_test.go:70: fsdumpv: dump different:
                 ************************************************************
                -file identifier: b'FS30'
                +file identifier: 'FS30'

        ...

fs1tools/go sticks to py2 syntax because in my view it is more user
friendly. The preference of py2 or py3 syntax needs to be made anyway
because for fs1tools/go it is not possible to know beforehand to which
py kind its output will be compared against.

I think it is not very good for fstools/py output to be dependent on
python version. For the reference on this topic: output of zodbtools/py
is designed to be the same on both py2 and py3.

93e2f065

go/zodb/fs1tools: Test it on all py2/py3 ZODB kinds of data we care about · 117172cf

Kirill Smelkov authored Jul 18, 2024

Like with FileStorage previously we were testing fs1tools only with data
generated by python2 and pickle protocol=2. However even on py2 there are
more pickle protocols that are in use, and also there is python3.

-> Modernize testdata generation to automatically pick up all ZODB kinds
   from recently-updated FileStorage/go testdata.

   Adjust tests on Go side to verify how fs1tools handles all generated zkinds.

py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
Tests for py3_pickle3 currently fail and so are marked with "xfail".

We will fix tests for py3_pickle3 in follow-up patches.

117172cf

go/zodb/fs1tools: DumperFsDump: Print size of every transaction record · 9d10dcc5

Kirill Smelkov authored Jul 18, 2024

Like fsdump/py does.

https://github.com/zopefoundation/ZODB/commit/403f9869 started to print
size of every transaction record saying that it readded it. And indeed
this printing was there starting from
https://github.com/zopefoundation/ZODB/commit/06e757b3 authored in 2003.

Both commits use `size(txn) = trans._tend - trans._tpos` which is wrong
by 8 because during file iteration tpos points to the beginning of
transaction and tend points to tpos + tlen, but tlen is full
transaction length - 8:

https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/FileStorage/FileStorage.py#L1997-L1998
https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/FileStorage/format.py#L28

Mimic fsdump/py behaviour exactly for compatibility, even if it is a bit
buggy, since it was there for such a long time.

Without the fix TestFsDump was failing like this:

    --- FAIL: TestFsDump/db=1 (0.00s)
        dump_test.go:70: fsdump: dump different:
            -Trans #00000 tid=0285cbac258bf266 size=151 time=1979-01-03 21:00:08.800000 offset=52
            +Trans #00000 tid=0285cbac258bf266 time=1979-01-03 21:00:08.800000 offset=52
                 status=' ' user='' description='initial database creation'
               data #00000 oid=0000000000000000 size=61 class=persistent.mapping.PersistentMapping
             ...

9d10dcc5

go/zodb/fs1tools: Regenerate testdata with ZODB 5.8.1 · ce122ee7

Kirill Smelkov authored Jul 18, 2024

https://github.com/zopefoundation/ZODB/commit/403f9869 adjusted fsdump
to emit size of transaction record. Everything else remains unchanged.

Our fsdump becomes broken, because it does not yet emit that size(txn).
We will fix it in the next patch.

ce122ee7

go/zodb/fs1: Fix index save wrt py3 · ef615203

Kirill Smelkov authored Jul 18, 2024

We were saving index data with bytestrings inside index records. This
works for py2 but on py3 it causes failure on unpickling because, by default,
*STRING opcodes are unpickled into unicode on py3:

    === RUN   TestIndexSaveToPy/py2_pickle1/py3
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x8d in position 25: ordinal not in range(128)

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/./py/indexcmp", line 43, in <module>
        main()
      File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/./py/indexcmp", line 30, in main
        d1 = fsIndex.load(path1)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 138, in load
        v = unpickler.load()
    SystemError: <built-in method read of _io.BufferedReader object at 0x7fb631a86670> returned a result with an error set
        index_test.go:229: zodb/py read/compare index: exit status 1

-> Fix it by explicitly emitting index entries in binary form.

To be able to compare "index from py2 -> loaded into go -> saved by go ->
compared by py3" implement a bit of backward compatibility for  loading py2
index files on py3. Do this compatibility only for known-good py2 files, not
index produced by go code which is loaded by py3 without any compatibility
shims.

ef615203

go/zodb/fs1: Fix index load from py3 data · 82098c49

Kirill Smelkov authored Jul 18, 2024

Python3 emits index entries using bytes, but index.go was expecting only
bytestrings. As the result loading index produced by py3 was failing
e.g. as

    index_test.go:201: index load: testdata/py3_pickle3/1.fs.index: pickle @6: invalid oidPrefix: type ogórek.Bytes

-> Fix it by accepting both bytes and bytestrings inside the index.

82098c49

go/zodb/fs1: Accept both FS21 and FS30 magics · 8bae132e

Kirill Smelkov authored Jul 18, 2024

FileStorage/py2 saves data with FS21 magic, while FileStorage/py3 with
FS30 magic:

https://github.com/zopefoundation/ZODB/blob/5.8.1/src/ZODB/_compat.py#L25-L77

Up till now FileStorage/go was accepting only FS21 and so with
py3-generated data it was leading to e.g. the following failure:

    === RUN   TestEmptyDB/py3_pickle3
        filestorage_test.go:90: testdata/py3_pickle3/empty.fs: invalid fs1 magic "FS30"

-> Fix it by accepting both py2 and py3 FileStorage magics.

We do accept them both without any other change because FileStorage
format, even when it comes with those two different magic, is really the
same.

8bae132e

go/zodb/fs1: Test FileStorage on all py2/py3 ZODB kinds of data we care about · ede4b3bb

Kirill Smelkov authored Jul 18, 2024

Previously we were testing FileStorage/go only with data generated by
python2 and pickle protocol=2. However even on py2 there are more pickle
protocols that are in use, and also there is python3.

-> Modernize py/gen-testdata to use run_with_all_zodb_pickle_kinds
   that was recently added as part of nexedi/zodbtools@f9d36ba7
   and generate test data with both python2 and python3. It is handy to
   use py2py3-venv(*) to prepare python environment to do that.

   Adjust tests on Go side to verify how FileStorage handles all generated zkinds.

py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
Tests for py3_pickle3 currently fail and so are marked with "xfail".

We will fix tests for py3_pickle3 in follow-up patches.

Old testdata are not yet removed because e.g. fs1tools and zodbdump
tests depend on them. We will remove old fs1 testdata after adjusting
tests in dependent packages step-by-step.

(*) see nexedi/zodbtools@fac2f190

ede4b3bb

go/zodb: Fix PyData.ClassName for py3_pickle3 · a865708a

Kirill Smelkov authored Jul 18, 2024

Test for PyData.ClassName currently fails for py3_pickle3 zkind:

    === RUN   TestPyClassName/py3_pickle3
        pydata_test.go:50: class name for "\x80\x03X\x18\x00\x00\x00ZODB.tests.testSerializeq\x00X\x13\x00\x00\x00ClassWithoutNewargsq\x01\x86q\x02N\x86q\x03.":
            have: "?.?"
            want: "ZODB.tests.testSerialize.ClassWithoutNewargs"
        pydata_test.go:50: class name for "\x80\x03X\x18\x00\x00\x00ZODB.tests.testSerializeq\x00X\x10\x00\x00\x00ClassWithNewargsq\x01\x86q\x02K\x01\x85q\x03\x86q\x04.":
            have: "?.?"
            want: "ZODB.tests.testSerialize.ClassWithNewargs"

This happens because classname decoder was expecting class and module
names to come as py2 bytestrings, while py3 emits them as unicode.

-> Fix it by accepting both py2 bytestring and unicode for class and
module names.

a865708a

go/zodb: Test PyData and Map/List on all py2/py3 kinds of data we care about · ed2b96ca

Kirill Smelkov authored Jul 18, 2024

Previously we were testing PyData and Map/List only with data generated by python2
and pickle protocol=2. However even on py2 there are more pickle
protocols that are in use, and also there is python3.

-> Modernize py/pydata-gen-testdata to use run_with_all_zodb_pickle_kinds
   that was recently added as part of nexedi/zodbtools@f9d36ba7
   and generate test data with both python2 and python3. It is handy to
   use py2py3-venv(*) to prepare python environment to do that.

   Adjust tests on Go side to verify how PyData and Map/List handle all generated
   zkinds.

py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
Tests for py3_pickle3 currently fail and so are marked with "xfail".

We will fix tests for py3_pickle3 in the next patches.

(*) see nexedi/zodbtools@fac2f190

ed2b96ca

18 Sep, 2024 3 commits

go/zodb/internal/weak: Disable support for weak references · ee23551d

Kirill Smelkov authored Sep 18, 2024

Experience shows that we cannot provide reliably working weak references
support without crashing GC because there needs to be a coordination in
between object resurrection and at least GC mark phase[1,2] with [3] and
[4] showing an idea what kind it coordination it needs to be.

And since support for weak references is upcoming in standard library
with hopefully Go 1.24 [5], it does not make sense to continue pushing
hard to make weak references work on our standalone side.

By disabling support for custom weak references we will avoid crashes,
but will start to leak unused objects in zodb.Connection live cache.
This should be relatively ok for now as WCFS in wendelin.core is
currently the only known ZODB/go user and it cares to drop ZBlk* state
after loading the data. The leak, thus, should be modest with,
hopefully, not creating problems in practice.

And once std package weak is out there we will switch to that restoring
automatic LiveCache cleanup.

[1] https://github.com/golang/go/issues/41303
[2] wendelin.core@9b44fc23
[3] https://github.com/golang/go/commit/dfc86e922cd0
[4] https://github.com/golang/go/commit/79fd633632cd
[5] https://github.com/golang/go/issues/67552

/reviewed-by @levin.zimmermann
/reviewed-on !11

ee23551d

go/zodb/internal/weak: Assert that the object was not moved in the finalizer · c48f7008

Kirill Smelkov authored Sep 18, 2024

This is good to do since we are relying on non-moving property of
current GC. It never triggered but it removed one uncertainty while
debugging GC crash issue. Good to have it in place.

/reviewed-by @levin.zimmermann
/reviewed-on !11

c48f7008

go/zodb/internal/weak: Try to fix GC crashes via reworking Ref to keep only one word instead of two · 55491547

Kirill Smelkov authored Sep 18, 2024

We are observing garbage-collector crashes due to weak package under
load(*) with GC crashing as e.g.

    runtime: full=0xc0001f10000005 next=205 jobs=204 nDataRoots=1 nBSSRoots=1 nSpanRoots=16 nStackRoots=184
    panic: non-empty mark queue after concurrent mark
    fatal error: panic on system stack

    runtime stack:
    runtime.throw({0x5c60fe?, 0x601d70?})
            /home/kirr/src/tools/go/go1.21/src/runtime/panic.go:1077 +0x5c fp=0xc000051e88 sp=0xc000051e58 pc=0x436efc
    panic({0x585100?, 0x601d70?})
            /home/kirr/src/tools/go/go1.21/src/runtime/panic.go:840 +0x6ea fp=0xc000051f38 sp=0xc000051e88 pc=0x436e0a
    runtime.gcMark(0x118946?)
            /home/kirr/src/tools/go/go1.21/src/runtime/mgc.go:1464 +0x40c fp=0xc000051fb0 sp=0xc000051f38 pc=0x41bd6c
    runtime.gcMarkTermination.func1()
            /home/kirr/src/tools/go/go1.21/src/runtime/mgc.go:962 +0x17 fp=0xc000051fc8 sp=0xc000051fb0 pc=0x41b277
    runtime.systemstack()
            /home/kirr/src/tools/go/go1.21/src/runtime/asm_amd64.s:509 +0x4a fp=0xc000051fd8 sp=0xc000051fc8 pc=0x468eea

One problem in current implementation is that weak.Ref keeps two words
for the copy of original interface object and recreates that interface
object on Ref.Get from those two words _nonatomically_. Which is
explicitly documented by Go memory model as something that can lead to
corrupted memory and crashes. From https://go.dev/ref/mem#restrictions:

    This means that races on multiword data structures can lead to
    inconsistent values not corresponding to a single write. When the values
    depend on the consistency of internal (pointer, length) or (pointer,
    type) pairs, as can be the case for interface values, maps, slices, and
    strings in most Go implementations, such races can in turn lead to
    arbitrary memory corruption.

We can avoid doing multiword writes during object resurrection by using
concrete type T* instead of interface{}.

Unfortunately as wendelin.core@9b44fc23
shows it does not help with the above issue and the GC continues to
crash with the same "panic: non-empty mark queue after concurrent mark"
message. This is because weak.Ref implementation needs tight integration
with concurrent GC that Go does and in practice we are unable to do that
from outside of Go runtime internals. See e.g.
https://github.com/golang/go/commit/dfc86e922cd0 and
https://github.com/golang/go/commit/79fd633632cd to get an idea what
kind of integration that needs to be.

Anyway before disabling weak references support I wanted to commit this
change to show that one-word approach was tried and it does not work.
This leaves a trace in the history.

On the good side we are going to use standard package weak in the future
hopefully (https://github.com/golang/go/issues/67552). That needs to
wait for at least Go 1.24 though.

(*) see https://github.com/golang/go/issues/41303 for details

/reviewed-by @levin.zimmermann
/reviewed-on !11

55491547

20 Aug, 2024 12 commits

go/neo/proto: Update 'Compression' to int to support different compression algorithms · 12a6a923

Levin Zimmermann authored May 06, 2024

With nexedi/neoppod@fd80cc30 NEO/py added support to encode the compression
algorithm with the 'Compression' parameter. Before this, compression could
only be true (= with compression) or false (= without compression). Now
the absence of compression is encoded with 0. Any other number than 0
encodes a compression algorithm. The mapping is currently:

	1 = zlib

In the future, 2 could mean zstd [1].

[1] https://github.com/facebook/zstd/issues/1134

/reviewed-by @kirr
/reviewed-on !6

12a6a923

go/internal/xtesting: Prepare to run user's tests with both py2 and py3 · 15401594

Kirill Smelkov authored Jul 18, 2024

Tests, that want to verify something wrt both py2 and py3 will do
WithEachPy(tested_func) which will run tested_func two times:

1) under environment where python on $PATH points to python2, and
2) under environment where python on $PATH points to python3.

Client tests will now need to use just "python" instead of e.g.
"python2" or "python3" to run some python program, and each time
"python" will correspond to current phase of WithEachPy execution.

This will be soon handy to test things wrt e.g. ZEO/py2 and ZEO/py3
servers and similar situations.

Tests that merely want to use some python program just for their inner
working, for example to run `zodb commit`, no longer indicate their
preference for py2. For such tests it is a matter of preference in
pre-setup environment to where "python" points.

For the reference under environments created with py2py3-venv(*) default
"python" currently points to "python2".

(*) see nexedi/zodbtools@fac2f190

15401594

go/internal/xtesting += ZTestData & friends · 07e1263a

Kirill Smelkov authored Jul 17, 2024

ZTestData is Go counterpart of ztestdata on zodbtools/py side from
https://lab.nexedi.com/nexedi/zodbtools/-/blob/f9d36ba7/zodbtools/test/conftest.py#L31-65
that was recently added in nexedi/zodbtools@bf772ce0.

On Go side each ZTestData describes testdata files on the filesystem +
carry arbitrary extra information. For example FileStorage tests will
use it to place there "what is expected to load" information
corresponding to on-filesystem database.

07e1263a

go/internal/xmaps: New package that complements standard package maps · 28489790

Kirill Smelkov authored Jul 17, 2024

Currently with the only utility to retrieve map keys in sorted order.

We will need this in several tests in follow-up patches.

28489790

go/zodbpickle: Factor/generalize pickletools.Xstrbytes8 into here as zodbpickle.Unpack64 · 8da1234e
Kirill Smelkov authored Jul 17, 2024
```
Similarly to ZODB/py which provides u64 and p64 utilities. Add
zodbpickle.Pack64 for symmetry.
```
8da1234e

go/*: Switch to StrictUnicode mode for pickling/unpickling · 078b6262

Kirill Smelkov authored Jul 17, 2024

Because without StrictUnicode both bytestrings (str from py2) and
unicode (unicode from py2 and str from py3) load into the same Go type
string so it becomes impossible to distinguish them from each other.
Re-saving data, thus, also generally introduces changes as e.g.
string loaded via bytestring will be saved as unicode when pickling with
protocol 3. Or a loaded unicode will be saved as bytestring with
pickling via protocol=2.

-> Switching to StrictUnicode mode solves all those problems.

Please see updated documentatin for zodbpickle.go and
https://github.com/kisielk/og-rek/commit/b28613c2 for more details about
StrictUnicode mode.

For ZODB/go this is change in behaviour exposed to outside. However
there is currently only one known ZODB/go user - WCFS in Wendelin.core -
and that user will be updated correspondingly as well.

078b6262

go/*: Move creation of a Pickler and Unpickler to central place · 02076be2

Kirill Smelkov authored Jul 17, 2024

We already have 3 places where we create picklers and unpicklers:
zodb/pydata.go, fs1/index.go and zeo/proto.go and they are already
diverging a bit: for example pydata was explicitly specifying protocol
for pickle encoder, while other places were using defaults.

We will soon want to use StrictUnicode option for all picklers and
unpicklers we create, which means it is better to keep this
customization only in one place instead of copy-pasting it in between
callsites increasing the possibility for divergence and errors.

-> Move the code to create pickler and unpickler to internal zodbpickle
package as a preparatory step for that.

For the reference: this package is merely a wrapper around ogórek, not a
fork of any code like zodbpickle/py is.

02076be2

go/go.mod: v↑ go 1.14 -> go 1.18 · 8bc46475

Kirill Smelkov authored Jul 17, 2024

We will soon want to use generics and `any` which are available only for go >= 1.18 .

The other changes in go.sum and go.mod are there because after just

    -go 1.14
    +go 1.18

the build stopped to work and asked to do `go mod tidy`

8bc46475

go/*: Switch pickletools.Xint64 to ogórek.AsInt64 · e0063782

Kirill Smelkov authored Jul 17, 2024

ogórek.AsInt64 was added in https://github.com/kisielk/og-rek/commit/010fbd2e
with functionality similar to what we had in pickletools.Xint64 before.

e0063782

go/go.mod: v↑ github.com/kisielk/og-rek to pre-release · 2d54cad5

Kirill Smelkov authored Jul 17, 2024

This brings in support for StrictUnicode mode, AsInt64, AsBytes and
AsString helpers, and other fixes.

See https://github.com/kisielk/og-rek/compare/v1.2.0...b1bd3f2e3d5e for
details.

2d54cad5

go: Add support for Go modules · cb174b5e
Kirill Smelkov authored Jul 17, 2024
```
So that building and using ZODB/go works not only in GOPATH mode.
```
cb174b5e

go/zodb: Add tests for Map and List · 11f1a2bd

Kirill Smelkov authored Aug 20, 2024

As promised in 6c162438 (go/zodb: Draft support for Map and List) add
test that Map and List implemented on Go side can load PersistentMapping
and PersistentList data generated by Python.

Test that Py side can load Map and List data generated by Go is still pending.

Testdata is still generated with zodbtools f9d36ba7~ i.e. before
nexedi/zodbtools!29 was merged.

11f1a2bd

19 Aug, 2024 2 commits

go/neo/proto: Fix KnownMasterList nesting · dc844465

Levin Zimmermann authored Jun 29, 2024

Before this patch, the 'KnownMasterList' field of the 'NotPrimaryMaster'
was expected to be structured in the following way:

	ArrayHeader (KnownMasterList)

	    ArrayHeader (KnownMaster)

		ArrayHeader (Address)

		    Host (string)
		    Port (uint16)

However NEO/py sends the following structure:

	ArrayHeader (KnownMasterList)

	    ArrayHeader (Address)

		Host (string)
		Port (uint16)

This also makes sense, as 'KnownMaster' doesn't need to add another
nesting, because it only includes the address.

This patch amends the NEO/go protocol definition to transparently
represent the nesting as it's send by NEO/py. See also
levin.zimmermann/neoppod@18287612 for a similar issue.

----
kirr: tests currently live only on t branch.

/reviewed-by @kirr
/reviewed-on !6

dc844465

go/neo/proto/RowInfo: Fix representation on the wire · 7f96ad77

Levin Zimmermann authored Apr 08, 2024

Some NEO protocol packets have the field 'RowList'. This field contains
information about each row of a partition table. In NEO/go the information
of each row is represented with the 'RowInfo' type [1]. This type is defined
as a struct with the field ‘CellList’. ‘CellList’ is defined as a list of
'CellInfo' [1] (e.g. an entry for each cell).

NEO/go {en,de}codes struct types with ‘genStructHead’ (structures in
golang are encoded as arrays in msgpack) [2]. From the 'RowList'
definition, the msgpack decoder currently expects the following msgpack
array structure:

    ArrayHeader (RowList)

        ArrayHeader (RowInfo)

            ArrayHeader (CellList)

                ArrayHeader (CellInfo)

                    int32 (NID)
                    enum (State)

However NEO/py actually sends:

    ArrayHeader (RowList)

        ArrayHeader (CellList)

            ArrayHeader (CellInfo)

                int32 (NID)
                enum (State)

In other words, currently the NEO/go msgpack encoder expects one more
nesting, which NEO/py doesn’t provide (and which also doesn’t seem to
be necessary as the outer nesting would always only contain one
element). We could adjust the msgpack {en,de}coder to introduce an
exception for the 'RowInfo' type, however as the protocol definition in
'proto.go' aims to transparently reflect the structure of the packets on
the wire, it seems to be more appropriate to fix this straight in the
protocol definition. This is also less error-prone as we don't have to
fix all the different positions of the encoder, decoder & sizer and it's
less code (particularly if 'RowInfo' doesn't stay the only case for such
an issue).

[1] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/proto.go#L391-394
[2] https://lab.nexedi.com/kirr/neo/-/blob/1ad088c8/go/neo/proto/protogen.go#L1770-1775

--------

kirr: I've applied the following interdiff to the original patch of
levin.zimmermann/neoppod@c93d5dbc :

    --- a/go/neo/neo_test.go
    +++ b/go/neo/neo_test.go
    @@ -100,7 +100,7 @@ func _TestMasterStorage(t0 *tEnv) {
                    PTid:           1,
                    NumReplicas:    0,
                    RowList:        []proto.RowInfo{
    -                       proto.RowInfo{proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
    +                       {proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
                    },
            }))

    @@ -173,7 +173,7 @@ func _TestMasterStorage(t0 *tEnv) {
                    PTid:           1,
                    NumReplicas:    0,
                    RowList:        []proto.RowInfo{
    -                       proto.RowInfo{proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
    +                       {proto.CellInfo{proto.NID(proto.STORAGE, 1), proto.UP_TO_DATE}},
                    },
            }))

    --- a/go/neo/proto/proto_test.go
    +++ b/go/neo/proto/proto_test.go
    @@ -210,9 +210,9 @@ func TestMsgMarshal(t *testing.T) {
                            PTid:        0x0102030405060708,
                            NumReplicas: 34,
                            RowList: []RowInfo{
    -                               {CellInfo{11, UP_TO_DATE}, CellInfo{17, OUT_OF_DATE}},
    -                               {CellInfo{11, FEEDING}},
    -                               {CellInfo{11, CORRUPTED}, CellInfo{15, DISCARDED}, CellInfo{23, UP_TO_DATE}},
    +                               {{11, UP_TO_DATE}, {17, OUT_OF_DATE}},
    +                               {{11, FEEDING}},
    +                               {{11, CORRUPTED}, {15, DISCARDED}, {23, UP_TO_DATE}},
                            },
                            },

    @@ -229,9 +229,9 @@ func TestMsgMarshal(t *testing.T) {
                            hex("cf0102030405060708") +
                            hex("22") +
                            hex("93") +
    -                               hex("92"+"920bd40001"+"9211d40000") +
    -                               hex("91"+"920bd40002") +
    -                               hex("93"+"920bd40003"+"920fd40004"+"9217d40001"),
    +                               hex("92"+"920bd40401"+"9211d40400") +
    +                               hex("91"+"920bd40402") +
    +                               hex("93"+"920bd40403"+"920fd40404"+"9217d40401"),
                    },

                    // map[Oid]struct {Tid,Tid,bool}

for cosmetics and because the tests were failing as

    --- FAIL: TestMsgMarshal (0.00s)
        proto_test.go:106: M/proto.AnswerPartitionTable: encode result unexpected:
        proto_test.go:107:  have: 93cf0102030405060708229392920bd404019211d4040091920bd4040293920bd40403920fd404049217d40401
        proto_test.go:108:  want: 93cf0102030405060708229392920bd400019211d4000091920bd4000293920bd40003920fd400049217d40001

/reviewed-by @kirr
/reviewed-on !6

7f96ad77

04 Aug, 2024 1 commit

go/neo/neonet: DialLink: Fix SIGSEGV in case client handshake fails · d75f4ac2

Levin Zimmermann authored Aug 02, 2024

In case the last 'handshakeClient' call returns an error, 'DialLink'
returns 'link = nil, err = nil'. Callers of 'DialLink' then don't
recognize that 'link' is 'nil', as it's the convention to only check if
'err' is 'nil', which leads to a 'segmentation violation' as soon as
subsequent code tries to access fields of 'link':

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x14 pc=0x7087ae]

goroutine 5 [running]:
lab.nexedi.com/kirr/neo/go/neo/neonet.(*NodeLink).NewConn(0x0)
	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/neonet/connection.go:404 +0x4e
lab.nexedi.com/kirr/neo/go/neo/xneo.Dial.func1()
	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/neo/xneo/connect.go:138 +0x52
lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel.func2()
	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:114 +0x6a
created by lab.nexedi.com/kirr/neo/go/internal/xio.WithCloseOnErrCancel in goroutine 21
	/srv/slapgrid/slappart82/srv/runner/instance/slappart6/software_release/parts/wendelin.core/wcfs/neo/go/internal/xio/xio.go:109 +0x1ad
```

This patch fixes this issue so that now 'err' and 'link' are never both
'nil' again.

/reviewed-by @kirr
/reviewed-on kirr/neo!10

d75f4ac2