golang/golang_str_test.py · 2bb971ba618fc3fbfdbbbd6f855a606c961bf6d9 · Kirill Smelkov / pygolang

Initially I implemented things in such a way that (b|u)str.__bytes__ were giving bstr and ustr.encode() was giving bstr as well. My logic here was that bstr is based on bytes and it is ok to give that. However this logic did not pass backward compatibility test: for example when LXML is imported it does cdef bytes _FILENAME_ENCODING = (sys.getfilesystemencoding() or sys.getdefaultencoding() or 'ascii').encode("UTF-8") and under gpython it breaks with File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/bin/runwsgi", line 4, in <module> from Products.ERP5.bin.zopewsgi import runwsgi; sys.exit(runwsgi()) File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5/__init__.py", line 36, in <module> from Products.ERP5Type.Utils import initializeProduct, updateGlobals File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/__init__.py", line 42, in <module> from .patches import pylint File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/patches/pylint.py", line 524, in <module> __import__(module_name, fromlist=[module_name], level=0)) File "src/lxml/sax.py", line 18, in init lxml.sax File "src/lxml/etree.pyx", line 154, in init lxml.etree TypeError: Expected bytes, got golang.bstr The breakage highlights a thinko in my previous reasoning: yes bstr is based on bytes, but bstr has different semantics compared to bytes: even though e.g. __getitem__ works the same way for bytes on py2, it works differently compared to py3. This way if on py3 a program is doing bytes(x) or x.encode() it then expects the result to have bytes semantics of current python which is not the case if the result is bstr. -> Fix that by adjusting .encode() and .__bytes__() to produce bytes type of current python and leave string domain. I initially was contemplating for some time to introduce a third type, e.g. bvec also based on bytes, but having bytes semantic and that bvec.decode would return back to pygolang strings domain. But due to the fact that bytes semantic is different in between py2 and py3, it would mean that bvec provided by pygolang would need to have different behaviours dependent on current python version which is undesirable. In the end with leaving into native bytes the "bytes inconsistency" problem is left to remain under std python with pygolang targeting only to fix strings inconsistency in between py2 and py3 and providing the same semantic for bstr and ustr on all python versions. It also does not harm that bytes.decode() returns std unicode instead of str: for programs that run under unpatched python we have u() to convert the result to ustr, while under gpython std unicode is actually ustr which makes bytes.decode() behaviour still quite ok. P.S. we enable bstr.encode for consistency and because under py2, if not enabled, it will break when running pytest under gpython in File ".../_pytest/assertion/rewrite.py", line 352, in <module> RN = "\r\n".encode("utf-8") AttributeError: unreadable attribute

golang_str_test.py 110 KB

Replace golang_str_test.py