golang_str: Start exposing Pygolang string types publicly

In 2020 in edc7aaab (golang: Teach qq to be usable with both bytes and str format whatever type qq argument is) I added custom bytes- and unicode- like types for qq to return instead of str with the idea for qq's result to be interoperable with both bytes and unicode. Citing that patch:

    qq is used to quote strings or byte-strings. The following example
    illustrates the problem we are currently hitting in zodbtools with
    Python3:

        >>> "hello %s" % qq("мир")
        'hello "мир"'

        >>> b"hello %s" % qq("мир")
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

        >>> "hello %s" % qq(b("мир"))
        'hello "мир"'

        >>> b"hello %s" % qq(b("мир"))
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

    i.e. one way or another if type of format string and what qq returns
    do not match it creates a TypeError.

    We want qq(obj) to be useable with both string and bytestring format.

    For that let's teach qq to return special str- and bytes- derived
    types that know how to automatically convert to str->bytes and
    bytes->str via b/u correspondingly. This way formatting works whatever
    types combination it was for format and for qq, and the whole result
    has the same type as format.

    For now we teach only qq to use new types and don't generally expose
    _str and _unicode to be returned by b and u yet. However we might do
    so in the future after incrementally gaining a bit more experience.

So two years later I have gained that experience and found that having a string type that can interoperate with both bytes and unicode is generally useful. It is useful for practical backward compatibility with Python2, and for simplicity of programming, because it avoids a constant stream of encode/decode noise. Thus the day to expose Pygolang string types for general use has come.
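
As a rough illustration of the idea cited above, and not the actual Pygolang implementation, the following Python 3 sketch shows how a bytes-derived type and a str-derived type can each take part in both `"..." % x` and `b"..." % x` formatting. All names here (`BStrSketch`, `UStrSketch`, `b_sketch`, `u_sketch`, `qq_sketch`) are hypothetical stand-ins, the quoting is deliberately simplistic, and the sketch assumes strict UTF-8 conversion:

```python
# Minimal sketch, assuming Python 3 and strict UTF-8 conversion.
# All names are hypothetical stand-ins, not Pygolang's real types.

class BStrSketch(bytes):
    """bytes that formats into str as decoded text instead of b'...' repr."""
    def __str__(self):
        return self.decode("utf-8")


class UStrSketch(str):
    """str that can be used with bytes %-formatting via __bytes__."""
    def __bytes__(self):
        return self.encode("utf-8")


def b_sketch(s):
    """Convert str or bytes to the bytes-like sketch type."""
    if isinstance(s, bytes):
        return BStrSketch(s)
    return BStrSketch(s.encode("utf-8"))


def u_sketch(s):
    """Convert str or bytes to the unicode-like sketch type."""
    if isinstance(s, bytes):
        return UStrSketch(s.decode("utf-8"))
    return UStrSketch(s)


def qq_sketch(s):
    """Quote s with double quotes (simplified escaping, for illustration only).

    The result is bytes-like for bytes input and unicode-like for str input,
    and in both cases works with both str and bytes format strings.
    """
    text = u_sketch(s)
    quoted = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return b_sketch(quoted) if isinstance(s, bytes) else u_sketch(quoted)


if __name__ == "__main__":
    for arg in ("мир", "мир".encode("utf-8")):
        print("hello %s" % qq_sketch(arg))    # result is str
        print(b"hello %s" % qq_sketch(arg))   # result is bytes
```

In this sketch the result of formatting always has the type of the format string, whichever type the argument had.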

This patch does the first small step: it exposes bytes- and unicode- like types (now named bstr and ustr) publicly. It switches b and u to return bstr and ustr correspondingly instead of bytes and unicode. This is a change in behaviour, but hopefully it should not break anything, as there are not many b/u users currently and bstr and ustr are intended to be drop-in replacements for the standard string types.

Next patches will enhance bstr/ustr step by step so that they actually become drop-in replacements for the standard string types.

See nexedi/zodbtools!13 (comment 81646) for preliminary discussion from 2019.

See also "Python 3 Losses: Nexedi Perspective"[1] and associated "cost overview"[2] for related presentation by Jean-Paul from 2018.

[1] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20/1
[2] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20