    golang_str: Start exposing Pygolang string types publicly · 1f99393d
    Kirill Smelkov authored
    In 2020, in edc7aaab (golang: Teach qq to be usable with both bytes and
    str format whatever type qq argument is), I added custom bytes- and
    unicode-like types for qq to return instead of str, so that qq's result
    would be interoperable with both bytes and unicode. Citing that patch:
    
        qq is used to quote strings or byte-strings. The following example
        illustrates the problem we are currently hitting in zodbtools with
        Python3:
    
            >>> "hello %s" % qq("мир")
            'hello "мир"'
    
            >>> b"hello %s" % qq("мир")
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
    
            >>> "hello %s" % qq(b("мир"))
            'hello "мир"'
    
            >>> b"hello %s" % qq(b("мир"))
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
    
        i.e. one way or another if type of format string and what qq returns do not
        match it creates a TypeError.
    
        We want qq(obj) to be useable with both string and bytestring format.
    
        For that let's teach qq to return special str- and bytes- derived types that
        know how to automatically convert to str->bytes and bytes->str via b/u
        correspondingly. This way formatting works whatever types combination it was
        for format and for qq, and the whole result has the same type as format.
    
        For now we teach only qq to use new types and don't generally expose
        _str and _unicode to be returned by b and u yet. However we might do so
        in the future after incrementally gaining a bit more experience.
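
    The idea quoted above can be illustrated with a minimal Python 3 sketch.
    It is not the code from edc7aaab: the class names QStrSketch and
    QBytesSketch are hypothetical, and plain UTF-8 is assumed for the
    conversion. It relies only on two standard hooks: str %-formatting calls
    __str__ on the argument, and bytes %-formatting calls __bytes__.

```python
# Hypothetical stand-ins for the str- and bytes-derived types; assumes UTF-8.

class QStrSketch(str):
    """Text that can also be substituted into a bytes format string."""
    def __bytes__(self):
        # Used by b"... %s" % obj
        return self.encode("utf-8")


class QBytesSketch(bytes):
    """Bytes that can also be substituted into a str format string."""
    def __str__(self):
        # Used by "... %s" % obj, instead of the default "b'...'" rendering
        return self.decode("utf-8")


quoted_text = QStrSketch('"мир"')
quoted_data = QBytesSketch('"мир"'.encode("utf-8"))

# The result always has the type of the format string.
as_text_1 = "hello %s" % quoted_text
as_text_2 = "hello %s" % quoted_data
as_data_1 = b"hello %s" % quoted_text
as_data_2 = b"hello %s" % quoted_data
```

    With these definitions all four combinations format without TypeError,
    which is the property the quoted patch asks of qq's return value.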
    
    Two years later I have gained that experience and found that a string
    type that can interoperate with both bytes and unicode is generally
    useful. It is useful for practical backward compatibility with Python2,
    and it simplifies programming by avoiding a constant stream of
    encode/decode noise. Thus the day to expose Pygolang string types for
    general use has come.
    
    This patch takes the first small step: it publicly exposes the bytes-
    and unicode-like types (now named bstr and ustr). It switches b and u to
    return bstr and ustr respectively instead of bytes and unicode. This
    is a change in behaviour, but hopefully it should not break anything, as
    there are not many b/u users currently and bstr and ustr are intended to
    be drop-in replacements for the standard string types.
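
    To make "drop-in replacement" concrete, here is a rough sketch of what a
    caller of b and u can expect, under stated assumptions. It does not import
    Pygolang: b_sketch, u_sketch, BStrSketch and UStrSketch are hypothetical
    stand-ins, UTF-8 is assumed, and the real bstr/ustr may behave differently
    in details.

```python
# Hypothetical stand-ins, not the Pygolang implementation; assumes UTF-8.

class BStrSketch(bytes):
    """Stand-in for bstr: still a bytes instance, but renders as text."""
    def __str__(self):
        return self.decode("utf-8")


class UStrSketch(str):
    """Stand-in for ustr: still a str instance, but knows its bytes form."""
    def __bytes__(self):
        return self.encode("utf-8")


def b_sketch(s):
    """Convert str or bytes to the bytes-like type."""
    if isinstance(s, str):
        s = s.encode("utf-8")
    return BStrSketch(s)


def u_sketch(s):
    """Convert str or bytes to the unicode-like type."""
    if isinstance(s, bytes):
        s = s.decode("utf-8")
    return UStrSketch(s)


data = b_sketch("мир")
text = u_sketch("мир".encode("utf-8"))

# Existing callers that check for or compare with the standard types keep working.
still_bytes = isinstance(data, bytes) and data == "мир".encode("utf-8")
still_str = isinstance(text, str) and text == "мир"
```

    Because the stand-ins subclass bytes and str, code written against the
    old return types of b and u continues to pass isinstance checks and
    equality comparisons, which is why the switch is hoped to be safe.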
    
    Next patches will enhance bstr/ustr step by step until they really are
    drop-in replacements for the standard string types.
    
    See nexedi/zodbtools!13 (comment 81646)
    for preliminary discussion from 2019.
    
    See also "Python 3 Losses: Nexedi Perspective"[1] and the associated "cost
    overview"[2] for a related presentation by Jean-Paul from 2018.
    
    [1] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20/1
    [2] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20
_golang_str.pyx 11.5 KB