    golang_str: bstr/ustr iteration · a72c1c1a
    Kirill Smelkov authored
    Even though bstr is semantically an array of bytes, while ustr is an array
    of unicode characters, iterating _both_ of them yields unicode characters.
    This is in line with the Go approach described in "Strings, bytes, runes
    and characters in Go"[1] and allows both ustr _and_ bstr to be used
    as strings in the unicode world.
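
    As a rough illustration of this iteration semantics, here is a minimal
    sketch in plain Python 3. It does not use bstr/ustr themselves:
    iter_chars is a hypothetical helper, and strict UTF-8 decoding is an
    assumption made only to keep the example short (handling of invalid
    UTF-8 is out of scope here).

```python
def iter_chars(data: bytes):
    """Yield the unicode characters encoded in a UTF-8 bytestring.

    Hypothetical helper for illustration only; not the bstr implementation.
    """
    # Decode the whole bytestring, then hand out one character at a time.
    for ch in data.decode("utf-8"):
        yield ch


raw = "мир".encode("utf-8")   # 3 characters stored as 6 bytes
chars = list(iter_chars(raw))
print(len(raw), chars)
```

    The data stays a sequence of bytes (len(raw) counts bytes), yet
    iteration walks over characters, not over individual bytes.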
    
    Even though this diverges (just a bit) from str/py2 behaviour, and
    diverges more from bytes/py3 behaviour, I have not hit any problem in
    practice due to this divergence. In other words the semantics of
    bytestrings used in Go - to iterate them as unicode characters - is
    sound. For reference, it is the authors of Go who originally invented
    UTF-8 - see [2] for details.
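
    The divergence can be sketched with plain Python 3 as follows. The
    py2 str behaviour is only emulated here via one-byte slices, and the
    variable names are illustrative; none of this is bstr code.

```python
data = "aé".encode("utf-8")   # b'a\xc3\xa9': 2 characters, 3 bytes

# bytes/py3: iteration yields integers.
py3_items = list(data)

# str/py2 (emulated with one-byte slices): iteration yields 1-byte strings.
py2_items = [data[i:i + 1] for i in range(len(data))]

# Go-style: iteration yields unicode characters.
char_items = list(data.decode("utf-8"))

print(py3_items, py2_items, char_items)
```

    For pure-ASCII data the py2-style items and the characters line up one
    to one, while bytes/py3 yields integers even then. For non-ASCII data
    both differ from character iteration, because one character spans
    several bytes.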
    
    See also [3] for our discussion with Jérome on this topic.
    
    [1] https://blog.golang.org/strings
    [2] https://www.cl.cam.ac.uk/~mgk25/ucs/UTF-8-Plan9-paper.pdf
    [3] nexedi/zodbtools!13 (comment 81646)
_golang_str.pyx 27.4 KB