    golang: Provide b, u for strings · bcb95cd5
    Kirill Smelkov authored
    With Python3 I got tired of constantly using .encode() and .decode();
    of getting an exception if the original argument was unicode on e.g.
    b.decode(); of getting an exception on raw bytes that are invalid UTF-8;
    of not being able to use bytes literals with non-ASCII characters, etc.
    
    So instead of this pain provide two functions that make sure an object
    is either bytes or unicode:
    
    - b converts str/unicode/bytes s to a UTF-8 encoded bytestring.
    
    	Bytes input is preserved as-is:
    
    	   b(bytes_input) == bytes_input
    
    	Unicode input is UTF-8 encoded. The encoding always succeeds.
    	b is the reverse operation of u - the following invariant always holds:
    
    	   b(u(bytes_input)) == bytes_input
    
    - u converts str/unicode/bytes s to a unicode string.
    
    	Unicode input is preserved as-is:
    
    	   u(unicode_input) == unicode_input
    
    	Bytes input is UTF-8 decoded. The decoding always succeeds and input
    	information is not lost: invalid UTF-8 bytes are decoded into
    	surrogate code points in the range U+DC80 to U+DCFF.
    	u is the reverse operation of b - the following invariant always holds:
    
    	   u(b(unicode_input)) == unicode_input
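
The contract above can be sketched for Python3 roughly as follows. This is
only an illustration, not the code from strconv.py: it assumes the standard
'surrogateescape' error handler, has no Python2 fallback, and does not try to
handle lone surrogates outside U+DC80..U+DCFF, which the plain handler rejects
on encoding.

```python
# Python3-only sketch of the b/u contract; an assumption-laden illustration,
# not the implementation that this commit adds.

def b(s):
    """Return s as UTF-8 encoded bytes; bytes input passes through as-is."""
    if isinstance(s, bytes):
        return s
    if isinstance(s, str):
        # surrogateescape turns U+DC80..U+DCFF back into raw bytes 0x80..0xFF
        return s.encode('utf-8', 'surrogateescape')
    raise TypeError("b: invalid type %s" % type(s))

def u(s):
    """Return s as a unicode string; unicode input passes through as-is."""
    if isinstance(s, str):
        return s
    if isinstance(s, bytes):
        # surrogateescape maps each undecodable byte to U+DC80..U+DCFF
        return s.decode('utf-8', 'surrogateescape')
    raise TypeError("u: invalid type %s" % type(s))

# UTF-8 for three Cyrillic letters followed by the invalid byte 0xff
raw = b'\xd0\xbc\xd0\xb8\xd1\x80\xff'
text = u(raw)

assert text == '\u043c\u0438\u0440\udcff'   # 0xff became U+DCFF
assert b(text) == raw                       # b(u(bytes_input)) == bytes_input
assert u(b(text)) == text                   # u(b(unicode_input)) == unicode_input
assert b(raw) is raw and u(text) is text    # same-type input is preserved
```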
    
    NOTE: encoding _and_ decoding *never* fail or lose information. This
    is achieved by using the 'surrogateescape' error handler on Python3, and
    by providing a manual fallback that behaves the same way on Python2.
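
For reference, here is what the 'surrogateescape' handler does on its own on
Python3, using only the standard bytes.decode / str.encode calls. The sample
bytes are made up for illustration; the Python2 fallback is not shown.

```python
# Two ASCII bytes followed by two bytes that are not valid UTF-8 on their own
data = b'ab\x80\xff'

# The default (strict) handler refuses such input
try:
    data.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to code point U+DC00 + 0xNN
text = data.decode('utf-8', 'surrogateescape')
codes = [hex(ord(c)) for c in text]

# Encoding with the same handler restores the original bytes exactly
roundtrip = text.encode('utf-8', 'surrogateescape')

assert strict_ok is False
assert codes == ['0x61', '0x62', '0xdc80', '0xdcff']
assert roundtrip == data
```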
    
    The names are chosen so that b(something) resembles b"something",
    and u(something) resembles u"something".
    
    This, even though it is only a part of the strings solution discussed
    in [1], should help handle byte- and unicode- strings in a more robust
    and distraction-free way.
    
    Top-level documentation is TODO.
    
    [1] nexedi/zodbtools!13
strconv.py 8.49 KB