Fix support for files containing non-ASCII characters.
Python's stdin encoding is based on external information which may not be related to the actual encoding of the file. For example, on one system it is:

    encoding='UTF-8' errors='surrogateescape'

which then causes an exception to be raised if any surrogate was produced:

    UnicodeEncodeError: 'utf-8' codec can't encode character '\udca3' in position 4134: surrogates not allowed

Reading the same file directly (instead of going through stdin) succeeds, because the replacement character is used instead.

Reconfigure stdin's encoding and error handling so that they are consistent with files opened by this tool directly. This is not to say that "ascii" and "replace" are the best possible choice (I am not completely convinced of that), but at least this makes stdin behave in exactly the same way as named files.
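The behaviour described above can be reproduced without touching the real stdin. The following is a minimal sketch, not the code from this commit: it wraps an in-memory byte stream in `io.TextIOWrapper` to stand in for stdin, and the sample bytes and the helper names `read_like_stdin` and `read_reconfigured` are made up for illustration. It assumes Python 3.7 or later, where `TextIOWrapper.reconfigure()` is available.

```python
import io

# Hypothetical sample input: 0xA3 is not valid UTF-8 on its own.
RAW = b"price: \xa3 5\n"


def read_like_stdin(raw: bytes) -> str:
    """Decode the way stdin did on the affected system."""
    stream = io.TextIOWrapper(
        io.BytesIO(raw), encoding="utf-8", errors="surrogateescape"
    )
    # The undecodable byte 0xA3 becomes the lone surrogate '\udca3'.
    return stream.read()


def read_reconfigured(raw: bytes) -> str:
    """Same stream, but reconfigured before any data is read."""
    stream = io.TextIOWrapper(
        io.BytesIO(raw), encoding="utf-8", errors="surrogateescape"
    )
    # Mirror the settings the commit message says are used for named files.
    stream.reconfigure(encoding="ascii", errors="replace")
    # The undecodable byte now becomes U+FFFD (the replacement character).
    return stream.read()


if __name__ == "__main__":
    broken = read_like_stdin(RAW)
    try:
        broken.encode("utf-8")
    except UnicodeEncodeError as exc:
        print("surrogateescape:", exc)

    fixed = read_reconfigured(RAW)
    print("reconfigured:", fixed.encode("utf-8"))
```

On the real stream the equivalent call would presumably be `sys.stdin.reconfigure(encoding="ascii", errors="replace")`, made before anything has been read from stdin, since the encoding cannot be changed once reading has started.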