Commit d7f80021 authored by Vincent Pelletier

Fix support for files containing non-ASCII chars.

Python's stdin encoding is based on external information which may not be
related to the actual encoding of the file. For example, on one system it
is:
  encoding='UTF-8'
  errors='surrogateescape'
which then causes an exception to be raised if any surrogate was produced:
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udca3' in position 4134: surrogates not allowed
Reading the same file directly (instead of going through stdin) succeeds,
because the replacement char is used instead.
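
The failure mode described above can be reproduced without stdin at all. The following is a minimal sketch, assuming a hypothetical sample input containing the byte 0xA3 (the byte that maps to the '\udca3' surrogate in the traceback); the actual log content that triggered the bug is not shown in this commit.

```python
# Hypothetical sample input: 0xA3 on its own is not valid UTF-8.
raw = b"caf\xa3"

# With errors='surrogateescape', the undecodable byte becomes a lone surrogate.
escaped = raw.decode("utf-8", errors="surrogateescape")

# Re-encoding that string as strict UTF-8 is what raises the exception.
try:
    escaped.encode("utf-8")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# With errors='replace', the byte becomes U+FFFD, which encodes without error.
replaced = raw.decode("ascii", errors="replace")
reencoded = replaced.encode("utf-8")
```

Here `escaped` ends in a lone surrogate that strict UTF-8 cannot encode, while `replaced` ends in the replacement character and round-trips safely.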

Reconfigure stdin encoding and error handling so it is consistent with
files being opened by this tool directly.
This is not to say that "ascii" and "replace" are the ultimate best choice
(of which I am not completely convinced...) but at least this makes stdin
work in exactly the same way as named files.
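
A minimal sketch of the reconfiguration, under stated assumptions: the constant values "ascii" and "replace" are inferred from the message above rather than read from the source, `read_consistently` is a hypothetical helper, and an in-memory `io.TextIOWrapper` stands in for `sys.stdin` so the example can run anywhere.

```python
import io

# Assumed values, inferred from the commit message; the real constants
# are defined elsewhere in the tool.
INPUT_ENCODING = "ascii"
INPUT_ENCODING_ERROR_HANDLER = "replace"


def read_consistently(stream):
    """Hypothetical helper: reconfigure a text stream, then read it all.

    reconfigure() must be called before any data is read from the stream.
    """
    stream.reconfigure(
        encoding=INPUT_ENCODING,
        errors=INPUT_ENCODING_ERROR_HANDLER,
    )
    return stream.read()


# Stand-in for sys.stdin as configured on the system described above.
fake_stdin = io.TextIOWrapper(
    io.BytesIO(b"GET /caf\xa3\n"),
    encoding="utf-8",
    errors="surrogateescape",
)
text = read_consistently(fake_stdin)
```

`TextIOWrapper.reconfigure()` is available from Python 3.7 onwards. After the call, undecodable bytes arrive as U+FFFD instead of lone surrogates, so later encoding of `text` cannot fail on them.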
parent dafe8536
@@ -1642,6 +1642,10 @@ def main():
           file=sys.stderr)
       if filename == '-':
         logfile = sys.stdin
+        logfile.reconfigure(
+          encoding=INPUT_ENCODING,
+          errors=INPUT_ENCODING_ERROR_HANDLER,
+        )
         logfile_context = nullcontext()
       else:
         for opener, exc in FILE_OPENER_LIST:
...