    decoder: Don't treat \r\n as combined EOL · 57137139
    Kirill Smelkov authored
    Currently we use bufio.Reader.ReadLine, which accepts either \n or \r\n
    as a line ending. That is, however, not correct:
    
    - we should not accept e.g. the "S'abc'\r\n." pickle, because it is
      invalid:
    
    	In [32]: pickle.loads(b"S'abc'\r\n.")
    	---------------------------------------------------------------------------
    	UnpicklingError                           Traceback (most recent call last)
    	<ipython-input-32-b1da1988bae1> in <module>()
    	----> 1 pickle.loads(b"S'abc'\r\n.")
    
    	UnpicklingError: the STRING opcode argument must be quoted
    
    - we should not accept e.g. "L123L\r\n.", because it is also invalid:
    
    	In [33]: pickle.loads(b"L123L\r\n.")
    	---------------------------------------------------------------------------
    	ValueError                                Traceback (most recent call last)
    	<ipython-input-33-7231ec07f5c4> in <module>()
    	----> 1 pickle.loads(b"L123L\r\n.")
    
    	ValueError: invalid literal for int() with base 10: '123L\r\n'
    
    - treating \r as part of EOL in e.g. a UNICODE pickle would silently drop
      encoded information:
    
    	# python
    	In [34]: pickle.loads(b"Vabc\r\n.")
    	Out[34]: 'abc\r'
    
      while ogórek currently decodes it as just 'abc' (no trailing \r).
    
    For these reasons, let's fix Decoder.readLine to treat only \n as EOL.
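    
    A minimal sketch of the idea, not the actual ogórek code: the readLine,
    firstLine and firstLineOld functions below are hypothetical stand-ins, and
    the demo feeds only the argument of the UNICODE opcode ("abc\r\n."), i.e.
    the bytes that follow the leading 'V'. Handling of bufio.ErrBufferFull for
    very long lines is left out for brevity.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLine is a hypothetical stand-in for Decoder.readLine. It reads up to
// and including '\n' and strips only that one byte, so a preceding '\r'
// stays part of the returned data.
func readLine(r *bufio.Reader) ([]byte, error) {
	line, err := r.ReadSlice('\n')
	if err != nil {
		return nil, err
	}
	// With err == nil, ReadSlice guarantees that line ends with '\n'.
	return line[:len(line)-1], nil
}

// firstLine returns the first line of s as read by readLine.
func firstLine(s string) (string, error) {
	line, err := readLine(bufio.NewReader(strings.NewReader(s)))
	return string(line), err
}

// firstLineOld returns the first line of s as read by bufio.Reader.ReadLine,
// which drops both "\n" and "\r\n".
func firstLineOld(s string) (string, error) {
	line, _, err := bufio.NewReader(strings.NewReader(s)).ReadLine()
	return string(line), err
}

func main() {
	newLine, _ := firstLine("abc\r\n.")
	oldLine, _ := firstLineOld("abc\r\n.")
	fmt.Printf("ReadSlice('\\n'): %q\n", newLine) // "abc\r"
	fmt.Printf("ReadLine:        %q\n", oldLine) // "abc"
}
```

    With ReadSlice the trailing \r survives, which matches what Python returns
    for the UNICODE pickle above.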
    
    Besides this fix, we gain another property. Previously, when internally
    using bufio.Reader.ReadLine, we could not distinguish two situations:
    
    - a line ended abruptly, without any EOL character at all,
    - a line was properly terminated with an EOL character.
    
    Now that we have switched to internally using bufio.Reader.ReadSlice, we
    can properly detect EOF and return it as an error. This property will be
    needed in the following patch.
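    
    The difference between the two bufio calls can be illustrated as follows.
    This is only a sketch: sliceErr, lineErr and lineTerminated are
    hypothetical helper names, and which error value the decoder ultimately
    reports to its caller is not specified here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// sliceErr reads the first line of s with ReadSlice('\n') and returns the
// error. It is io.EOF when the input ends before any '\n' is seen.
func sliceErr(s string) error {
	_, err := bufio.NewReader(strings.NewReader(s)).ReadSlice('\n')
	return err
}

// lineErr reads the first line of s with ReadLine and returns the error.
// ReadLine hides a missing EOL: it returns the data with a nil error.
func lineErr(s string) error {
	_, _, err := bufio.NewReader(strings.NewReader(s)).ReadLine()
	return err
}

// lineTerminated reports whether the first line of s ends with '\n'.
func lineTerminated(s string) bool {
	return sliceErr(s) == nil
}

func main() {
	for _, in := range []string{"abc\n", "abc"} {
		fmt.Printf("%q: ReadLine err=%v, ReadSlice err=%v (EOF: %v)\n",
			in, lineErr(in), sliceErr(in), sliceErr(in) == io.EOF)
	}
}
```

    For the input "abc" ReadLine reports no error at all, while ReadSlice
    reports io.EOF, which is what lets the decoder tell a truncated line from
    a properly terminated one.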