Switch tee from threading.Thread to sync.WorkGroup
The reason is that with threading.Thread if exception happens in that spawned thread, this error is not propagated to main driver, while with sync.WorkGroup an exception from any spawned worker is propagated back to main. For example with the following injected error --- a/nxdtest/__init__.py +++ b/nxdtest/__init__.py @@ -267,6 +267,7 @@ def main(): # tee, similar to tee(1) utility, copies data from fin to fout appending them to buf. def tee(ctx, fin, fout, buf): + 1/0 while 1: before this patch nxdtest behaves like ... (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest date: Tue, 24 Nov 2020 14:55:08 MSK xnode: kirr@deco.navytux.spb.ru uname: Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64 cpu: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz >>> pytest $ python -m pytest Exception in thread Thread-2: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee 1/0 ZeroDivisionError: integer division or modulo by zero Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee 1/0 ZeroDivisionError: integer division or modulo by zero error pytest 0.583s # 1t 1e 0f 0s (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $? 0 Here the error in another thread is only printed, but nxdtest is not aborted. Above it reported "error", but e.g. when testing pygolang/py3 and raising an error in tee it even reported it was succeeding ( nexedi/nxdtest!6 (comment 121393) ): slapuser34@vifibcloud-rapidspace-hosting-007:~/srv/runner/instance/slappart0$ ./bin/runTestSuite date: Tue, 24 Nov 2020 12:51:23 MSK xnode: slapuser34@vifibcloud-rapidspace-hosting-007 uname: Linux vifibcloud-rapidspace-hosting-007 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 cpu: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz >>> thread $ python -m pytest Exception in thread Thread-1: Traceback (most recent call last): File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee fout.write(data) TypeError: write() argument must be str, not bytes ok thread 9.145s # 1t 0e 0f 0s >>> gevent $ gpython -m pytest Exception in thread Thread-3: Traceback (most recent call last): File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee fout.write(data) TypeError: write() argument must be str, not bytes ok gevent 21.980s # 1t 0e 0f 0s After this patch nxdtest correctly handles and propagates an error originating in spawned thread back to main driver: (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest date: Tue, 24 Nov 2020 14:54:19 MSK xnode: kirr@deco.navytux.spb.ru uname: Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64 cpu: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz >>> pytest $ python -m pytest Traceback (most recent call last): File "/home/kirr/src/wendelin/venv/z4-dev/bin/nxdtest", line 11, in <module> load_entry_point('nxdtest', 'console_scripts', 'nxdtest')() File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 230, in main wg.wait() File "golang/_sync.pyx", line 237, in golang._sync.PyWorkGroup.wait pyerr_reraise(pyerr) File "golang/_sync.pyx", line 217, in golang._sync.PyWorkGroup.go.pyrunf f(pywg._pyctx, *argv, **kw) File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee 1/0 ZeroDivisionError: integer division or modulo by zero (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $? 1 NOTE sync.WorkGroup requires every worker to handle context cancellation, so that whenever there is an error, all other workers are canceled. We add such cancellation handling to tee but only lightly: before going to block in read/write syscalls we check for whether ctx is canceled or not. However the proper handling would be to switch file descriptors into non-block mode and to select at every IO point on both potential IO events and potential cancellation. This is left as TODO for the future. /reviewed-on nexedi/nxdtest!7
Showing
-
mentioned in merge request !7 (closed)
-
mentioned in commit kirr/slapos@4c721272
-
mentioned in merge request slapos!918 (closed)
-
mentioned in commit slapos@82560932
-
mentioned in merge request !14 (merged)
Please register or sign in to comment