pbs: promises redirect ssh output to /dev/null
-
Developer
Hello @kirr ,
I'm not really proud of this commit, but still we pushed it because the subprocess is writing some binary on the output. As the output is processed afterwards by other script, I prefered hiding the output (and because I couldn't find a way to just clean it). Still, the promise is doing its advertising job, and most of the time the explanation of why this promise was failing can be found somewhere else (ie: "pbs promised failed" -> on runner0, "resilient-sshd" promise tells me that the ssh server is down).
The binary outputs appears on
ssh.stdin.write(quitcommand)
and displays :b^@^@^@^@^@^@^@^Hquitting
. I tried to pipe the output of the subprocess to then apply some decoding function to it, but nothing I tried worked.If you have some idea on how to handle this problem, I'll be glad to improve this commit.
-
Owner
@Nicolas ok, I see, thanks for feedback. The promise here talks to rdiff-backup server just giving to it its "QUIT" command and the server actually replies via rdiff-backup protocol probably about some header and then that it is quiting. So in this particular case hiding stdout maybe makes sense. But:
-
we should not hide stderr, because if there are errors they could be emitted there. And normally stderr is expected to be empty.
-
hiding details, or forcing people to indirectly see what is going on is not really helpful in my experience. e.g. in this case if this promise does not work can mean several things:
- sshd is not working
- rdiffbackup server behind this sshd is not working
- probably something else (e.g. all configured fine, but e.g. due to low memory situation rdiff-backup server just cannot be started or starts and is immediately killed by OOM killer or something like that).
I think we want to see the details about why promise failed in the promise stderr output itself.
P.S. here are some links about rdiff-backup protocol I've briefly grasped:
- the server is reading requests in a loop here:
https://lab.nexedi.com/nexedi/rdiff-backup/blob/15c5cb5b/rdiff_backup/connection.py#L305 -
_get()
is where request decoding happens:
https://lab.nexedi.com/nexedi/rdiff-backup/blob/15c5cb5b/rdiff_backup/connection.py#L229 - upon receiving ConnectionQuit the server does
self._put("quitting", self.get_new_req_num())
-
_put()
is where objects encoding happens:
https://lab.nexedi.com/nexedi/rdiff-backup/blob/15c5cb5b/rdiff_backup/connection.py#L128 - for string types you can trace how
_put()
calls_putbuf()
and then_write()
in general if one wants to decode what happens with rdiff-backup on the wire, I think it is possible to call
_get()
in a loop and then you have all objects sent in one direction. Both directions could be probably handled too, if needed, with timestamps and proper intermixing, I think.If you want to decode just one reply I think you can call
_get()
on the output and check whether it is expected "quitting" reply.Kirill
-
-
Developer
Hello @kirr ,
Indeed hiding stderr was a very poor decision from myself. As a temporary solution, I've decided to only hide stdout, as for the moment it's the one which causes problems. I'll try to put some logs in the targeted functions to see what's causing this binary output (I still don't know if it's rdiff-backup related, or openssh related).
Thank you for the pointers,
Nicolas