resilient: find doesn't fail if files have wrong permissions
we have webrunners where some files don't belong to correct user in partitions. In this case, find fails and we get this error:
Traceback (most recent call last):
File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/bin/runner-exporter", line 83, in <module>
sys.exit(slapos.resilient.runner_exporter.runExport())
File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/eggs/slapos.toolbox-0.100-py2.7.egg/slapos/resilient/runner_exporter.py", line 182, in runExport
modified_file_list = getBackupFilesModifiedDuringExportList(args, export_start_date)
File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/eggs/slapos.toolbox-0.100-py2.7.egg/slapos/resilient/runner_exporter.py", line 106, in getBackupFilesModifiedDuringExportList
'find', 'instance', '-cmin', str(export_time / 60), '-type', 'f', '-path', '*/srv/backup/*'
File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/parts/python2.7/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '('find', 'instance', '-cmin', '0.335502366225', '-type', 'f', '-path', '*/srv/backup/*')' returned non-zero exit status 1
Make find don't go into directories with wrong permissions
- Owner
- Maintainer
Hi Thomas,
I have a thought : when we rsync files, we use the options "-og" which preserve the group and the owner of the files. Using the
-perm
option seems the correct way to do, but can it create other issues in the resiliency process (iow : I think it's not OK to create a file belonging to root) ?Maybe it is OK, as in
man find
there is :-o, --owner preserve owner (super-user only)
which would mean this switch is useless in our case.
- Owner
How is it possible that a non root unix user created directories owned by root ? wasn't someone using root user by mistake ? (in which case maybe we don't want to support this and we just chown back the files to this webrunner user)
- Owner
- Owner
@jerome indeed, it's not possible by default and the user owning this partition has special rights in the sudoers file because compiling ChromiumOS necessitate root permission. I agree this is an odd case and shouldn't happen most of time. Still it exists and this webrunner instance is failing.
@Nicolas the rsync process is not launched with root user but with slapuser so preserve owner and group will only work if the file belongs to that user. Anyway, the rsync is used later to rsync what is backuped. Here the problem is just find going through all the instance directories and failing on some root files. Those files are not backuped at all.
added 1 commit
- 9e9ca4e1 - resilient: find doesn't fail if files have wrong permissions
- Maintainer
Hi @rafael ,
I think if there are files that don't belong to the slapuser, resiliency should crash (then user will know something is wrong in their slaprunner).
This said, I know in the case of the slaprunner building nayuos, we need to be root for some compilation steps, which lead to this special case. Then going the way of backuping files that the slapuser can at least read seems ok to me FOR THIS SPECIAL CASE.
To think further (I don't say we shouldn't do anything about this, I just put my thoughts here so we have a base of discussion) : these files belonging to root will after copy to runner1 belong to the slapuser (as I said, I don't think root is preserved). Thus after takeover building nayuos won't work, until someone spends time to find which permission should be put on which file.
So if we want to go the most correct way, we should indeed check what are these files belonging to root :
- if they are compilation residues, then they should be excluded of the backup.
- if they are compilation results (iow, important to keep), then they should be chown at the end of the compilation (as we are root at that moment).
- if they are binaries used for the compilation, they should or belong to the nayuos SR, with whatever it takes to make them belong to root later (promise raising, documentation on how to build nayuos, ...), but then they don't need to be backuped (they will be recreated on runner 1 by resiliency), or be created at the system level, and then they don't belong to the backup neither (maybe they can be created by an ansible playbook ?).
I don't know how much effort we want to put in this, so as I said before, backuping the files that we can at least read is OK for me.
- Owner
@Nicolas.... if resilience crash (regardless of the reason) .... it is not resilient anymore. So we loose the purpose of having it... remember “it should work even if it does not work” (by @jp)
Edited by Rafael Monnerat - Owner
Thanks @tomo I understand now. I don't know how resiliency should address this, but I won't be shocked if we say this scenario is not supported by resiliency because as far as I know using sudo is not supported in SlapOS.
- Owner
Everyone, please remember that this find is used in
getBackupFilesModifiedDuringExportList
function. This function searches for files with path looking like*/srv/backup/*
modified recently in all instance directories. But while doing this search, the find goes inside all the instance directories (backup, no backup, etc.). The problem here is not that we want to backup files belonging to root. We just want resiliency not to fail because a few filles belong to root.Another way maybe is to explicitly look into
instance/slappart*/srv/backup
something like:'find', 'instance/slappart*/srv/backup', '-cmin', str(export_time / 60), '-type', 'f'
- Maintainer
Hi,
Thanks for making me notice that the issue happens during
getBackupFilesModifiedDuringExportList
, I totally missed that. Could you also tell me if we want to backup these files or not ? If not, I think we should exclude the problematic folders from the backup by adding their path in the filesrv/exporter.exclude`` and update
getBackupFilesModifiedDuringExportListto add the support of excluded files in the call to
findand not after (by using the switch
! -path```). If we want them in the backup, then there is another bug as the rsync shouldn't have returned an error...@rafael : sorry, I shouldn't have used "crash", but "fail gracefully" :) . Then the failure is found out by a promise to alert the user. I mean, exporting a backup that we can't rebuild on the other site and discovering this too late is much worst than having the resiliency alert the user that something is wrong (and in that case we are still in the it-works case because we have monitoring).
The more I think about this case, the more I am convinced that the issue is with the NayuOS software release, and not with the resiliency code (for exemple, why does it store files that the slapuser can't read in the backup folder ? If it is a real backup, then it's good that
getBackupFilesModifiedDuringExportList
raise on them. If they are not to backup, then they shouldn't be stored there.). - Owner
Those files belonging to root are not backuped at all (actually, almost nothing is backuped there):
$ cat instance/slappart0/srv/exporter.exclude parts/ parts/**
The files belonging to root are under
instance/slappart0/parts/chromiumos/release-*/chroot/build
.@Nicolas the problem is that find doesn't have a exclude file list option like rsync. Beside, for me it make no sense to look inside the
srv/backup
directory and exclude the files insrv/exporter.exclude
(because the exporter.exclude is listing what we should exclude to copy inside srv/backup). What do you think about my suggestion:'find', 'instance/slappart*/srv/backup', '-cmin', str(export_time / 60), '-type', 'f'
? Do you remember why we didn't do like this from the beginning ? This restricts the find to only meaningful directories.Edited by Thomas Gambier - Maintainer
Hi Thomas,
The solution
'find', 'instance/slappart*/srv/backup'
seems very good to me.Why it wasn't done before I don't know, but as far as I know there is no strong reason for the pattern
'*/srv/backup/*'
- Maintainer
Sorry, I meant :
for the pattern
'-path', '*/srv/backup/*'
added 1 commit
- 70727638 - resilient: find only look into instance/*/srv/backup directories
added 93 commits
-
70727638...a9f3cb5f - 92 commits from branch
master
- f6d1bda7 - resilient: find only look into instance/*/srv/backup directories
-
70727638...a9f3cb5f - 92 commits from branch
- Owner
Hello all,
I finally did the change we mentioned. The biggest change is that find won't look into
srv/backup
directories not located at the top of the instance directory (like in theia or webrunners) but I think it is consistent with the purpose ofgetBackupFilesModifiedDuringExportList
function. closed
- You're only seeing other activity in the feed. To add a comment, switch to one of the following options.
Files with large changes are collapsed by default.