In the context of the project [GNU/Linux System files on-boot Tamper Detection System](https://www.erp5.com/group_section/forum/GNU-Linux-System-files-on-boot-Tamper-Detection-System-94DGdYfmx1), we need to create an agent that will be run inside an initramfs to report as much metadata as useful while keeping system boot times acceptable. It must then report that metadata to a remote service for later analysis.
## Current performance properties
- Reads file system metadata from the main thread (stat, xattrs (SELinux, ..), POSIX ACLs)
- Reads files and hashes them across multiple processes (as many as core count) with the `multiprocessing` python module
- Maximizes disk I/O utilization successfully, the Python code's performance is not a bottleneck, the disk is (good sign)
## Desired performance properties
- Reduce memory usage
- Avoid storing all the collected data in memory at the same time
- encode and output JSON as the program runs (incompatible with tree-like data structure like now)
- discard data after output so that memory usage can be deterministic
- Beware of stack overflows
- Currently the file system traverse function is recursive, Python does not have tail recursion optimization so it potentially could overflow the stack. But due to file system paths being limited in size (is it always true? is it file system specific?), probably it's unlikely it ever will.