idmap: Don't skip 1-99 hole in IDs
Instead remap the user IDs that we have to contiguous [0, ...] IDs in the target namespace. We need to be able to use e.g. the tty group from inside (gid=5), because glibc.openpty wants to chown files in /dev/pts to that group. See next patch for /dev/pts setup.
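As a rough illustration of the remapping idea (not the code from this patch), the sketch below lays available host ID ranges one after another starting at inside ID 0, so that small IDs such as gid=5 end up mapped. The function names and the example ranges are hypothetical; only the column order (inside ID, outside ID, length) follows the /proc/<pid>/uid_map and gid_map format.

```python
# Illustrative sketch only: contiguous_idmap, is_mapped and the example
# ranges are hypothetical and not taken from the patch.

def contiguous_idmap(host_ranges):
    """Map host ID ranges onto contiguous in-namespace IDs starting at 0.

    host_ranges: list of (host_start, count) tuples we are allowed to use.
    Returns (inside_start, host_start, count) entries, the column order
    used by /proc/<pid>/uid_map and gid_map.
    """
    entries = []
    inside = 0
    for host_start, count in host_ranges:
        entries.append((inside, host_start, count))
        inside += count  # the next range continues right after, leaving no hole
    return entries


def is_mapped(entries, inside_id):
    """Return True if inside_id falls within one of the mapped ranges."""
    return any(start <= inside_id < start + count
               for start, _, count in entries)


# Made-up example: our own ID plus one subordinate range.
gid_map = contiguous_idmap([(1000, 1), (100000, 65536)])
for entry in gid_map:
    print("%d %d %d" % entry)
print("gid 5 mapped:", is_mapped(gid_map, 5))
```

With a layout like this, the tty group (gid=5) is inside the mapped range, which is what the chown done by openpty needs.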
-
@kirr This is interesting. I want to ask an unrelated question, because I have also debugged similar issues without much success in understanding what was happening behind the scenes: how did you find out that it was glibc.openpty doing the chown? Is it possible to get a backtrace with gdb (or a similar tracing tool), or did you just read the source code?
-
@jerome, thanks for asking. I knew it because I debugged this problem some time ago. The way to debug was:
- the program fails
- I run it under strace to see what it is trying to do. There I saw a strange chown and, before that, an access to /etc/group.
- there is also ltrace, which shows which dynamic library functions are called (i.e. not syscalls but something in glibc). I don't recall whether it showed it was openpty or not.
- then I read the openpty source code and found the place in the sources where the code was trying to do the chown.
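The strace step above can be narrowed to just the ownership-changing syscalls. This is a minimal sketch, assuming strace is installed; the helper names and the log path are hypothetical, while `-f`, `-e trace=...` and `-o` are standard strace options.

```python
import subprocess

# Syscalls that change file ownership; these are what we want to spot.
CHOWN_SYSCALLS = ("chown", "fchown", "lchown", "fchownat")


def strace_argv(command, log_path):
    """Build an strace command line that logs only chown-family syscalls.

    -f follows child processes, -e trace=... filters syscalls by name,
    -o writes the trace to log_path instead of stderr.
    """
    return (["strace", "-f",
             "-e", "trace=" + ",".join(CHOWN_SYSCALLS),
             "-o", log_path]
            + list(command))


def run_traced(command, log_path):
    """Run command under strace and return the exit status (needs strace)."""
    return subprocess.run(strace_argv(command, log_path)).returncode


# Hypothetical usage: run_traced(["./failing-program"], "/tmp/chown.trace")
```

The resulting log then contains only the chown calls with their arguments and return values, which is usually enough to see which path and which uid/gid are involved.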
Here is a related excerpt from a jabber exchange with @alain.takoudjou:
|2017-06-05T18:26:08|1|to|N---|I've compared the strace with another one
|2017-06-05T18:26:16|1|to|N---|where I ssh into my another webrunner
|2017-06-05T18:26:30|1|to|N---|it all look the same till stat(pts)
|2017-06-05T18:26:41|1|to|N---|and on my intance then there is no chown
|2017-06-05T18:26:50|1|to|N---|and on yours chown is invoked
|2017-06-05T18:26:55|1|to|N---|I will read a bit about pts
|2017-06-05T18:26:56|1|to|N---|1s
|2017-06-05T18:27:27|1|from|N---|ok
|2017-06-05T18:31:51|1|to|N---|so seems I've found
|2017-06-05T18:32:06|1|to|N---|https://linux.die.net/man/4/ptmx
|2017-06-05T18:32:13|1|to|N---|Before opening the pseudoterminal slave, you must pass the master's file descriptor to grantpt(3) and unlockpt(3)
|2017-06-05T18:32:18|1|to|N---|now let's see grantpt:
|2017-06-05T18:33:03|1|from|N---|https://linux.die.net/man/4/ptmx > I read.
|2017-06-05T18:33:12|1|to|N---|https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/grantpt.c;h=a73020e69a70f07a57cefc277c0f051a488a3290;hb=HEAD#l130
|2017-06-05T18:33:20|1|to|N---|
    159   /* Make sure the group of the device is that special group. */
    160   if (st.st_gid != gid)
    161     {
    162       if (__chown (buf, uid, gid) < 0)
    163         goto helper;
    164     }
|2017-06-05T18:33:40|1|to|N---|-> thus we should check what is the git the sshd process is running under
|2017-06-05T18:33:44|1|to|N---|I wait till you read
|2017-06-05T18:33:47|1|to|N---|let me know when done
|2017-06-05T18:36:28|1|from|N---|Thanks I finished to read
|2017-06-05T18:36:42|1|to|N---|so do you see my idea why it happens?
|2017-06-05T18:37:41|1|from|N---|it still not clear for me yet
|2017-06-05T18:37:53|1|to|N---|in the glibc sources we have:
|2017-06-05T18:38:02|1|to|N---|
    130   /* Make sure that we own the device. */
    131   uid_t uid = __getuid ();
    132   if (st.st_uid != uid)
    133     {
    134       if (__chown (buf, uid, st.st_gid) < 0)
    135         goto helper;
    136     }
|2017-06-05T18:38:03|1|to|N---|and
|2017-06-05T18:38:15|1|to|N---|
    159   /* Make sure the group of the device is that special group. */
    160   if (st.st_gid != gid)
    161     {
    162       if (__chown (buf, uid, gid) < 0)
    163         goto helper;
    164     }
|2017-06-05T18:38:49|1|to|N---|so grantpt checks that uid and gid of the running process are "expected" and if not invokes the chown
|2017-06-05T18:39:04|1|to|N---|now we have to figure out what "expected" means and compare it to the sshd process running
|2017-06-05T18:39:14|1|to|N---|on problematic instance and on a working instance
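To make the two checks quoted from grantpt.c easier to follow, here is a small Python restatement of just that decision logic. It is a sketch, not glibc code: the function name and the parameter `tty_gid` (standing for the "special group" the excerpt calls `gid`) are hypothetical, and the `goto helper` error path is left out.

```python
# Sketch of the ownership checks from the grantpt.c excerpt above.
# planned_chowns and tty_gid are hypothetical names; error handling
# (the "goto helper" path) is intentionally omitted.

def planned_chowns(st_uid, st_gid, uid, tty_gid):
    """Return the (owner, group) pairs grantpt would pass to chown.

    st_uid, st_gid: current owner and group of the pts device (from stat).
    uid:            uid of the calling process (getuid()).
    tty_gid:        gid of the special group the device should belong to.
    """
    calls = []
    if st_uid != uid:          # "Make sure that we own the device."
        calls.append((uid, st_gid))
    if st_gid != tty_gid:      # "Make sure the group ... is that special group."
        calls.append((uid, tty_gid))
    return calls


# Device already owned by us and in the tty group: no chown at all.
print(planned_chowns(st_uid=1000, st_gid=5, uid=1000, tty_gid=5))
# Device group differs from the tty group: one chown to fix the group.
print(planned_chowns(st_uid=1000, st_gid=0, uid=1000, tty_gid=5))
```

This mirrors the difference seen in the two straces: on one instance both comparisons pass and no chown happens, on the other the group comparison fails and chown is invoked.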
If running under gdb is possible, then yes, we can add a breakpoint on chown and, when it gets to chown with interesting arguments, see the whole backtrace.
Also, nowadays I think one can use BPF to add a dynamic tracepoint to chown and get the user stacks when the program hits it. See e.g. here:
https://github.com/iovisor/bcc
https://github.com/iovisor/bcc/issues/1070
...(there should be a ready utility for that, probably it is https://github.com/iovisor/bcc/blob/master/tools/trace.py, but I do not recall the details offhand)
Today, I would say that BPF tracing is the most efficient way to understand what is going on. For example, I used it to understand how the kernel processes network packets and where the latencies come from:
https://lab.nexedi.com/kirr/bcc/blob/43cfc13b2f759f1771d97b401d40fb9d05015937/tools/pinglat.py
bcc@0f4b6d6d
bcc@0f3e7237
Hope this helps a bit,
Kirill
-
Thanks a lot @kirr. I usually use strace and then try to guess :) I remember watching this video: http://www.brendangregg.com/blog/2016-12-27/linux-tracing-in-15-minutes.html . It was a nice introduction to this (but a bit fast).
-
Let's also add this introduction to the list: https://jvns.ca/blog/2017/07/05/linux-tracing-systems/
-
@jerome, thanks for the feedback. Brendan Gregg (ex-Sun / DTrace) indeed has many good materials on his page regarding Linux tracing: