Before I begin: other questions mention that ls -l produces question marks, but those are caused by permission issues. This question is different. Hopefully!
I have a decommissioned Docker host:
- Kernel 3.10
- docker 18.06
- glibc 2.17
- libseccomp 2.3.1
- coreutils 8.22
I have a SLES 15 Docker image:
- glibc 2.31
- coreutils 8.32
I start the container using docker run -it --rm -u root <docker-image> bash
The home directory I land in has a bin directory which I can see using ls but if I use ls -l I get a lot of question marks.
$ ls
bin
$ ls -l
ls: cannot access 'bin': Operation not permitted
total 0
d????????? ? ? ? ? ? bin
From my research, ls in coreutils 8.32 onwards uses the statx syscall. statx was added to Linux in kernel 4.11; library support was added in glibc 2.28. I thought this explained the output of ls -l: Docker uses the host's kernel, and the host's kernel is 3.10, which does not implement statx.
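To make that kernel-version reasoning concrete, here is a rough sketch of the check. The function name kernel_has_statx is my own, and the comparison is only a heuristic: it assumes a release string shaped like the output of uname -r, and vendor kernels can backport syscalls, so the version number alone proves nothing.

```shell
#!/bin/sh
# Heuristic sketch: guess from a kernel release string whether statx is expected.
# Assumption: statx arrived in mainline 4.11; backports are not detected.
kernel_has_statx() {
    # $1: release string such as the output of `uname -r` (defaults to this host)
    release=${1:-$(uname -r)}
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 11 ]; }; then
        echo "statx expected (kernel $release >= 4.11)"
    else
        echo "statx not expected (kernel $release < 4.11)"
    fi
}

kernel_has_statx "3.10.0"   # the host kernel series from this question
kernel_has_statx "4.11.0"   # the first mainline release with statx
```

Running it inside the container gives the same answer as on the host, since the container shares the host's kernel.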
When I start the Docker container without any seccomp profile, ls -l works fine!
docker run -it --rm --security-opt seccomp=unconfined -u root <docker-image> bash
$ ls
bin
$ ls -l
total 0
drwxr-xr-x 2 abcuser abcuser 6 Jul 4 2022 bin
So the cause appears to be the seccomp profile rather than the kernel's missing statx support. However, statx was whitelisted in Docker 18.04, and the host in my example is running 18.06.
I did read a commit message somewhere (I forgot to save the link) that said the ls implementation falls back to stat if statx is not available. If so, ls -l should have worked with the default seccomp profile.
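My reading of that fallback, as a sketch only: I am assuming "not available" means the statx call failed with ENOSYS, and that any other error is reported as-is. The function name next_step_after_statx and its outputs are hypothetical and are not taken from the glibc or coreutils source.

```shell
#!/bin/sh
# Hypothetical decision table for what happens after a statx call.
# Assumption: only ENOSYS triggers the fallback to the older stat-family call.
next_step_after_statx() {
    # $1: result of the statx call, either 0 or an errno name such as ENOSYS
    case "$1" in
        0)      echo "use statx result" ;;
        ENOSYS) echo "fall back to stat" ;;
        *)      echo "report error $1" ;;
    esac
}

next_step_after_statx ENOSYS
next_step_after_statx EPERM
```

If that assumption holds, the outcome depends on which error the caller actually receives from statx, not just on whether the kernel implements it.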
Can anyone explain why ls -l doesn't work with the default seccomp profile? Also, can anyone explain how ls -l works without a seccomp profile when the underlying kernel doesn't have statx implemented?
I have captured strace output. The parts of interest are below.
Strace with the default seccomp profile:
statx(AT_FDCWD, "bin", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, 0x7ffcb567a4f0) = -1 ENOSYS (Function not implemented)
ls: write(2, "ls: ", 4) = -1 ENOSYS (Function not implemented)
cannot access 'bin'write(2, "cannot access 'bin'", 19) = -1 ENOSYS (Function not implemented)
: Operation not permittedwrite(2, ": Operation not permitted", 25) = -1 ENOSYS (Function not implemented)
write(2, "\n", 1) = -1 ENOSYS (Function not implemented)
getdents64(3, 0x560b1d8ff920, 32768) = -1 ENOSYS (Function not implemented)
close(3) = -1 ENOSYS (Function not implemented)
fstat(1, 0x7ffcb567a890) = -1 ENOSYS (Function not implemented)
total 0
write(1, "total 0\n", 8) = -1 ENOSYS (Function not implemented)
openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)
d????????? ? ? ? ? ? bin
write(1, "d????????? ? ? ? ? ? "..., 36) = -1 ENOSYS (Function not implemented)
close(1) = -1 ENOSYS (Function not implemented)
close(2) = -1 ENOSYS (Function not implemented)
Strace without any seccomp profile:
statx(AT_FDCWD, "bin", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, 0x7ffec5a21b10) = -1 ENOSYS (Function not implemented)
newfstatat(AT_FDCWD, "bin", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
lgetxattr("bin", "security.selinux", 0x55d9b494d930, 255) = -1 ENODATA (No data available)
getxattr("bin", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
...
<I can see a lot more calls including calls to stat multiple times but I have cut it short. >
...
As you can see, after the statx call the next call is different. If this is indeed a problem with the seccomp profile not whitelisting statx, is there a way to find out which syscalls are whitelisted by running a command on the Docker host or in the container? I do not have any custom seccomp profile files, so I am using the default profile.
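The closest check I know of only shows whether a seccomp filter is active for a process, not which syscalls it allows. A minimal sketch, assuming the kernel exposes the Seccomp field in /proc/<pid>/status (the function name seccomp_mode is my own):

```shell
#!/bin/sh
# Report the seccomp mode recorded in a /proc status file.
# Mode values: 0 = disabled, 1 = strict, 2 = filter.
seccomp_mode() {
    # $1: path to a status file; defaults to /proc/self/status.
    # "self" here is the awk child, which inherits the caller's seccomp filter.
    status_file=${1:-/proc/self/status}
    mode=$(awk '/^Seccomp:/ {print $2}' "$status_file" 2>/dev/null)
    case "$mode" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;   # field missing or file unreadable
    esac
}

seccomp_mode
```

Inside a container started with the default profile this should print filter, and with --security-opt seccomp=unconfined it should print disabled. It does not tell me which syscalls the filter lets through, which is the part I am still looking for.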