The second example policy in the PodSecurityPolicy documentation contains the following snippet:
...
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
    - ALL
  ...
Why is dropping all capabilities redundant for non-root + disallow privilege escalation? You can have a container process without privilege escalation that is non-root but still has effective capabilities, right?
It seems like this is not possible with Docker:
$ docker run --cap-add SYS_ADMIN --user 1000 ubuntu grep Cap /proc/self/status
CapInh: 00000000a82425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a82425fb
CapAmb: 0000000000000000
All effective capabilities have been dropped, even when trying to add them explicitly. But other container runtimes could behave differently, so is this comment just Docker-specific?
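To read the hex masks in this output, here is a minimal Python sketch that decodes a capability value from /proc/<pid>/status into capability names. The name table follows the bit numbers in <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40); the helper name decode_caps is hypothetical.

```python
from __future__ import annotations

# Capability names indexed by bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Decode a CapInh/CapPrm/CapEff/CapBnd/CapAmb value from /proc/<pid>/status."""
    value = int(hex_mask, 16)
    names = [name for bit, name in enumerate(CAP_NAMES) if (value >> bit) & 1]
    # Bits above CAP_CHECKPOINT_RESTORE belong to capabilities newer than this table.
    extra = value >> len(CAP_NAMES)
    if extra:
        names.append(f"unknown bits 0x{extra << len(CAP_NAMES):x}")
    return names


if __name__ == "__main__":
    for label, mask in [("CapInh/CapBnd", "00000000a82425fb"),
                        ("CapEff", "0000000000000000")]:
        print(label, decode_caps(mask))
```

For example, decode_caps("00000000a82425fb") yields 15 names including CAP_SYS_ADMIN, while decode_caps("00000000a80425fb") yields 14 names without it; those 14 (CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_NET_RAW, CAP_SYS_CHROOT, CAP_MKNOD, CAP_AUDIT_WRITE, CAP_SETFCAP) match Docker's default capability list as far as I know.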
Because you need privilege escalation to be able to use 'new' capabilities, setting allowPrivilegeEscalation: false effectively disables setuid in the execve system call: it sets the no_new_privs flag on the container process, so setuid/setgid bits are ignored and file capabilities are not added to the permitted set, which prevents the process from gaining any new capabilities. Also, as shown in the docs: "Once the bit is set, it is inherited across fork, clone, and execve and cannot be unset". More info here.
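The no_new_privs behaviour can be observed directly. Below is a minimal, Linux-only Python sketch, assuming prctl is reachable through ctypes.CDLL(None) (as it is with glibc); 38 and 39 are PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS from <linux/prctl.h>, and the helper names are hypothetical. It sets the bit in a child just before execve (via subprocess's preexec_fn), checks that the freshly executed interpreter still reports it, and shows that trying to clear it fails with EINVAL.

```python
import ctypes
import errno
import subprocess
import sys

# Option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def _prctl(option: int, arg2: int) -> int:
    """Call prctl(option, arg2, 0, 0, 0) and return its raw result."""
    return _libc.prctl(option, ctypes.c_ulong(arg2), _ZERO, _ZERO, _ZERO)


def set_no_new_privs() -> None:
    """Irreversibly set no_new_privs on the calling process."""
    if _prctl(PR_SET_NO_NEW_PRIVS, 1) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "prctl(PR_SET_NO_NEW_PRIVS) failed")


def try_clear_no_new_privs() -> int:
    """Attempt to clear the bit; return the errno (the kernel only accepts arg2 == 1)."""
    if _prctl(PR_SET_NO_NEW_PRIVS, 0) == 0:
        return 0
    return ctypes.get_errno()


# Runs in a freshly exec'd interpreter and prints PR_GET_NO_NEW_PRIVS (0 or 1).
_CHILD = (
    "import ctypes; libc = ctypes.CDLL(None); z = ctypes.c_ulong(0); "
    f"print(libc.prctl({PR_GET_NO_NEW_PRIVS}, z, z, z, z))"
)


def nnp_after_execve(set_before_exec: bool) -> int:
    """Report the no_new_privs bit as seen by a child process after execve."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD],
        preexec_fn=set_no_new_privs if set_before_exec else None,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("after execve, bit set in child:", nnp_after_execve(True))
    print("after execve, bit not set:", nnp_after_execve(False))
    print("clearing fails with EINVAL:", try_clear_no_new_privs() == errno.EINVAL)
```

The first line should report 1. The second reports 0 unless whatever launched Python (a sandbox, a container started with no-new-privileges, a systemd unit) had already set the bit, which is itself the inheritance the quote describes.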
Setting allowPrivilegeEscalation: false in combination with privileged: false renders requiredDropCapabilities: [ALL] redundant.

The equivalent Docker options here are:
--user=whatever => privileged: false
--security-opt=no-new-privileges => allowPrivilegeEscalation: false
--cap-drop=all => requiredDropCapabilities: [ALL]

That looks like what Docker is doing: the moment you specify a non-privileged user, all of the effective capabilities are dropped (CapEff: 0000000000000000), even if you specify --cap-add SYS_ADMIN. This, combined with the --security-opt=no-new-privileges option, renders --cap-drop=all redundant.

Note that SYS_ADMIN is not part of Docker's default capability set: 00000000a82425fb is the default mask (00000000a80425fb) plus bit 21, CAP_SYS_ADMIN, so the --cap-add did reach the inheritable and bounding sets. Only the permitted and effective sets end up empty.

So I suppose you could have a case where privileged: false and allowPrivilegeEscalation: false do not effectively disable capabilities, and those capabilities could then be dropped with requiredDropCapabilities. (Although I don't see why another runtime would want to change the Docker behavior.)
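Both the Docker behaviour and the kind of runtime speculated about here follow from the capability transformation that execve applies, as documented in the capabilities(7) man page. Below is a simplified Python model of those rules for a process whose UIDs are all non-zero; it ignores set-user-ID/set-group-ID bits, root special cases, securebits and user namespaces, and the class and function names are hypothetical. The ambient scenario is an assumption for illustration: the OCI runtime spec does let a runtime populate an ambient capability list, and ambient capabilities survive execve of a plain binary even with no_new_privs set.

```python
from __future__ import annotations

from dataclasses import dataclass

CAP_NET_BIND_SERVICE = 1 << 10
CAP_SYS_ADMIN = 1 << 21
# CapInh/CapBnd from the `docker run --cap-add SYS_ADMIN` output.
DOCKER_MASK = 0xA82425FB


@dataclass(frozen=True)
class ProcCaps:
    """Capability sets of a process as bit masks, mirroring /proc/<pid>/status."""
    inheritable: int
    permitted: int
    effective: int
    bounding: int
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    """File capabilities of the executed binary; all empty for a plain binary like grep."""
    inheritable: int = 0
    permitted: int = 0
    effective: bool = False


def execve_non_root(p: ProcCaps, f: FileCaps = FileCaps(),
                    no_new_privs: bool = False) -> ProcCaps:
    """Simplified capabilities(7) execve rules for a non-root process."""
    has_file_caps = bool(f.inheritable or f.permitted or f.effective)
    # Executing a binary with file capabilities clears the ambient set.
    ambient = 0 if has_file_caps else p.ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding)) ...
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding)
    if no_new_privs:
        # no_new_privs: the new permitted set may not exceed the old one.
        permitted &= p.permitted
    # ... | P'(ambient)
    permitted |= ambient
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    effective = permitted if f.effective else ambient
    # Inheritable and bounding sets are unchanged by execve.
    return ProcCaps(p.inheritable, permitted, effective, p.bounding, ambient)


if __name__ == "__main__":
    # Docker-like (assumed state before execve): caps held, no ambient caps, exec grep.
    docker = ProcCaps(DOCKER_MASK, DOCKER_MASK, DOCKER_MASK, DOCKER_MASK)
    print("docker-like:", execve_non_root(docker))

    # Hypothetical runtime that also raises CAP_SYS_ADMIN in the ambient set.
    with_ambient = ProcCaps(DOCKER_MASK, DOCKER_MASK, DOCKER_MASK, DOCKER_MASK,
                            ambient=CAP_SYS_ADMIN)
    print("ambient + NNP:", execve_non_root(with_ambient, no_new_privs=True))

    # Hypothetical binary carrying a file capability: NNP stops it adding to permitted.
    nothing = ProcCaps(0, 0, 0, DOCKER_MASK)
    bind_helper = FileCaps(permitted=CAP_NET_BIND_SERVICE, effective=True)
    print("file caps:", execve_non_root(nothing, bind_helper))
    print("file caps + NNP:", execve_non_root(nothing, bind_helper, no_new_privs=True))
```

With grep (no file capabilities) and an empty ambient set, the model reproduces CapPrm and CapEff of 0 with CapInh and CapBnd unchanged, matching the docker run output. With a non-empty ambient set the effective set survives despite no_new_privs, and only removing the capabilities from every set, which is what requiredDropCapabilities: [ALL] asks for, brings it back to zero.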