The second example policy from the PodSecurityPolicy documentation contains the following snippet:
...
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
    - ALL
...
Why is dropping all capabilities redundant with non-root + disallow privilege escalation? A container process can be non-root and unable to escalate privileges, yet still hold effective capabilities, right?
It seems like this is not possible with Docker:
$ docker run --cap-add SYS_ADMIN --user 1000 ubuntu grep Cap /proc/self/status
CapInh: 00000000a82425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a82425fb
CapAmb: 0000000000000000
All effective capabilities have been dropped, even though I explicitly tried to add one. But other container runtimes could behave differently, so is this comment just Docker-specific?
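The hex masks in that output can be read bit by bit. Below is a minimal Python sketch for doing so. It is not part of Docker or Kubernetes: decode_caps is a hypothetical helper name, and the table assumes the capability bit numbers from linux/capability.h (bits 0 through 40).

```python
# Capability names indexed by bit number, as in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Values copied from the docker run output in the question.
    for field, value in [("CapInh", "00000000a82425fb"),
                         ("CapEff", "0000000000000000"),
                         ("CapBnd", "00000000a82425fb")]:
        print(field, decode_caps(value))
```

Decoded this way, 00000000a82425fb lists CAP_SYS_ADMIN among the inheritable and bounding sets, while the all-zero CapEff decodes to an empty list.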
Because you need privilege escalation to be able to use 'new' capabilities. Effectively, allowPrivilegeEscalation: false sets the no_new_privs bit, which stops the execve system call from honoring setuid bits and so prevents the process from gaining any new capabilities. Also, as shown in the docs: "Once the bit is set, it is inherited across fork, clone, and execve and cannot be unset". More info here.
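To see the "cannot be unset" behavior from the quote directly, here is a small Linux-only Python sketch that calls prctl through ctypes. The helper names (_prctl, demo_no_new_privs) are hypothetical; the constants 38 and 39 are PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS from linux/prctl.h. Note that running it permanently sets the bit for the calling process, so try it in a throwaway interpreter.

```python
import ctypes
import errno
import sys

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39


def _prctl(option, arg2):
    """Call prctl(option, arg2, 0, 0, 0); return (return value, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ul = ctypes.c_ulong
    ret = libc.prctl(ctypes.c_int(option), ul(arg2), ul(0), ul(0), ul(0))
    return ret, ctypes.get_errno()


def demo_no_new_privs():
    """Set the bit, read it back, then try to clear it again."""
    ret, err = _prctl(PR_SET_NO_NEW_PRIVS, 1)
    if ret != 0:
        raise OSError(err, "prctl(PR_SET_NO_NEW_PRIVS, 1) failed")
    current, _ = _prctl(PR_GET_NO_NEW_PRIVS, 0)
    # Any value other than 1 is rejected, so the bit cannot be cleared.
    unset_result = _prctl(PR_SET_NO_NEW_PRIVS, 0)
    return current, unset_result


if __name__ == "__main__" and sys.platform.startswith("linux"):
    current, (ret, err) = demo_no_new_privs()
    print("no_new_privs is now:", current)
    print("attempt to unset returned:", ret, errno.errorcode.get(err, err))
```

The same bit is also visible as the NoNewPrivs field in /proc/self/status on reasonably recent kernels.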
This, in combination with privileged: false, renders requiredDropCapabilities: [ALL] redundant.
The equivalent Docker options here are:
--user=whatever => privileged: false
--security-opt=no-new-privileges => allowPrivilegeEscalation: false
--cap-drop=all => requiredDropCapabilities: [ALL]
That looks like what Docker is doing: the moment you specify a non-privileged user, all of the effective capabilities are dropped (CapEff: 0000000000000000), even if you specify --cap-add SYS_ADMIN. This, combined with the --security-opt=no-new-privileges option, renders --cap-drop=all redundant.
Note that it seems like the default capability mask for Docker includes SYS_ADMIN, which would explain why 00000000a82425fb is the same without specifying any --cap-add option.
I suppose so: you could have a case where privileged: false and allowPrivilegeEscalation: false do not effectively disable capabilities, and those could then be dropped with requiredDropCapabilities: (although I don't see why another runtime would want to change the Docker behavior).
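That last case can be made concrete with the execve capability rules described in capabilities(7). The following Python sketch is a simplified model, not kernel code: caps_after_execve is a hypothetical function, the no_new_privs handling is reduced to "the permitted set may not grow", and the runtime that raises an ambient capability is purely an assumption for illustration, not a claim about any existing runtime.

```python
CAP_SYS_ADMIN = 1 << 21       # bit 21 in <linux/capability.h>
ALL_CAPS = (1 << 41) - 1      # bits 0..40


def caps_after_execve(inheritable, permitted, bounding, ambient, uid,
                      f_inheritable=0, f_permitted=0, f_effective=False,
                      no_new_privs=False):
    """Simplified model of the capability transformation during execve."""
    # The ambient set is cleared when the executed file carries capabilities.
    new_ambient = 0 if (f_inheritable or f_permitted) else ambient
    if uid == 0:
        # Root (without special securebits) is treated as if the file
        # granted every capability and had its effective bit set.
        f_inheritable, f_permitted, f_effective = ALL_CAPS, ALL_CAPS, True
    new_permitted = ((inheritable & f_inheritable)
                     | (f_permitted & bounding)
                     | new_ambient)
    if no_new_privs:
        # Simplification: with no_new_privs, execve cannot add capabilities.
        new_permitted &= permitted
    new_effective = new_permitted if f_effective else new_ambient
    return {"permitted": new_permitted,
            "effective": new_effective,
            "ambient": new_ambient}


MASK = 0xA82425FB  # CapBnd/CapInh value from the question

# Non-root user, plain binary, no ambient capabilities (as in the question).
docker_like = caps_after_execve(MASK, MASK, MASK, 0, uid=1000,
                                no_new_privs=True)

# Hypothetical runtime that raises CAP_SYS_ADMIN as an ambient capability.
ambient_case = caps_after_execve(MASK, MASK, MASK, CAP_SYS_ADMIN, uid=1000,
                                 no_new_privs=True)

if __name__ == "__main__":
    print("docker-like effective: %016x" % docker_like["effective"])
    print("ambient-case effective: %016x" % ambient_case["effective"])
```

In this model the first case ends with an empty effective set, matching the CapEff line in the question, while the second keeps CAP_SYS_ADMIN effective despite being non-root with no_new_privs set. That second situation is the one where dropping all capabilities would no longer be redundant.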