I was recently asked about a design issue with HSMs in Kubernetes infrastructure (and cloud applications in general): if an HSM contains all the keys, how can a pod authenticate to the HSM without itself holding a secret or private key? Another common question is: what is the point of having an HSM in the cloud if we end up trusting the provider for key management? This article aims to shed some light on these issues.
A Hardware Security Module (HSM) is a piece of specialized equipment dedicated to the management, storage, and use of cryptographic secrets. The functions of an HSM can easily be performed by standard computers using common cryptographic libraries like OpenSSL, so functionality is not the reason to use an HSM. Performance is not the reason either, as a mid-range server can perform more cryptographic operations, and faster, than most HSMs.
The real added value of an HSM is to fulfill a fundamental security objective: the non-extractability of secret cryptographic material. A cryptographic key is said to be non-extractable if it is technically impossible to get a copy of it. It means that the only option for an attacker to abuse that key is to have physical or network access to the HSM.
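To make the idea concrete, here is a toy model in Python of what non-extractability means from the caller's point of view. It is only a sketch: `ToyHsm` and its methods are hypothetical names, HMAC-SHA256 stands in for a real signature algorithm, and a Python object obviously offers none of the physical protection of real hardware. The point is the shape of the interface: keys are created inside, and only results of operations come out.

```python
import hashlib
import hmac
import secrets


class ToyHsm:
    """Toy stand-in for an HSM: keys are generated inside and never returned."""

    def __init__(self):
        # label -> key bytes. Name-mangled only; real HSMs rely on hardware.
        self.__keys = {}

    def generate_key(self, label):
        # The key is created inside the "module"; the caller only gets a handle.
        self.__keys[label] = secrets.token_bytes(32)
        return label

    def sign(self, label, message):
        # The caller receives the result of the operation, never the key.
        return hmac.new(self.__keys[label], message, hashlib.sha256).digest()

    def verify(self, label, message, tag):
        # Constant-time comparison of the expected and supplied tags.
        return hmac.compare_digest(self.sign(label, message), tag)


hsm = ToyHsm()
handle = hsm.generate_key("signing-key")
tag = hsm.sign(handle, b"release-1.0")
```

There is deliberately no method that returns key bytes, so the only way to use the key is to reach the object itself, which mirrors the "physical or network access" condition above.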
To achieve non-extractability, HSM manufacturers combine several security mechanisms:
HSMs also provide secure audit logs of all actions. In the event of an attack, this provides a better view of the extent of the compromise. Let’s compare a situation of suspicion of compromise of a signing key:
HSMs can be fitted in a server expansion slot, but most of them are available today in the form of a network appliance, which is very appropriate for data centers. They can be accessed from applications using appropriate libraries provided by the manufacturer, typically with a PKCS#11 interface or in the form of an OpenSSL engine.
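As an illustration of the PKCS#11 route, the following sketch uses the third-party python-pkcs11 package, which is an assumption on my side and not something prescribed by any particular HSM vendor. The module path, token label, PIN, and key label are all hypothetical placeholders, and the signing mechanism must match the type of key actually stored in the token.

```python
def sign_with_hsm(module_path, token_label, user_pin, key_label, data):
    """Sign `data` with a private key that stays inside the HSM.

    Assumes the third-party python-pkcs11 package and an RSA key;
    all argument values are supplied by the caller (placeholders here).
    """
    import pkcs11  # imported lazily: requires `pip install python-pkcs11`

    # Load the PKCS#11 shared library provided by the HSM manufacturer.
    lib = pkcs11.lib(module_path)
    token = lib.get_token(token_label=token_label)

    # Open an authenticated session; the PIN is what the client must hold.
    with token.open(user_pin=user_pin) as session:
        key = session.get_key(
            object_class=pkcs11.ObjectClass.PRIVATE_KEY,
            label=key_label,
        )
        # Only the signature leaves the HSM, never the private key.
        return key.sign(data, mechanism=pkcs11.Mechanism.SHA256_RSA_PKCS)
```

A call could look like `sign_with_hsm("/path/to/vendor-pkcs11.so", "demo-token", "1234", "signing-key", b"payload")`, where every value is made up for the example.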
As noted earlier, there is no difficulty in performing cryptographic functions on a traditional server. Modern CPUs even embed hardware acceleration for AES, the ubiquitous symmetric encryption algorithm. In many cases, we can keep ephemeral keys only in RAM and only for the duration of a session. The issue arises when there is a need to handle a long-term secret, such as a signature key used for authentication.
To benefit from non-extractability, we would like to store such a key in a network HSM, but any machine having access to the HSM could authenticate with that key. How can we control the access to that key so that only legitimate pods are authorized? The access control can be performed either by the network equipment or by the HSM. Let’s consider both approaches.
The network equipment can filter packets to limit access to specific IP addresses or ports. In practice, the only criterion available to identify the requester is the source IP address. To distinguish which key is requested, the destination port is a good match. But in a Kubernetes environment, as in most modern cloud architectures, the IP address of a pod is dynamic, so this is not a straightforward solution.
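The filtering logic described above can be sketched in a few lines of Python. This is a simplified model, not a real firewall configuration: the rule table, port numbers, key labels, and addresses are all hypothetical.

```python
import ipaddress

# Hypothetical rule table: destination port -> (key label, allowed sources).
RULES = {
    9001: ("signing-key", [ipaddress.ip_network("10.0.1.17/32")]),
    9002: ("tls-key", [ipaddress.ip_network("10.0.2.0/24")]),
}


def allowed_key(src_ip, dst_port):
    """Return the key label this packet may reach, or None if it is dropped."""
    rule = RULES.get(dst_port)
    if rule is None:
        return None  # no key is exposed on this port
    label, networks = rule
    addr = ipaddress.ip_address(src_ip)
    # The source address is the only thing that identifies the requester.
    return label if any(addr in net for net in networks) else None


# The pod is allowed while it keeps the address written in the rule...
before_reschedule = allowed_key("10.0.1.17", 9001)
# ...and is locked out once it is rescheduled with a new address.
after_reschedule = allowed_key("10.0.1.42", 9001)
```

Widening the rule to a whole subnet, as in the second entry, keeps a rescheduled pod working but also lets in every other workload that lands in that range.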
The HSM can perform the same kind of network filtering as the network equipment, but with the same shortcomings. It can also authenticate the client at the session or application level. However, this requires a long-term secret on the client side, and avoiding that situation was the whole point of using an HSM. Hence the dilemma.
The naked truth is that we need to trust the Kubernetes cluster to solve this chicken-and-egg problem. Here are a few possibilities:
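A minimal challenge-response sketch shows where the dilemma comes from. It is an illustration under assumptions, not how any specific HSM authenticates clients: the client registry, the client name, and the use of HMAC-SHA256 are all hypothetical choices.

```python
import hashlib
import hmac
import secrets

# HSM-side registry of client secrets (hypothetical client name).
CLIENT_SECRETS = {"billing-pod": secrets.token_bytes(32)}


def new_challenge():
    # A fresh random challenge per session prevents replaying old responses.
    return secrets.token_bytes(16)


def client_response(client_secret, challenge):
    # The client proves knowledge of its secret without sending it.
    return hmac.new(client_secret, challenge, hashlib.sha256).digest()


def hsm_accepts(client_id, challenge, response):
    secret = CLIENT_SECRETS.get(client_id)
    if secret is None:
        return False  # unknown client
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = new_challenge()
# The pod can only answer because it stores this long-term secret itself.
pod_secret = CLIENT_SECRETS["billing-pod"]
ok = hsm_accepts("billing-pod", challenge, client_response(pod_secret, challenge))
```

The protocol works, but only because `pod_secret` lives on the client, and whoever copies it from the pod can use the HSM-held key just as the pod does.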
HSMs are no silver bullet, and the administrators of the Kubernetes cluster and infrastructure still need to be trusted. HSMs facilitate monitoring of sensitive key usage, significantly reduce the pitfalls of manipulating secrets for developers and operators, and help to recover more quickly in case of compromise. So clearly HSM technology makes sense in any cloud infrastructure running sensitive applications, especially for regulated industries.
In my personal experience, in most cases a software-based secret management solution such as the well-known HashiCorp Vault is more than enough, and we can then focus on securing its master key (using an HSM, for instance, if required by regulation).