Bugs Found During My Initial Explorations of the SEV Firmware
This blog post discusses two recently disclosed low-severity vulnerabilities in the SEV firmware. Both vulnerabilities were published in AMD-SB-3007.
CVE-2023-31346
AMD SEV-SNP guests can request services from the SEV firmware by exchanging encrypted messages. The content and layout of the request and response messages are defined in the SEV Secure Nested Paging Firmware ABI Specification in chapter 7.
One of these services is CPUID reporting. CPUID data contains important information about the processor, such as its supported features, its limits, and the positions of special address bits. Unfortunately, a guest can’t get this data itself, so it has to ask the host. A SEV-SNP guest doesn’t want to trust the host though, so it can use the CPUID reporting service to check whether the untrusted, host-supplied CPUID data is correct. The guest writes the host-supplied values for a set of CPUID leaves into the request and sends it to the SEV firmware. The SEV firmware reads those values and checks them. If any of the values are not within the bounds defined in the processor’s CPUID Policy, they are corrected. All entries as well as a success indicator are sent back to the guest in the response message.
What’s interesting about CPUID reporting requests and responses is that they’re not fixed in size. The requests contain a COUNT field specifying how many CPUID leaves the guest has placed in the request, so the sizes of the requests and responses vary accordingly. The spec caps COUNT at 64 (COUNT_MAX) and so also places an upper bound on the size of requests and responses. This is convenient for the firmware because it can avoid dynamically sized types and represent requests and responses as fixed-size structs called snp_msg_cpuid_req_t and snp_msg_cpuid_rsp_t. The firmware uses the size of the response struct when zeroing out the response before filling it with data.
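To make the sizes concrete, here is a small Python sketch of the fixed-size layout. The field widths (a 16-byte preamble holding COUNT plus reserved bytes, followed by 48-byte entries) are recalled from the ABI specification and should be treated as assumptions, and all names besides COUNT_MAX are made up for illustration.

```python
import struct

COUNT_MAX = 64  # cap on COUNT, as stated above

# Assumed preamble layout: COUNT (u32), then 4 + 8 reserved bytes.
PREAMBLE = struct.Struct("<I4x8x")
# Assumed layout of one CPUID leaf entry: EAX_IN, ECX_IN (u32 each),
# XCR0_IN, XSS_IN (u64 each), EAX, EBX, ECX, EDX (u32 each), 8 reserved bytes.
ENTRY = struct.Struct("<IIQQIIII8x")

# Size of a fixed-size struct with room for COUNT_MAX entries.
FIXED_STRUCT_SIZE = PREAMBLE.size + COUNT_MAX * ENTRY.size


def message_size(count: int) -> int:
    """Size of a request or response that carries `count` CPUID leaves."""
    if not 0 <= count <= COUNT_MAX:
        raise ValueError("COUNT must be between 0 and COUNT_MAX")
    return PREAMBLE.size + count * ENTRY.size
```

Under these assumptions a message with a single leaf is 64 bytes long, while the fixed-size struct always occupies 3088 bytes.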
While it would be possible to derive the size of the requests and responses from the COUNT field, this doesn’t work while they’re still encrypted. To solve this, the encrypted messages are preceded by an unencrypted header that contains the size of the message among other information.
Because the size of the response varies depending on how many CPUID leaves it contains, the firmware needs to calculate its size at runtime. Fortunately, this is very simple: for CPUID reporting, the layout of requests and responses is exactly the same, so the SEV firmware simply uses the size of the encrypted request as the response’s size. While this is mostly fine, there’s one edge case the firmware didn’t account for: It’s possible for a malicious guest to send encrypted CPUID reporting requests that are larger than snp_msg_cpuid_req_t. At first glance, this doesn’t seem like a problem: Regardless of the size of the message, the COUNT field is still capped at 64, so the firmware will just ignore the extra data. The problem is that the size is also used for the response, and so suddenly the firmware reads more data from the response buffer than expected. In more concrete terms, anything past the COUNT_MAX CPUID leaves is not zeroed when the response is initialized, so it contains uninitialized data from the firmware’s memory that is leaked as part of the response.
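The following toy model illustrates the pattern described above. It is not the firmware’s code: the function names, the 16-byte preamble, and the 48-byte entry size are assumptions, and the "check" of the CPUID values is reduced to a plain copy.

```python
COUNT_MAX = 64
PREAMBLE_SIZE = 16   # assumed: COUNT plus reserved bytes
ENTRY_SIZE = 48      # assumed size of one CPUID leaf entry
STRUCT_SIZE = PREAMBLE_SIZE + COUNT_MAX * ENTRY_SIZE


def handle_cpuid_request(fw_memory: bytearray, request: bytes, msg_size: int) -> bytes:
    """Toy model of the vulnerable flow; the response buffer starts at offset 0."""
    count = int.from_bytes(request[:4], "little")
    if count > COUNT_MAX:
        raise ValueError("COUNT exceeds COUNT_MAX")
    used = PREAMBLE_SIZE + count * ENTRY_SIZE
    if len(request) < used:
        raise ValueError("this toy model only covers full or oversized requests")

    # Only the fixed-size struct is zeroed ...
    fw_memory[:STRUCT_SIZE] = bytes(STRUCT_SIZE)
    # ... then the checked entries are written (modelled here as a plain copy).
    fw_memory[:used] = request[:used]
    # BUG: the response length comes from the guest-controlled message size,
    # so bytes past STRUCT_SIZE are returned without ever being initialized.
    return bytes(fw_memory[:msg_size])


def demo_leak() -> bytes:
    fw_memory = bytearray(b"\xAA" * 4096)   # stale data left in firmware memory
    # A request with COUNT = 1 that is 256 bytes larger than the struct.
    oversized = (1).to_bytes(4, "little") + bytes(STRUCT_SIZE - 4 + 256)
    response = handle_cpuid_request(fw_memory, oversized, len(oversized))
    return response[STRUCT_SIZE:]           # the part the guest should never see
```

In this model, `demo_leak()` returns the 256 stale bytes that sat behind the response struct, which is the shape of the leak described above.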
Fortunately, the leaked memory is unlikely to contain sensitive information because the SEV firmware always zeroes out sensitive information when it’s no longer needed. AFAICT this bug can only be used to leak non-sensitive data like launch pages of SEV-ES guests. For this reason, I’d classify this as a low-severity vulnerability.
There are two more variations of this bug:
- It’s also possible to send a smaller request than would be expected for a given COUNT. In that case, the firmware would interpret uninitialized data as CPUID leaves and try to check them. This would leak the same data as the main variant but is likely harder to exploit.
- The header for encrypted response messages contains some reserved fields. These reserved fields are never initialized and so also leak memory when they are sent back to the guest.
These bugs should be fixed by properly initializing all response fields, by checking the size of requests, or by restricting the size of responses (or by a combination of these).
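As a rough sketch of what such mitigations could look like, the following Python model combines all three: it rejects requests whose size doesn’t match COUNT, zeroes the entire buffer, and derives the response length from COUNT instead of the header. The names and sizes are hypothetical, and this is not how AMD actually fixed the firmware.

```python
COUNT_MAX = 64
PREAMBLE_SIZE = 16   # assumed: COUNT plus reserved bytes
ENTRY_SIZE = 48      # assumed size of one CPUID leaf entry


def expected_size(count: int) -> int:
    """Message size implied by COUNT."""
    if count > COUNT_MAX:
        raise ValueError("COUNT exceeds COUNT_MAX")
    return PREAMBLE_SIZE + count * ENTRY_SIZE


def build_response(rsp_buffer: bytearray, request: bytes, msg_size: int) -> bytes:
    if len(request) < PREAMBLE_SIZE:
        raise ValueError("request too small to contain COUNT")
    count = int.from_bytes(request[:4], "little")
    size = expected_size(count)

    # Mitigation 1: the size from the header must match the size implied by COUNT.
    if msg_size != size or len(request) != size:
        raise ValueError("message size does not match COUNT")
    if len(rsp_buffer) < size:
        raise ValueError("response buffer too small")

    # Mitigation 2: zero the whole buffer, not just the fixed-size struct.
    rsp_buffer[:] = bytes(len(rsp_buffer))
    # The real firmware would check and correct the values; here they are copied.
    rsp_buffer[:size] = request

    # Mitigation 3: the response length is derived from COUNT, not from the header.
    return bytes(rsp_buffer[:size])
```

Any one of the three checks would be enough to stop the oversized-request variant in this model; doing all of them is simply defense in depth.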
Proof of Concept code for this bug is available on GitHub.
CVE-2023-31347
CPUs have internal clocks that can be read out by programs. On x86 the timestamp is stored in the internal TSC register. It can be read out using the rdtsc or rdtscp instructions. If a hypervisor wants to change a guest’s perception of time, it can set some fields in the VM’s control block to apply a scaling factor and offset to the guest’s view of the TSC register.
In SEV-SNP’s threat model, the hypervisor is untrusted, and so ideally it shouldn’t be able to change a guest’s view of time. This is implemented using a SEV-SNP feature called SecureTSC. If SecureTSC is enabled, the hypervisor can set the TSC scaling factor once when the guest is launched, but crucially can never change it again after the guest has been launched. The offset will always be 0.
SEV-SNP guests can be live-migrated to other hosts. This is very cool, but this opens up a problem: What if the old and new hosts have different TSC frequencies? If the guest was just migrated over without any changes, this would change its perception of time because the TSC frequency would suddenly change. To solve this, the SEV-SNP firmware adjusts the scaling factor such that the TSC frequency stays the same. Lastly, the TSC scaling factor is only adjusted if the SecureTSC feature is enabled in the guest.
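The arithmetic behind such an adjustment can be sketched as follows. This assumes the scaling factor is a fixed-point number with 32 fractional bits and that the guest’s TSC is the host’s TSC multiplied by that factor plus the offset; the exact format and rounding used by the firmware may differ, and the function names are made up.

```python
FRACTION_BITS = 32  # assumed fixed-point format of the scaling factor


def guest_tsc(host_tsc: int, scale: int, offset: int = 0) -> int:
    """Guest-visible TSC value for a given host TSC value."""
    return ((host_tsc * scale) >> FRACTION_BITS) + offset


def adjust_scale_for_migration(old_scale: int, old_host_khz: int, new_host_khz: int) -> int:
    """Pick a scale for the new host so the guest-visible frequency stays the same."""
    if new_host_khz <= 0:
        raise ValueError("host TSC frequency must be positive")
    # guest_freq = host_freq * scale, so scale must shrink as host_freq grows.
    return old_scale * old_host_khz // new_host_khz


# Example: a guest moves from a 2 GHz host (scale 1.0) to a 4 GHz host.
old_scale = 1 << FRACTION_BITS
new_scale = adjust_scale_for_migration(old_scale, 2_000_000, 4_000_000)
ticks_per_second_on_new_host = guest_tsc(4_000_000_000, new_scale)
```

In this example the new scale is 0.5, so one second’s worth of ticks on the 4 GHz host still appears as 2,000,000,000 ticks to the guest.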
Enough theory, let’s talk about how things are actually implemented: Both the set of enabled SEV-SNP features and the TSC scaling factor are stored in a structure called the VMCB Save Area (VMSA). This region is encrypted and integrity-protected and contains state like registers, so it’s only natural for it to contain the set of features and the TSC scaling factor as well. So to check whether the SEV firmware needs to adjust the TSC scaling factor, it needs to decrypt the VMSA and check the enabled features. If SecureTSC is enabled, it can then update the scaling factor before re-encrypting the changed VMSA. This is likely what the SEV firmware was meant to do, but it’s not what it’s actually doing: A mistake was made and the set of enabled features is checked before the VMSA is decrypted, so the check can’t possibly be correct. AFAICT the buffer that’s checked for the SecureTSC feature is never initialized when the migration is happening, and so the check is done against stale data. The result of all of this is that the scaling factor may not be adjusted as expected when a SEV-SNP guest is migrated to another host.
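The ordering problem can be illustrated with a small toy model. Everything here is hypothetical: decryption is replaced by a plain copy into a scratch buffer, the bit position of SecureTSC in the feature mask is an assumption, and none of the names come from the firmware.

```python
from dataclasses import dataclass

SECURE_TSC = 1 << 9  # assumed bit position of SecureTSC in the feature mask


@dataclass
class Vmsa:
    sev_features: int = 0
    tsc_scale: int = 0


def decrypt_into(scratch: Vmsa, guest_vmsa: Vmsa) -> None:
    """Stand-in for decrypting the guest's VMSA into a firmware scratch buffer."""
    scratch.sev_features = guest_vmsa.sev_features
    scratch.tsc_scale = guest_vmsa.tsc_scale


def migrate_buggy(scratch: Vmsa, guest_vmsa: Vmsa, new_scale: int) -> int:
    # BUG: the feature check runs while the scratch buffer still holds stale data.
    secure_tsc = bool(scratch.sev_features & SECURE_TSC)
    decrypt_into(scratch, guest_vmsa)
    if secure_tsc:
        scratch.tsc_scale = new_scale
    return scratch.tsc_scale  # the value that would be re-encrypted


def migrate_fixed(scratch: Vmsa, guest_vmsa: Vmsa, new_scale: int) -> int:
    # Decrypt first, then check the features of the VMSA that was just decrypted.
    decrypt_into(scratch, guest_vmsa)
    if scratch.sev_features & SECURE_TSC:
        scratch.tsc_scale = new_scale
    return scratch.tsc_scale
```

With a stale scratch buffer that has no feature bits set, `migrate_buggy` leaves the scaling factor of a SecureTSC guest untouched, while `migrate_fixed` updates it. The buggy variant can also go wrong in the other direction if the stale data happens to have the bit set.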
I’m not aware of anyone actually using SecureTSC (the Linux patches for enabling SecureTSC haven’t been upstreamed yet), and I’m also not aware of an exploit that requires messing with a guest’s perception of time. Given the lack of potential for abuse, I’d classify this as a low-severity vulnerability.
On the Publishing of the SEV Firmware
IMHO publishing the source code of the SEV firmware was a good decision.
As a security researcher, I want to not just have to trust AMD blindly, but actually check for myself that the SEV firmware is working correctly. I found and reported these two bugs shortly after the SEV firmware was published at the end of August 2023, but that’s not the whole story: While writing code for mushroom, I first noticed that some reserved fields are not zero and worked around that even in mushroom’s first public commit. The oldest reference to this workaround I was able to find in my code was from February 19, 2023, while mushroom was still private. The problem is that at the time I didn’t understand what was going on. I saw the non-zero reserved fields, but just assumed that they’d contain undocumented values and didn’t realize that they were leaking uninitialized memory from the SEV firmware. Only after the SEV firmware was published did I notice that this wasn’t intentional and was able to report this.
As a user of the SEV firmware, I hope that the increased visibility of the SEV firmware will lead to more bugs being fixed, and hope to benefit from the increased security. Having access to the source code is also invaluable when debugging code that interacts with the firmware and explaining unexpected behavior.