Almost all of our presentations and write-ups on the VMDK File Inclusion Vulnerability contained a slide stating something like
“we’re rather sure that DoS is possible as well ;-)”
including the following screenshot of the ESX purple screen of death:
So it seems like we still owe you that one — sorry for the delay! However the actual attack to trigger this purple screen was rather simple: Just include multiple VMDK raw files that cannot be aligned with 512 Byte blocks — e.g. several files of 512 * X + [0 < Y < 512] Bytes. Writing to a virtual hard drive composed of such single files for a short amount of time (typically one to three minutes, this is what we observed in our lab) triggered the purple screen on both ESXi4 and ESXi5 — at least for a patch level earlier than Releasebuild-515841/March 2012: it seems like this vulnerability was patched in Patch ESXi500-201203201-UG.
In our last series of posts regarding the VMDK file inclusion attack, we focused on read access and prerequisites for the attack, but avoided stating too much about potential write access. But as we promised to cover write access in the course of our future research, the following post will describe our latest research results.
First of all, the same prerequisites (which will be refined a little bit more later on) as for read access must be fulfilled and the same steps have to be performed in order to carry out the attack successfully. If that is the case, there are several POIs (Partitions Of Interest) on a ESXi hypervisor that are interesting to include:
/bootbank → contains several archives which build the hypervisors filesystem once they are unpacked
/altbootbank → backup of the prior version of bootbank, e.g. copied before a firmware update
/scratch → mainly log files and core dumps stored here
/vmfs/volumes/datastoreX → storage for virtual machine files
The root filesystem is stored on a ramdisk which is populated at boot time using the archives stored in the bootbank partition. As this means that there is no actual root partition (since it is generated dynamically at boot time and there is no such thing like a device descriptor for the ramdisk), this excludes the root file system from our attack tree, at least at first sight.
While trying to write to different partitions, we noticed that the writing sometimes fails. Evaluating the reason for the failure, we also noticed that this is only the case for certain partitions, such as /scratch. After monitoring the specific write process, we noticed the following errors:
attx kernel: [93.238762] sd 0:0:0:0: [sda] Unhandled sense code
attx kernel: [93.238767] sd 0:0:0:0: [sda] Result: hostbyte=invalid
attx kernel: [93.238771] sd 0:0:0:0: [sda] Sense Key : Data Protect [current]
attx kernel: [93.238776] sd 0:0:0:0: [sda] Add. Sense: No additional sense information
attx kernel: [93.238780] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 02 04 bd 20 00 00 08 00
attx kernel: [93.238790] end_request: critical target error, dev sda, sector 33864992
attx kernel: [93.239029] Buffer I/O error on device sda, logical block 4233124
attx kernel: [93.239181] lost page write due to I/O error on sda
cpu0:2157)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device
naa.600508b1001ca97740cc02561658c136. This will clear any
SCSI-2 reservations on the device.
cpu0:2157)<4>hpsa 0000:05:00.0: resetting device 6:0:0:1
cpu0:2157)<4>hpsa 0000:05:00.0: device is ready
Assuming that the hypervisor hard drive is not broken for exactly the cases we try to write from within a guest machine, we performed further tests (using various bash scripts and endless writing loops) and found out that this error occurs when the hypervisor and a guest machine are trying to write at the same time to the same device. Due to the fact that the hypervisor continuously writes log files to the /scratch file system and generates all kind of I/O to the /vmfs data stores (due to the running virtual machines stored on that partition), it was not possible to write to those devices in a reliable way.
Fortunately (at least from an attacker’s point of view 😉 ) the remaining partitions, /bootbank and /altbootbank, are only accessed at boot time and it is hence possible to write to those partitions in a reliable way. At that point, the initially mentioned fact about the lacking root partition gets important again: As the root partition would be the first and most promising target of write access, it would most probably also be locked by the hypervisor as there might also be different files that would be written on periodically. So when we were searching for a way to write to the dynamically generated root partition, we came up with the following steps:
Include the device holding the /bootbank partition.
Write to the /bootbank partition.
Wait for the hypervisor to reboot (or perform the potential DoS attack we identified, which will be described in a future post).
Enjoy the files from the /bootbank partition being populated to the dynamically created root partition.
The last step is of particular relevance. /bootbank holds several files that contain archives of system-critical files and directories of the ESXi hypervisor. For example the /bootbank/s.v00 contains an archive of the directories and files, including parts of /etc. The hypervisor restores the particular directories (such as /etc) at startup from the files stored in /bootbank. As we are able to write files to /bootbank, it is possible to replace contents of /bootbank/s.v00 and thus contents of /etc of the hypervisor ramdisk. In order to make sure that certain files in /etc are replaced, we can access the file /bootbank/boot.cfg which holds a list of archives which get extracted at boot time. As we have all necessary information to write to the root partition of the hypervisor, these are the steps to be performed:
Obtain /bootbank archive, in this example /bootbank/s.v00, using the well-known attack vector.
Convert/extract archive: The archives in /bootbank are packed with a special version of tar which is incompatible with the GNU tar. However this vmtar version can be ported to a GNU/Linux by copying the vmtar binary and libvmlibs.so from any ESXi installation.
Modify or add files.
Repack the archive.
Deploy the modified archive to /bootbank using the write access.
Following this generic process, we were able to install a backdoor on our ESXi5 hypervisor. In a first step, we opened a port in the ESXi firewall (which has a drop-all policy) as we wanted to deploy a bind shell (even though we could have used a connect-back shell instead, but we also want to demonstrate that is possible to modify system-critical settings). The firewall is configured by xml files stored in /etc/vmware/firewall. These xml files are built as follows:
The xml format is kind of self-explanatory. Every service has a unique identifier id, and can have inbound and outbound rules. To enable the rule on system boot the enabled field has to be set to true.
Based on this schema it is easy to deploy a new firewall rule. Simply place a new xml file to /etc/vmware/firewall in the archive which will be written to the bootbank later on.
This ruleset will be applied the next time the hypervisor reboots, after overwriting one of the archives in bootbank with our altered one.
The next step is to bind a shell to the opened port. Unfortunately the netcat installed per default on the hypervisor is not capable of the “-e” option, which executes a command after an established connection. The most basic netcat bind shell just listens on a port and forwards all input to the binary specified by the -e switch:
netcat -l -p 4444 -e /bin/sh.
Luckily the netcat version of 32bit BackTrack distribution is compatible with the ESXi platform and supports the -e switch. After copying this binary to the hypervisor, we just need to make sure that the backdoor is started during the boot process by adding the following line to /etc/rc.local:
/etc/netcat –l –p 4444 –e /bin/sh &
Next time the hypervisor starts, a remote shell will listen on port 4444 with root privileges. The following steps summarize the process of fully compromising the ESXi hypervisor:
Add our bind shell port to the firewall by adding the described file to /path/to/extracted/etc/vmware/firewall
Add the netcat binary to /path/to/extracted/etc/nc
Add the above line to /path/to/extracted/etc/rc.local
Wait for the next reboot of the hypervisor (or our post on the potential DoS 😉 )
At the end of the day, this means that once attacker is able to upload VMDK files to an ESXi environment (in a way that fulfills the stated requirements), it is possible to alter the configuration of the underlying hypervisor and even to install a backdoor which grants command line access.
As we are receiving a lot of questions about our VMDK has left the building post, we’re compiling this FAQ post — which will be updated as our research goes on.
How does the attack essentially work?
By bringing a specially crafted VMDK file into a VMware ESXi based virtualization environment. The specific attack path is described here.
What is a VMDK file?
A combination of two different types of VMDK files, the plain-text descriptor file containing meta data and the actual binary disk file, describes a VMware virtual hard disk. A detailed description can be found here.
Are the other similar file formats used in virtualization environments?
Yes, for example the following ones:
VDI (used by e.g. Xen, VirtualBox)
VHD (used by e.g. HyperV, VirtualBox)
QCOW (used by e.g. KVM)
Are those vulnerable too?
We don’t know yet and are working on it.
Which part of VMDKs files is responsible for the attack/exposure?
The so-called descriptor file, describing attributes and structure of the virtual disk (See here for a detailed description).
How is this to be modified for a successful attack?
The descriptor file contains paths to filenames which, combined, resemble the actual disk. This path must be modified so that a file on the hypervisor is included (See here for a detailed description).
How would you call this type of attack?
In reference to web hacking vulnerabilities, we would call it a local file inclusion attack.
What is, in your opinion, the root cause for this vulnerability?
More details can be found in a whitepaper to be published soon. Furthermore we will provide a demo with a simplified cloud provider like lab (including, amongst others, an FTP interface to upload files and a web interface to start machines) at upcoming conferences.
Do you need system/root access to the hypervisor in order to successfully carry out the attack?
No. All necessary information can be gathered during the attack.
What is the potential impact of a successful attack?
Read access to the physical hard drives of the hypervisor and thus access to all data/virtual machines on the hypervisor. We’re still researching on the write access.
Which platforms are vulnerable?
As of our current state of research, we can perform the complete attack path exclusively against the ESXi5 and ESXi4 hypervisors.
In case vCloud Director is used for customer access, are these platforms still vulnerable?
To our current knowledge, no. But our research on that is still in progress.
Are OVF uploads/other virtual disk formats vulnerable?
Our research onOVFis still in progress. At the moment,we cannot make a substantiated statement about that.
Is AWS/$MAJOR_CLOUD_PROVIDER vulnerable?
Since we did not perform any in the wild testing, we don’t know this yet. However, we have been contacted by cloud providers in order to discuss the described attack.
Given AWS does not run VMware anyways they will most probably not be vulnerable.
Is it necessary to start the virtual machine in a special way/using a special/uncommon API?
Which VMware products are affected?
At the moment, we can only confirm the vulnerability for the ESXi5 and ESXi4 hypervisors. Still, our research is going on 😉
As announced at last week’s #HITB2012AMS, I’ll describe the fuzzing steps which were performed during our initial research. The very first step was the definition of the interfaces we wanted to test. We decided to go with the plain text VMDK file, as this is the main virtual disk description file and in most deployment scenarios user controlled, and the data part of a special kind of VMDK files, the Host Sparse Extends.
The used fuzzing toolkit is dizzy which just got an update last week (which brings you guys closer to trunk state 😉 ).
The main VMDK file goes straight forward, fuzzing wise. Here is a short sample file:
The first field, descr_comment, and the second field, version_str, are plain static, as defined by the last parameter, so they wont get mutated. The first actual fuzzed string is the version field, which got a default value of the string 1 and will be mutated with all strings in your fuzz library.
As the attentive reader might have noticed, this is just the first attempt, as there is one but special inconsistency in the example file above: The quoting. Some values are Quoted, some are not. A good fuzzing script would try to play with exactly this inconsistency. Is it possible to set version to a string? Could one set the encoding to an integer value?
The second file we tried to fuzz was the Host Sparse Extend, a data file which is not plain data as the Flat Extends, but got a binary file header. This header is parsed by the ESX host and, as included in the data file, might be user defined. The definition from VMware is the following:
Interesting header fields are all C strings (think about NULL termination) and of course the gdOffset in combination with numSectors and grainSize, as manipulating this values could lead the ESX host to access data outside of the user deployed data file.
So far so good, after writing the fuzzing scripts one needs to create a lot of VMDK files. This was done using dizzy:
Last but not least we needed to automate the deployment of the generated VMDK files. This was done with a simple shell script on the ESX host, using vim-cmd, a command line tool to administrate virtual machines.
By now the main fuzzing is still running in our lab, so no big results on that front, yet. Feel free to use the provided fuzzing scripts in your own lab. Find the two fuzzing scripts here and here. We will share more results, when the fuzzing is finished.