Root Cause Analysis of a Heap-Based Buffer Overflow in GNU Readline

Tillmann Oßwald

In the last blog post, we discussed how fuzzers determine the uniqueness of a crash. In this blog post, we discuss how we can manually triage a crash and determine the root cause. As an example, we use a heap-based buffer overflow I found in GNU readline 8.1 rc2, which has been fixed in the newest release. We use GDB and rr for time-travel debugging to determine the root cause of the bug.

We can download the source code of GNU readline as a tar.gz file from here.

We then want to build it.

$ ./configure --enable-shared=no
make all

I adapted one of the examples to be even more straightforward.

$ cat examples/rlbasic.c
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#if defined (READLINE_LIBRARY)
#  include "readline.h"
#  include "history.h"
#else
#  include <readline/readline.h>
#  include <readline/history.h>
#endif

int
main (int c, char **v)
{
    char * buf = readline("");
    if (buf != 0) {
        puts(buf);
        free(buf);
    }
}

We also need to build the examples now.

$ cd examples
$ make all
$ echo test | ./rlbasic
test
test

This setup (with the respective instrumentation) was used for fuzzing and now is used to triage one of the crashes we found.

After some time of fuzzing, honggfuzz reported the first crashes. As mentioned in the previous blog post, some information is already embedded in the filenames of the crashes. Here are some of the files that honggfuzz created.

'SIGABRT.PC.7ffff7c03615.STACK.19d36d1d13.CODE.-6.ADDR.0.INSTR.mov____0x108(%rsp),%rax.fuzz'
'SIGABRT.PC.7ffff7c03615.STACK.eb563da6d.CODE.-6.ADDR.0.INSTR.mov____0x108(%rsp),%rax.fuzz'
'SIGABRT.PC.7ffff7c03615.STACK.ec136d3ea.CODE.-6.ADDR.0.INSTR.mov____0x108(%rsp),%rax.fuzz'
'SIGABRT.PC.7ffff7c03615.STACK.f43c4c1ee.CODE.-6.ADDR.0.INSTR.mov____0x108(%rsp),%rax.fuzz'

We can see that the signal that was sent is “abort” in the first part of the filename. Furthermore, we see that the program counter of all crashes is the same (0x7ffff7c03615). Both issues indicate a heap-related issue (SIGABRT) and potentially the same type of issue, like heap-based buffer-overflow or double-free (same PC).

To gather some initial details about the crash, we use Valgrind to determine what happened.

$ valgrind --tool=memcheck ./rlbasic > /dev/null < SIGABRT.PC.7ffff7c03615.STACK.f43c4c1ee.CODE.-6.ADDR.0.INSTR.mov____0x108\(%rsp\),%rax.fuzz
==510271== Memcheck, a memory error detector
==510271== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==510271== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==510271== Command: ./rlbasic
==510271==
==510271== Invalid write of size 1
==510271==    at 0x483DF68: strncpy (vg_replace_strmem.c:550)
==510271==    by 0x13221D: rl_insert_text (text.c:99)
==510271==    by 0x1325F9: _rl_insert_char.part.0 (text.c:902)
==510271==    by 0x133B8D: _rl_insert_char (text.c:720)
==510271==    by 0x133B8D: rl_insert (text.c:965)
==510271==    by 0x112FFA: _rl_dispatch_subseq (readline.c:887)
==510271==    by 0x1135AF: _rl_dispatch (readline.c:833)
==510271==    by 0x1135AF: readline_internal_char (readline.c:645)
==510271==    by 0x113F2C: readline_internal_charloop (readline.c:694)
==510271==    by 0x113F2C: readline_internal (readline.c:706)
==510271==    by 0x113F2C: readline (readline.c:385)
==510271==    by 0x11262F: main (rlbasic.c:17)
==510271==  Address 0x4bb2c20 is 0 bytes after a block of size 13,568 alloc'd
[...]
==510271== Invalid read of size 8
==510271==    at 0x12EBC8: _rl_free_undo_list (undo.c:108)
==510271==    by 0x12EBC8: rl_free_undo_list (undo.c:124)
==510271==    by 0x112B4D: readline_internal_teardown (readline.c:498)
==510271==    by 0x113F43: readline_internal (readline.c:707)
==510271==    by 0x113F43: readline (readline.c:385)
==510271==    by 0x11262F: main (rlbasic.c:17)
==510271==  Address 0x3737373737373737 is not stack'd, malloc'd or (recently) free'd
==510271==
==510271==
==510271== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==510271==  General Protection Fault
==510271==    at 0x12EBC8: _rl_free_undo_list (undo.c:108)
==510271==    by 0x12EBC8: rl_free_undo_list (undo.c:124)
==510271==    by 0x112B4D: readline_internal_teardown (readline.c:498)
==510271==    by 0x113F43: readline_internal (readline.c:707)
==510271==    by 0x113F43: readline (readline.c:385)
==510271==    by 0x11262F: main (rlbasic.c:17)
==510271==
==510271== HEAP SUMMARY:
==510271==     in use at exit: 382,054 bytes in 898 blocks
==510271==   total heap usage: 4,421 allocs, 3,523 frees, 1,342,095 bytes allocated
[…]

This excerpt shows us two interesting things. The first part shows there is a heap overflow in rl_insert_text.c:99, and the second part shows that we have control over data (0x3737373737373737 or “77777777”) that is potentially being freed. It should be noted that the results from Valgrind and other tools like “Dr Memory” and the glibc error message might vary, depending on how they implement heap-related checks.

rr, which stands for record and replay, is an awesome enhancement to GDB. It does two things; it allows you to record the execution of a binary and replay that execution at a later point. The replay will always be deterministic. Furthermore, the replay can be reversed. Instead of only stepping forward in the execution flow, you can now step backward; this is sometimes called time-traveling debugging. In the first step, we record the execution.

$ rr record -n ./rlbasic < SIGABRT.PC.7ffff7c03615.STACK.f43c4c1ee.CODE.-6.ADDR.0.INSTR.mov____0x108\(%rsp\),%rax.fuzz
rr: Saving execution to trace directory `/[…]/rlbasic-6

Now we can replay the execution. I am using GEF as a gdbinit script. It enhances GDB with some additional commands and convenience features. However, rr can be used without any additional gdbinit script.

rr replay /home/till/.local/share/rr/rlbasic-6
gef➤ continue
[…]
───────────────────────────────────────────────────────────── threads ────
[#0] Id 1, stopped 0x7f0e82eaa615 in raise (), reason: SIGABRT
─────────────────────────────────────────────────────────────── trace ────
[#0] 0x7f0e82eaa615 → raise()
[#1] 0x7f0e82e93862 → abort()
[#2] 0x7f0e82eec5e8 → __libc_message()
[#3] 0x7f0e82ef427a → malloc_printerr()
[#4] 0x7f0e82ef47ec → mremap_chunk()
[#5] 0x7f0e82ef9670 → realloc()
[#6] 0x5584b8275f2e → xrealloc(pointer=<optimized out>, bytes=0x8000)
[#7] 0x5584b826017c → realloc_line(minsize=<optimized out>)
[#8] 0x5584b82651ab → invis_addc(face=0x30, c=0x5e, outp=<synthetic pointer>)
[#9] 0x5584b82651ab → rl_redisplay()

We observe the SIGABRT in GDB. The backtrace shows realloc was called. We now place a breakpoint at realloc and then resume execution in reverse order, with the reverse-continue command.

gef➤  break realloc
Breakpoint 1 at 0x7f0e82ef9580
gef➤ reverse-continue

Soon we hit the breakpoint. Let us recall realloc and its arguments:

void *realloc(void *ptr, size_t size);

Next, we check how parameters are passed by taking a look at the calling conventions. Ptr will be in RDI, and the size parameter is passed in RSI.

gef➤  info registers
[…]
rdi        0x5584b91541f0      0x5584b91541f0

Now we know which chunk is passed. We can examine it with GEF. This is one of the many reasons I like to use a gdbinit script like GEF.

gef➤  heap chunk 0x5584b91541f0
Chunk(addr=0x5584b91541f0, size=0x3737373737373730, flags=PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
Chunk size: 3978709506094217008 (0x3737373737373730)
Usable size: 3978709506094216992 (0x3737373737373720)
Previous chunk size: 3978709506094217015 (0x3737373737373737)
PREV_INUSE flag: On
IS_MMAPPED flag: On
NON_MAIN_ARENA flag: On

However, we can also investigate the chunk by hand. A nice diagram showing the layout of a heap chunk can be found here.

The prev_size is located before the actual data, which is also where our pointer is pointing to.

gef➤  x/2x 0x00005584b91541f0 - 8
0x5584b91541e8:            0x37373737        0x37373737

This looks a lot like the information about the prev_size we received from GEF. Since the value written is 0x3737373737373737 or “77777777” this indicates a heap-based buffer overflow of the previous chunk. Now we want to find out where this data was written. We place a watchpoint at prev_size and reverse-continue until we find out where the data was written.

gef➤  watch *0x5584b91541e8
Hardware watchpoint 2: *0x5584b91541e8
gef➤  info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y
0x00007f0e82ef9580 <realloc>
               breakpoint already hit 2 times
2       hw watchpoint  keep y                      *0x5584b91541e8
gef➤  disable 1
gef➤ reverse-continue
[…]
[#0] Id 1, stopped 0x7f0e82fd1933 in __strncpy_avx2 (), reason: BREAKPOINT
─────────────────────────────────────────────────────────────── trace ────
[#0] 0x7f0e82fd1933 → __strncpy_avx2()
[#1] 0x5584b826d21e → rl_insert_text(string=0x7ffc188192a0 "7")
[#2] 0x5584b826d5fa → _rl_insert_char(count=0x1, c=0x37)
[#3] 0x5584b826eb8e → _rl_insert_char(c=<optimized out>, count=0x1)
[#4] 0x5584b826eb8e → rl_insert(count=<optimized out>, c=<optimized out>)
[#5] 0x5584b824dffb → _rl_dispatch_subseq(key=<optimized out>, map=0x5584b8287420 <emacs_standard_keymap>, got_subseq=<optimized out>)
[#6] 0x5584b824e5b0 → _rl_dispatch(map=<optimized out>, key=<optimized out>)
[#7] 0x5584b824e5b0 → readline_internal_char()
[#8] 0x5584b824ef2d → readline_internal_charloop()
[#9] 0x5584b824ef2d → readline_internal()
──────────────────────────────────────────────────────────────────────────
gef➤  x/x 0x5584b91541f0-8
0x5584b91541e8:            0x00373737

And we repeat the process once more just to be sure.

gef➤ reverse-continue
[…]
───────────────────────────────────────────────────────────── threads ────
[#0] Id 1, stopped 0x7f0e82fd1933 in __strncpy_avx2 (), reason: BREAKPOINT
─────────────────────────────────────────────────────────────── trace ────
[#0] 0x7f0e82fd1933 → __strncpy_avx2()
[#1] 0x5584b826d21e → rl_insert_text(string=0x7ffc188192a0 "7")
[#2] 0x5584b826d5fa → _rl_insert_char(count=0x1, c=0x37)
[#3] 0x5584b826eb8e → _rl_insert_char(c=<optimized out>, count=0x1)
[#4] 0x5584b826eb8e → rl_insert(count=<optimized out>, c=<optimized out>)
[#5] 0x5584b824dffb → _rl_dispatch_subseq(key=<optimized out>, map=0x5584b8287420 <emacs_standard_keymap>, got_subseq=<optimized out>)
[#6] 0x5584b824e5b0 → _rl_dispatch(map=<optimized out>, key=<optimized out>)
[#7] 0x5584b824e5b0 → readline_internal_char()
[#8] 0x5584b824ef2d → readline_internal_charloop()
[#9] 0x5584b824ef2d → readline_internal()
──────────────────────────────────────────────────────────────────────────
gef➤  x/x 0x5584b91541f0-8
0x5584b91541e8:            0x00003737

This indeed looks like the data gets written step by step, or in our case, more like unwritten. The same function is called repeatedly and more and more of the data gets bytewise written.

Now we have sufficient information to take a look into the source code to figure out what happened. The function rl_insert_text can be found in text.c:85. The following snippet shows the part that is interesting for us:

  if (rl_end + l >= rl_line_buffer_len)
    rl_extend_line_buffer (rl_end + l);

  for (i = rl_end; i >= rl_point; i--)
    rl_line_buffer[i + l] = rl_line_buffer[i];
  strncpy (rl_line_buffer + rl_point, string, l);
[…]
  rl_point += l;
  rl_end += l;

The overflow happens in line 99, where a strncpy overflows into the next chunk. How does this happen?

Let’s take a look at the values of the variables and then discuss the previous source code:

gef➤  print rl_end
$3 = 0xb4
gef➤  print rl_point
$4 = 0x350a
gef➤  print rl_line_buffer_len
$5 = 0x3500
gef➤  print rl_line_buffer
$6 = 0x5584b9150ce0 "\377\200\"\377p|pp\"pppppppp"
gef➤  heap chunk 0x5584b9150ce0
Chunk(addr=0x5584b9150ce0, size=0x3510, flags=PREV_INUSE)
Chunk size: 13584 (0x3510)
Usable size: 13576 (0x3508)
Previous chunk size: 0 (0x0)
PREV_INUSE flag: On
IS_MMAPPED flag: Off
NON_MAIN_ARENA flag: Off

Let’s analyze this snippet from the top. If rl_end (a global) + l are greater than rl_line_buffer_len the buffer is extended. rl_extend_line_buffer is a wrapper for realloc. It also takes care of adjusting rl_line_buffer_len. In text.c:99 (line 6 in our snippet), the overflow occurs. rl_line_buffer is a pointer to a heap-allocated buffer, and we add the value of rl_point. However, it is never checked if this arithmetic operation still points into our allocated buffer. There is only one only check regarding the buffer and its size, during which the size is compare against rl_end and l. From the source code, we can learn that rl_end should point to the buffer’s end and rl_point to a point in the buffer.

See Readline.h:545

/* The location of point, and end. */
extern int rl_point;
extern int rl_end;

Thus, implementing a check comparing rl_point + l against rl_line_buffer_len might prevent this memory corruption but does not make any sense in this context. Thus, we need to find where the state was corrupted such that rl_point > rl_end.

We can use our GDB session with reverse debugging again. We disable the existing watchpoint. For our approach, we’ll use Python. The GDB Python API allows you to sub-class the breakpoint class and write a custom stop function. Depending on the custom stop function’s return value, the program will break, and you can analyze it, or the breakpoint will be stepped silently. I used a GDB script to compare rl_point and rl_end, and if rl_point was greater, it would break.

I then used this custom breakpoint as a watchpoint for both rl_point and rl_end. After pasting the script as described in the Gist we have four breakpoints:

gef➤  info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep n
0x00007f0e82ef9580 <realloc>
               breakpoint already hit 2 times
2       hw watchpoint  keep n                      *0x5584b91541e8
               breakpoint already hit 2 times
5       hw watchpoint  keep y                      rl_end
6       hw watchpoint  keep y                      rl_point

Now we reverse the execution. This takes some time as the watchpoints are triggered frequently, and the condition does not change for quite some time.

gef➤ reverse-continue
Continuing.

We found the condition were everything was okay:

──────────────────────────────────────────────── source:isearch.c+616 ────
    611   
    612           break;
    613   
    614           case -4:          /* C-G, abort */
    615           rl_replace_line (cxt->lines[cxt->save_line], 0);
 →  616           rl_point = cxt->save_point;
    617           rl_mark = cxt->save_mark;
    618           rl_deactivate_mark ();
    619           rl_restore_prompt();
    620           rl_clear_message ();
    621   
───────────────────────────────────────────────────────────── threads ────
[#0] Id 1, stopped 0x5584b825f171 in _rl_isearch_dispatch (), reason: BREAKPOINT
─────────────────────────────────────────────────────────────── trace ────
[#0] 0x5584b825f171 → _rl_isearch_dispatch(cxt=0x5584b915b750, c=<optimized out>)
[#1] 0x5584b825ffef → _rl_isearch_dispatch(c=<optimized out>, cxt=0x5584b915b750)
[#2] 0x5584b825ffef → rl_search_history(direction=<optimized out>, invoking_key=<optimized out>)
[#3] 0x5584b824dffb → _rl_dispatch_subseq(key=<optimized out>, map=0x5584b8287420 <emacs_standard_keymap>, got_subseq=<optimized out>)
[#4] 0x5584b824e5b0 → _rl_dispatch(map=<optimized out>, key=<optimized out>)
[#5] 0x5584b824e5b0 → readline_internal_char()
[#6] 0x5584b824ef2d → readline_internal_charloop()
[#7] 0x5584b824ef2d → readline_internal()
[#8] 0x5584b824ef2d → readline(prompt=0x5584b827842b "")
[#9] 0x5584b824d630 → main(c=<optimized out>, v=<optimized out>)
──────────────────────────────────────────────────────────────────────────
gef➤  print rl_point
$7 = 0x11
gef➤  print rl_end
$8 = 0x11
gef➤  print cxt->save_point
$9 = 0x3468

At this point, an old context is restored. However, not the full context is restored. All that is missing from the source code listing above is the return statement. This program restores rl_point but not rl_end. This resulted in the state where we observed the buffer overflow.

Let’s take a look at the fix:

diff --git a/isearch.c b/isearch.c
index ef65e5f..080ba3c 100644
--- a/isearch.c
+++ b/isearch.c
@@ -619,6 +619,7 @@ opcode_dispatch:
       rl_restore_prompt();
       rl_clear_message ();
+      _rl_fix_point (1);    /* in case save_line and save_point are out of sync */
       return -1;
     case -5:          /* C-W */

If rl_line and rl_point are out of sync, they are being synchronized via _rl_fix_point. With this fix, all of the Aborts reported by honggfuzz have been fixed. So even though honggfuzz does some initial triaging and bucketing, the resulting crashes had one root cause, which was uncoverable with time travel debugging. Using rr to analyze the root cause this bug saved a lot of time and nerves as we could easily reverse the execution to a previous point. It also made writing this blog post and documenting the steps easy. If you realized you screwed up something earlier and reported false information, you can simply go to this point where you screwed up and fix the data ;).

I want to thank Chet Ramey, the maintainer of GNU readline, for fixing the bug and point out an error in my initial analysis of the flaw. Initially, I checked for the faulty condition where rl_point was greater than rl_end when reverse debugging. I then encountered code where I missed that the values were synchronized via _rl_fix_point. While writing this blog post, I noticed that it is far easier to check until the issue was no longer present.

Now that we know the vulnerability details, it would be nice if we could detect further occurrences of this issue. This can be achieved with static tools, like CodeQL or Joern. I chose to use a dynamic approach as I already had a corpus from the previous and this fuzzing campaign. I used Frida’s Stalker to gain control at each return statement and checked If rl_point was greater than rl_end. In the end, I did not find more occurrences of this issue. If you are interested in the Frida script, you can find it here.

Timeline:

2020-11-10 bug reported to the maintainer of GNU readline

2020-11-10 maintainer acknowledged the bug

2020-11-18 New version with a bugfix is announced on the GNU readline mailing list

Tillmann Oßwald

Root Cause Analysis of a Heap-Based Buffer Overflow in GNU Readline

Comments

Leave a Reply to Tillmann Oßwald Cancel reply