Pidgin, Word Documents, my Clipboard and I

Lately, I’ve experienced some weird Pidgin crashes when copy&pasting into chat windows. The strange part: I didn’t even know what triggered the crash, because I didn’t know what was in my clipboard at that exact point. This is a quick write-up of how I investigated the issue and of some interesting clipboard properties I found along the way.

Everything started with a document that I was editing in Microsoft Word in a Windows VM. At some point, I wanted to copy some lines of the document and paste them into a Pidgin chat window on my Linux host system. When I pasted the data into the chat window, I noticed that it included a lot of whitespace. I thought something had gone wrong and tried to delete it by pressing CTRL+A (to select everything) and then BACKSPACE. But this caused Pidgin (2.13.0-5 on Arch Linux) to close with a segfault and create a core dump.

I tried to get some information out of the core dump and the backtrace but it just seemed to be something related to GTK. Because my GTK libraries did not include symbols, these messages were mostly useless:

Stack trace of thread 1875:
#0  0x00007f5b359f8d7f raise (libc.so.6)
#1  0x00007f5b359e3672 abort (libc.so.6)
#2  0x000055a416179a9d n/a (pidgin)
#3  0x00007f5b359f8e00 __restore_rt (libc.so.6)
#4  0x00007f5b366abbbe gtk_widget_set_size_request (libgtk-x11-2.0.so.0)
#5  0x000055a41616b616 n/a (pidgin)
#6  0x00007f5b35fc12d2 g_closure_invoke (libgobject-2.0.so.0)
#7  0x00007f5b35fae348 n/a (libgobject-2.0.so.0)
#8  0x00007f5b35fb201e g_signal_emit_valist (libgobject-2.0.so.0)
#9  0x00007f5b35fb2a80 g_signal_emit (libgobject-2.0.so.0)
#10 0x00007f5b366aa306 gtk_widget_size_allocate (libgtk-x11-2.0.so.0)
#11 0x00007f5b365edd80 n/a (libgtk-x11-2.0.so.0)
#12 0x00007f5b35fc12d2 g_closure_invoke (libgobject-2.0.so.0)
#13 0x00007f5b35fae348 n/a (libgobject-2.0.so.0)

The second thing I wondered about: what was the payload that triggered this behavior? I tried different inputs and variations of documents and it seemed to be related to the comments in the Word document because it did not trigger with data from documents that had no comments.

My test document was just a Word document that included the string “test” and a comment “testcomment” with a reference to this string.

Test document with comment

Pressing CTRL+A and copy&pasting the result into a chat window is enough to crash Pidgin (on the Linux host system). But if you paste it into e.g. a text editor, it will only include the string “test”. If you paste it into another Word document, it will include the full document with the comment. The question is: where is the rest of the document on my host system?

I started to search for information about how clipboards work and found this excellent blog post: X11: How does “the” clipboard work?. To fully understand the following, you should read that blog post; it includes a lot of information and is very well written.

A tl;dr of what I found out: There are actually multiple clipboards and multiple targets. These targets can potentially include totally different contents and may be interpreted differently depending on the application that either writes or reads those targets.
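To make the “multiple clipboards” part concrete, here is a minimal sketch that asks each of the three standard X11 selections (primary, secondary, clipboard) which targets it currently offers. The function name list_selection_targets is made up for this example, and it assumes xclip is installed and an X session is running.

```bash
# Hypothetical helper: show which targets each X11 selection offers.
list_selection_targets() {
    local selection targets
    for selection in primary secondary clipboard; do
        printf '== %s ==\n' "$selection"
        # Ask the current owner of this selection for its list of targets.
        targets=$(xclip -o -selection "$selection" -target TARGETS 2>/dev/null)
        if [ -n "$targets" ]; then
            printf '%s\n' "$targets"
        else
            # Nothing came back: the selection is empty or has no owner.
            printf '(empty or no owner)\n'
        fi
    done
}
```

Calling list_selection_targets after selecting text with the mouse and after pressing CTRL+C in another application should show that the selections are independent of each other.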

In order to find out what my clipboard looked like when copy&pasting from Word I first tried to get all of the available targets with xclip:

$ xclip -o -target TARGETS -selection clipboard
UTF8_STRING
text/plain;charset=UTF-8
text/plain;charset=utf-8
STRING
TEXT
text/plain
text/html
text/html;charset=utf-8
TARGETS
MULTIPLE
TIMESTAMP

This already looked quite interesting, because for a clipboard that just returns “test” this looked like a lot of data. When trying to read some of those targets it was quite easy to find our output:

$ xclip -o -target "TEXT" -selection clipboard
test
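With that many targets on offer, a quick way to spot the unusual ones is to compare how many bytes each target returns. The following is only a sketch: target_sizes is a hypothetical helper name, and skipping TARGETS, MULTIPLE and TIMESTAMP is my own choice because those are meta targets rather than copied data.

```bash
# Hypothetical helper: print the byte size of every data target.
target_sizes() {
    local selection=${1:-clipboard} target size
    xclip -o -target TARGETS -selection "$selection" |
    while IFS= read -r target; do
        # Skip meta targets that do not carry the copied data itself.
        case $target in
            TARGETS|MULTIPLE|TIMESTAMP) continue ;;
        esac
        # Count the bytes this target returns; stdin is redirected so the
        # inner call cannot consume the list of targets being read.
        size=$(( $(xclip -o -target "$target" -selection "$selection" \
                   2>/dev/null </dev/null | wc -c) ))
        printf '%8d  %s\n' "$size" "$target"
    done
}
```

Running target_sizes (or target_sizes primary) prints one line per target; a target that is much larger than the plain-text ones is a good candidate for a closer look.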

To get the full content I wrote a small script to dump all targets:

#!/bin/bash

# Print the name and content of every target the clipboard owner offers.
for target in $(xclip -o -target TARGETS -selection clipboard); do
    echo "$target"
    xclip -o -target "$target" -selection clipboard
    echo
    echo "---"
done

With the output of this script it turned out that the “text/html” target was pretty interesting:

$ xclip -o -target "text/html" -selection clipboard | head -n10
Version:1.0
StartHTML:0000000105
EndHTML:0000044404
StartFragment:0000042819
EndFragment:0000044364

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">
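The header lines at the top of this output belong to Microsoft’s HTML clipboard format, in which StartFragment and EndFragment are byte offsets (counted from the start of the data) that delimit the part that was actually selected. As a rough sketch, assuming the target has been saved to a file, the fragment can be cut out like this; extract_fragment is a name invented for this example, and head -c is assumed to be available (it is on GNU and BSD systems).

```bash
# Hypothetical helper: print only the selected fragment of a saved
# text/html target that starts with the Version/StartHTML/... header.
extract_fragment() {
    local file=$1 start end
    # Read the zero-padded offsets; tr drops the label and any trailing \r.
    start=$(grep -a -m1 '^StartFragment:' "$file" | tr -dc '0-9')
    end=$(grep -a -m1 '^EndFragment:' "$file" | tr -dc '0-9')
    [ -n "$start" ] && [ -n "$end" ] || return 1
    # Force base 10, otherwise bash treats the leading zeros as octal.
    start=$((10#$start))
    end=$((10#$end))
    # tail -c +N is 1-based, the offsets in the header are 0-based.
    tail -c +"$((start + 1))" "$file" | head -c "$((end - start))"
}
```

For example, extract_fragment test_html_target would print only the selected part of such a dump, if the dump is stored in a file of that name.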

It returned 896 lines of mostly HTML code. For my simple example document, it looked like it actually contained the full document! So I just dumped the content of the “text/html” target to a file and reloaded it into the clipboard:

$ xclip -o -target "text/html" -selection clipboard > test_html_target
$ cat test_html_target | xclip -i -target "text/html" -selection clipboard

Pasting this into a chat window still triggered the crash, so it seemed like the “text/html” target was the problem. I kept deleting parts of the output and retesting until I was able to pinpoint the exact cause. It was the following line:

<hr class=msocomoff align=left size=1 width="33%">
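The narrowing-down step can also be done without editing the dump by hand. The following sketch loads only a range of lines from the saved file back into the “text/html” target, so that each half can be pasted and tested separately. load_slice is a hypothetical helper name, and the line numbers in the usage note are only illustrative.

```bash
# Hypothetical helper: put lines FROM..TO of a saved dump into the
# text/html target of the clipboard.
load_slice() {
    local file=$1 from=$2 to=$3
    # sed -n 'A,Bp' prints only lines A through B of the file.
    sed -n "${from},${to}p" "$file" |
        xclip -i -target text/html -selection clipboard
}
```

A manual bisection would then look like load_slice test_html_target 1 448, paste and delete in Pidgin, and depending on whether it crashes continue with a smaller range from the first or the second half.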

I wasn’t exactly sure why the document included a horizontal line, but pasting an HTML horizontal line into a Pidgin chat window and then deleting it will actually crash Pidgin. The following command is the easiest way to trigger the crash:

$ echo "<hr>" | xclip -i -target "text/html" -selection clipboard

After executing this just open a Pidgin chat window, paste it (CTRL+V) and just keep pressing BACKSPACE. At some point, Pidgin will just crash.

After searching for Pidgin issues that were related to copy&pasting and horizontal lines I found three related issues (the oldest one is four years old and the latest Pidgin version is still affected):

It is indeed related to changing the content of the window, as the crash is triggered when deleting the input. I wondered whether it could be triggered remotely, but I’m not sure there is a way to delete content from the output part of a chat window. Clearing the content was the only thing that seemed even remotely related, as executing JavaScript (to do something like <span onmouseover="this.remove()">A</span>) does not seem to be supported.

It should be noted that similar behavior can be triggered in other applications that support different clipboard targets. It is also important to keep in mind, when copy&pasting pre-formatted data like HTML to test other applications, that the target might matter. And when it comes to copy&pasting text with comments from Microsoft Word to Pidgin: after pasting it you should rather just send it, even if you would want to delete it, or Pidgin will die. 🙂

Cheers!