Extracting Data from Very Large Pcap Files – Part 3: Pcap Filtering in the Cloud

This is the third (and last) part of the series (parts 1 & 2 here). We’ll provide the results from some additional tests supported by public cloud services, namely AWS (Amazon Web Services).

Lab Setup

The Amazon Elastic Compute Cloud (short: EC2) provides a flexible environment for the on-demand provisioning of virtual machines at different performance levels. For our lab setup, a so-called extra large instance was used. According to Amazon, the technical specs are the following:

15 GB memory

8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)

1,690 GB instance storage

64-bit platform

I/O Performance: High

API name: m1.xlarge

Since the I/O performance of single disks had turned out to be the bottleneck in the “local” setup, eight Elastic Block Store (short: EBS) volumes were created and attached to the instance. Each EBS volume is hosted within a specific availability zone and can be attached to instances running in the same zone. EBS volumes can be created and attached by issuing two commands of the Amazon EC2 command line tools, so the amount of storage can be scaled up very easily. The only requirement (for our tests) is a sufficient number of EBS volumes, each of which then contains a part of the pcap file to be analyzed.



During the benchmarks, the performance was significantly lower than with the setup described in the previous post, even though eight different EBS volumes were used to avoid the bottleneck of a single storage volume. The overall performance of the test was seemingly limited by the I/O performance restrictions of virtualized instances and virtualized storage systems.

Following the overall cloud computing paradigm, performance limitations of this kind might be circumvented by using multiple resources that do the processing in parallel. This could be done by using multiple instances or by using frameworks like Amazon Elastic MapReduce, which are designed to process huge sets of data. When this approach is applied to the analysis of pcap files, however, the structure of the pcap format causes some inherent problems. The format is a binary representation of the data, ordered by the capture time of the packets and not by logical packet streams. Each worker instance would therefore have to process the complete pcap file and extract all streams just to identify which streams it is supposed to analyze. This prevents an efficient distribution of the analysis across multiple jobs or input files. If the captured network data were stored in separate streams instead of one big pcap file, processing with a map/reduce algorithm would be possible and could potentially increase scalability significantly.

That said, finally here are the results of our testing (test methodology described in earlier post):









So it took much longer to extract the data from a 500 GB file, which can be attributed to the increased latency of accessing centralized storage (from a SAN/over the network) compared to locally connected SSDs.
Hopefully this little series provided some insight for you, dear readers. We’ll publish the full technical report as an ERNW Newsletter in the next months.
Have a good one, thanks




Extracting Data from Very Large Pcap Files – Part 2: Results from the Local Lab

In the first post I’ve laid out the tools and lab setup, so in this one I’m going to discuss some results.

Description of overall test methodology

To evaluate the performance of the different setups used to analyze capture data, both tcpdump and pcap_extractor (see last post) were used. For the tests, five large capture files of different sizes were created with mergecap by merging various sample traffic dumps. Each of these files thus consisted of several capture files containing a variety of protocols (including iSCSI and FCoE packets). Capture files of ∼40, ∼80, ∼200, ∼500, and ∼800 GB were created and analyzed with both tools. For all tests, the filter expressions for tcpdump and pcap_extractor were configured to search for a specific source IP and a specific destination IP matching iSCSI packets contained in the capture file. Additionally, pcap_extractor was “instructed” to look for a search string (formatted like a credit card number).

To address the performance bottleneck (again, see last post), that is the I/O throughput, two different setups of the testing environment (see above) were implemented: the first one used a RAID 0 approach with four SSDs, the second one used four individual SSDs, each of them processing only a fourth of the analyzed capture file. The standard UNIX time command was invoked to measure the execution time. Additionally, the tools analyzing the data were started with the highest possible scheduling priority to ensure execution with the maximum of available resources. This is a sample command line invoking the test:

/usr/bin/time -hp /usr/bin/nice -n -19 ./pcap_test2 -i $i/in.pcap -o $i/out.pcap -f "ip src and ip dst" -s "5486000000620012" > $i/out &


The most interesting results table is shown below:


So actually extracting a given search string from a 500 GB file could be done in about 21 minutes, employing readily available tools and using COTS hardware for about 3K EUR (as of March 2011). This means that an attacker in possession of (large) data sets resulting from previous eavesdropping attacks will most likely succeed in getting the exact data she’s going after. Furthermore, the time needed scales linearly with the file size, so processing a 1 TB data volume would presumably have taken ∼42 minutes, a 2 TB file ∼84 minutes, and so on. In addition, SSD prices are constantly declining.
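The extrapolation is plain arithmetic on the single data point quoted here (roughly 500 GB in roughly 21 minutes). A minimal sketch, assuming decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and assuming that the linear behaviour continues beyond the measured file sizes; the function names are made up for this illustration:

```python
# Single measured data point from the local lab: ~500 GB searched in ~21 minutes.
MEASURED_GB = 500
MEASURED_MINUTES = 21


def estimated_minutes(size_gb: float) -> float:
    """Extrapolate the search time linearly from the measured data point."""
    return size_gb * MEASURED_MINUTES / MEASURED_GB


def implied_throughput_mb_s() -> float:
    """Average read rate implied by the measurement (assumes 1 GB = 1000 MB)."""
    return MEASURED_GB * 1000 / (MEASURED_MINUTES * 60)


if __name__ == "__main__":
    for size_gb in (500, 1000, 2000):
        print(f"{size_gb} GB -> ~{estimated_minutes(size_gb):.0f} minutes")
    print(f"implied average throughput: ~{implied_throughput_mb_s():.0f} MB/s")
```

The implied average rate of just under 400 MB/s is a property of the complete setup, not a statement about any single SSD.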

Thus it could be shown that the perception that the sheer volume of data gained from eavesdropping attacks on high speed links might prohibit an attacker from analyzing this data is, well, simply not correct ;-).

Risk Assessment & Mitigating Controls

Several factors come into play when trying to assess the actual risk of this type of attack. Let’s put it like this: once an attacker either has physical access to a fibre at some point or is able to get into the transport path by means of certain network based attacks – which are going to be covered in another, future post – collecting and analyzing the data is an easy task. If you have sufficient reasons for trusting the party actually implementing the connection (e.g. a carrier offering Metro Ethernet services) and “the overall circumstances”, you might rely on the isolation properties provided by the service and topology. In case you either don’t have sufficient reasons to trust (some discussion on approaches to “evaluate trustworthiness” can be found here or here) or operate in a highly regulated environment, using encryption technologies on layer 2 (like these or these) might be a safer approach.
In the next post we’ll discuss the cloud based test setup, together with its results. Stay tuned &
have a great day,



HITB Aftermath

Didn’t find the time so far to post a short blog entry about HITB Amsterdam… but here we go.

Unfortunately I couldn’t arrive in AMS earlier than Thursday evening so I missed the first day (and – from what I heard – some great talks). However we went out for dinner that night with the likes of Andreas (Wiegenstein), Jim (Geovedi), Raoul (Chiesa), Travis (Goodspeed), Claudio (Criscione) and some more guys and I had some quite good conversations, both on technical matters and on Intra-European cultural differences ;-). Btw: thanks again to Martijn for taking care of the restaurant.

On Friday I listened to Travis’ talk on “Building a Promiscuous nRF24L01+ Packet Sniffer” (cool & scary stuff) and a part of this talk on iPhone data protection (well delivered as well). In the afternoon Daniel and I gave an updated version of the “Attacking 3G and 4G Telecommunication Networks ” presentation (the HITB version can be found here). Overall I can say that HITB was an excellently organized event with a great speaker line-up (not sure if we contributed to that one ;-)) and some innovative ideas (inviting a bunch of local hacker spaces among those). Dhillon is a fabulous host and I already regard HITB as one of the major European security events (next to Troopers, of course ;-)).
Have a great weekend everybody



Yet another update on IPv6 security – Some notes from the IPv6-Kongress in Frankfurt

A couple of hours ago Christopher (Werny) and I gave this presentation at the Heise IPv6-Kongress, which overall was a quite interesting and well-organized event bringing together a number of practitioners from the field. While yesterday’s talks were dominated by a certain euphoria and optimistic pioneer spirit, the second day featured some security talks which cast slight shadows on the brave new world of IPv6 ;-). I particularly enjoyed meeting Eric Vyncke from Cisco (one of the two authors of this great book) and Marc “van Hauser” Heuse, who released a new version of the THC-IPV6 tool set today. We had some fruitful discussions and we took the opportunity to test some of his newly implemented attacks against “RA Guard” running on a 4948E Chris and I had brought for a demo within our talk. Unfortunately – or fortunately in terms of a “from theory to reality” approach – I have to say that Marc found a quite clever way to circumvent RA Guard by putting the actual “RA payload” into a second frame following a first one mostly containing a “long & empty” destination option (after a fragmentation header pointing to the mentioned second one). To get an idea pls see these screenshots from Wireshark.















This actually completely defeats (the current implementation of) RA Guard, which means that the victim machine received a whole lot of router advertisements…


Eric who gave an excellent talk on his own (mostly covering defense techniques but, amongst others, describing some interesting attacks against tunnel technologies, which btw reminds me I still owe you a blogpost on those… trust me: it’s not forgotten ;-)) stated that this specific type of attack could be mitigated by using an ACL containing sth along the lines of

deny ip any any undetermined-transport

[which is supposed to match any IPv6 packet where the upper-layer protocol cannot be determined].
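To make “the upper-layer protocol cannot be determined” a bit more tangible, here is a rough sketch in Python that walks the IPv6 extension header chain of a single raw packet. It is a simplified illustration under stated assumptions, not the way any switch implements the keyword: the function name is made up, the input is assumed to start at the IPv6 header, and a packet is treated as “undetermined” when the header chain runs past the end of the packet or when the packet is a non-initial fragment.

```python
import struct
from typing import Optional

# IPv6 extension header numbers handled by this sketch.
HOP_BY_HOP, ROUTING, FRAGMENT, AUTH, DEST_OPTS = 0, 43, 44, 51, 60
EXTENSION_HEADERS = {HOP_BY_HOP, ROUTING, FRAGMENT, AUTH, DEST_OPTS}
IPV6_HEADER_LEN = 40


def upper_layer_protocol(packet: bytes) -> Optional[int]:
    """Return the upper-layer protocol number of a raw IPv6 packet, or None
    if this packet alone does not reveal it."""
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return None
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    while next_header in EXTENSION_HEADERS:
        if offset + 8 > len(packet):
            return None  # extension header announced but not present
        if next_header == FRAGMENT:
            (frag_field,) = struct.unpack_from("!H", packet, offset + 2)
            if frag_field >> 3 != 0:
                return None  # non-initial fragment, chain started elsewhere
            header_len = 8  # the fragment header has a fixed size
        elif next_header == AUTH:
            header_len = (packet[offset + 1] + 2) * 4
        else:
            header_len = (packet[offset + 1] + 1) * 8
        next_header = packet[offset]  # first byte names the following header
        offset += header_len
    # The chain ended; the upper-layer header has to start inside this packet.
    return next_header if offset < len(packet) else None
```

For a first fragment that carries nothing but a fragment header and a long destination options header, the function returns None, although the last "next header" field already announces ICMPv6; the ICMPv6 header itself only arrives in the second frame.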

We (Christopher and I) weren’t even aware of that keyword and we did not yet have an opportunity to test its effectiveness. Still, there are some immediate lessons to be learned from those hours in Frankfurt:

a) in the field of IPv6 security one can learn something new every day 😉

b) there’s still so much “uncovered space” in the IPv6 (security) world that we’ll certainly see yet-unknown types of attacks in the next years.

c) Marc is a really smart guy (which prompted me inviting him to speak at next year’s Troopers ;-))

d) Going with ACLs on “access layer”/customer/subscriber facing ports might be the better approach than just using RA Guard. (on a related note: some Cisco guy I talked to was very sceptical that RA Guard will ever be available on 2900 or 3500 series switches).


Most probably this ([1], [2], [3]) little sequence of IPv6 related posts will be continued soon (but not before we’ve finished the update of the “Attacking 3G and 4G networks” talk to be given at HITB Amsterdam next Friday ;-)).

Have a great weekend everybody



Evaluating Operational Feasibility

I’ve discussed the concept of evaluating the operational “feasibility” (or “impact”, depending on your point of view) of security controls before. Some people approached me asking “which considerations should we take into account when trying to understand or rate this for $SOME_SECURITY_CONTROL?”. Therefore, in the following I’ll give an unordered list of factors to consider to get an understanding of the “operational feasibility” of a given security control. Two things should be noted in advance:

– evaluating the operational “feasibility” (which is “a positive factor”) as opposed to the operational “impact” (being a “negative factor”) allows for easier integration into a metric scheme, as the two main factors to be considered (the other one being the “security benefit” of a control) can then be expressed on the same scale, with a high value meaning a good thing.
– as the maturity and as-is state of operational processes usually have a much higher impact on the security posture of a given environment than the components deployed in the environment (see this presentation, slide 14ff.), this approach focuses on _operational costs_ and does not take initial investment costs into account. In short: opex is the thing to look at, not capex.
Here we go… for each (potential) security control you might look at:

a) How many lines of code/configuration does it need?

b) Can it be implemented by means of templates or scripts? Effort needed for this?

c) To what degree does the implementation differ in different scenarios (e.g. per system/subnet/site)? Can “the difference” be scripted, e.g. taken from another source (a CMDB) or “calculated” (like the addresses of neighboring routers on the local link)?

d) How much additional configuration is needed to establish the previous functionality/state? E.g. to pass legitimate traffic in case of a (“fresh”) application of ACLs?

e) What’s the “business impact” incl. number of associated support/helpdesk calls?

f) Cost for _deployment_ of additional hardware, licenses or other tangibles. (again, the cost for their initial procurement is capex).

g) In case of a tangible security control think about the full life-cycle management of the asset (patching, monitoring, alerting, capacity management and the like). This one is often heavily overlooked, see for example this excellent blog post of Anton Chuvakin for a discussion of the “real costs of a SIEM deployment”.

h) Does the control require a new operational process or task?

i) Propagation: how far does the (implementation of the) control reach?

j) How many different people or companies/partners (sub contractors) touch the work?

k) Impact on OLAs and SLAs.

The above might give an idea of how to tackle the task of evaluating the operational feasibility. In another, future blogpost I may discuss a sample metric using this stuff from a real-world environment (will have to write down and anonymize some pieces though). For the moment many thanks to Friedwart, Angus and Sergey for valuable input to the above list.

Feel free to contact us (or leave a comment) with suggestions as for additional considerations.

have a good one,



update for your fuzzing toolkit

As I’m currently developing the ‘next gen’ stateful fuzzing framework @ERNW [called dizzy, to be released soon 😉 ], I will give you an updated set of fuzzing scripts from the ‘old world’.

Some of you will remember the 2008 release of sulley_l2, which was a modified version of the sulley fuzzing framework, enhanced with Layer 2 sending capabilities and a whole bunch of (L2) fuzzing scripts. All the blinking, rebooting, mem-corrupting Ciscos gave us some attention. Since then, we have continued to write and use the fuzzing scripts, so the whole collection grew.

Find the latest version of the tool-set here.

If you take a look inside the ‘audits’ folder, you will find all the ERNW-made fuzzing scripts. I’ll give you a short description of most of them:

  • ARP – These are some basic ARP fuzzing scripts, mainly as a reference L2 implementation; haven’t found anything interesting with them yet.
  • BGP – Some scripts for the basic BGP packet types; has nothing to do with Layer 2 but will kill some devices 😉
  • CAPWAP – Within our wireless research we also did some wireless mgmt-protocol fuzzing and came up with these scripts. (RFC 5415)
  • CDP – Fuzzing scripts for Cisco’s discovery protocol. Most of the fun is gone here, as the bugs have been submitted and fixed in the meantime.
  • DOT1Q – One of the first L2 fuzzing scripts, building a tagged packet.
  • DTP – Fuzzing scripts for Cisco’s dynamic trunking protocol. That’s the one which makes Ciscos blink like Christmas trees.
  • EXTREME – A handful of scripts targeting Extreme’s discovery protocol; those will create purple stack traces 😉
  • GTP – In the 3G / 4G research we did some GPRS tunneling protocol fuzzing; not finished yet.
  • IP – Also more of a reference implementation.
  • ISL – To be complete with the VLAN tagging, there is also a script for Cisco’s ISL.
  • LLDP – Those scripts won’t work as expected; if you know why, drop me a mail and you will get dizzy first 😀
  • LWAPP – Also output from the wireless research; at that time this one randomly rebooted access points.
  • OSPF – A script for fuzzing OSPF Hello packets; won’t get any further, as sulley knows no state.
  • PNRP – Simon’s awesome PNRP fuzzing scripts.
  • PVST – Spanning Tree in a few flavors, if you ever need even more of those packets 😉
  • SNMP – Right, more like an ASN.1 fuzzer, but provided some nice results.
  • UDLD – One more L2 protocol with a bunch of strings inside (watch out for the device-id).
  • VRRP – While implementing the VRRP attacks in loki, also did some fuzzing, obviously ;).
  • VTP – Another L2-based, Cisco-only protocol; makes devices blink.
  • WLCCP – And the last one is again from our wireless research. Haven’t found anything interesting by fuzzing, but the loki module for this works nicely.

So, that’s all for now, have fun with the code and stay tuned for more tools on fuzzing to be finished/released soon.





Extracting Data from Very Large Pcap Files – Part 1: Tools and Hardware

There is a common misconception that the sheer amount of data coupled with multiplexed channels (e.g. WDM technology) makes successful eavesdropping attacks on high speed Ethernet links – like those connecting data centers – highly unlikely. This is mainly based on the assumption that the amount of resources (e.g. RAM, [sufficiently fast] storage or CPU power) needed to process large files of captured data is a limiting factor. However, to the best of our knowledge, no practical evaluation of these assumptions has so far been performed.

Therefore we conducted some research and started writing a paper (to be released as a technical report shortly) that aims to answer the following questions:

– Can the processing of large amounts of captured data be done “in a feasible way” ?

– How much time and which type of hardware is needed to perform this task?

– Can this be done with readily available tools or is custom code helpful or even required? If so, how should that code operate?

– Can this task be facilitated by means of public cloud services?

We performed a number of tests with files of different sizes and entropy. Tests were carried out both with different sets of dedicated hardware and by means of public cloud services. The paper describes the tools used, the various test setups and, of course, the results. A final section includes some conclusions derived from the insights provided by the test sets.

It is assumed that an attacker has already gained access enabling her to eavesdrop on the high speed data link. A detailed description of how this can be done can be found e.g. here or here. The focus of our paper is on the subsequent extraction of useful data from the resulting dump file. It is further assumed that the collected data is available in standard pcap format.
We’ll summarize some of the stuff in a series of three blog posts, each discussing certain aspects of the overall research task. In the first one we’ll describe the tools and hardware used. In the second we’ll give the results from the test lab with our hardware, while the third part describes the tests performed in the (AWS) cloud and provides the conclusions. Furthermore we’ll give a presentation of the results, including a demo (probably the extraction of credit card information from a 500 GB file, which roughly equates to a live migration of 16 virtual machines with 32 GB RAM each) at the Infoguard Security Lounge taking place on 8th of June in Zug/Switzerland.
Last but not least before it gets technical: the majority of the work was performed by Daniel, Hendrik and Matthias. I myself had mostly a “supervisor role” 😉 So kudos to them!

COTS packet analysis tools
A number of tests utilizing available command-line tools (tethereal, tshark, tcpdump and the like) were performed. It turned out that, performance-wise, “classic” tcpdump showed the most promising results. During the following, in-depth testing phase two problems of tcpdump showed up:

– It’s single-threaded, so it can’t use multiple processors of a system (for parallel processing). Given that the actual bottleneck is related to I/O anyway (see below), this was not regarded as a major problem.

– Standard pcap filters do not allow for “keyword search” which somehow limits the attack scenarios (attacker might not be able to search for credit card numbers, user names etc. but would have to perform an IP parameter based search first and then hand over to another tool which might cause an unacceptable delay in the overall analysis process). To address this limitation Daniel wrote a small piece of code that we – not having found an elegant name like Loki so far 😉 – called pcap_extractor.

This is basically the fastest possible implementation of a pcap file reader. It opens a libpcap file handle for the designated input file, applies a libpcap filter to it and loops through all packets matching the filter, writing them to an output pcap file. Contrary to tcpdump and most other libpcap-based analysis tools, it can also search for a given string inside the matching packets, for example a credit card number or a username. If such a search string is given, only packets that match the libpcap filter and contain the search string are written to the output file.

A call to search a pcap file for iSCSI packets which contain a certain credit card number and write them to the output file would then look like:

# pcap_extractor -i input-file.pcap -o output-file.pcap -f "tcp port 3260" -s "5486123456789012"

The source code of pcap_extractor can be downloaded here.
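The read, search and write loop described above can be sketched in a few lines. The following is not the pcap_extractor source (which works on top of libpcap) but a simplified illustration in Python that parses the classic pcap file format by hand. It leaves out the libpcap filter step entirely and only performs the string search; the function name extract_matching is made up for this sketch, and pcapng files are not handled.

```python
import struct
from typing import BinaryIO

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16


def extract_matching(src: BinaryIO, dst: BinaryIO, needle: bytes) -> int:
    """Copy every packet whose captured bytes contain `needle` from src to dst.

    Returns the number of packets written. No BPF filter is applied here.
    """
    global_header = src.read(PCAP_GLOBAL_HEADER_LEN)
    if len(global_header) != PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")

    # The magic number reveals the byte order the file was written in
    # (microsecond and nanosecond timestamp variants are both accepted).
    magic = global_header[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    dst.write(global_header)  # reuse it so link type and snaplen stay the same
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    written = 0
    while True:
        record_header = src.read(PCAP_RECORD_HEADER_LEN)
        if len(record_header) < PCAP_RECORD_HEADER_LEN:
            break  # end of file (or a truncated trailing record)
        _ts_sec, _ts_frac, incl_len, _orig_len = record.unpack(record_header)
        packet = src.read(incl_len)
        if len(packet) < incl_len:
            break  # truncated final packet
        if needle in packet:
            dst.write(record_header)
            dst.write(packet)
            written += 1
    return written
```

Called with two files opened in binary mode and a needle such as b"5486123456789012", the function writes a valid pcap file containing only the matching packets. Like the original tool, it only finds a string that sits completely inside one captured packet.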

Identifying the bottleneck(s)

While measuring the performance of multiple pcap analysis tools, profiling of system calls indicated that the tools spend between 85% and 98% of the search time waiting for I/O. In the case of the fastest tool, this means that 98% of the time the tool does nothing but wait for dump data. So the I/O bandwidth turned out to be the major bottleneck in the initial test setups.

Actual lab setup

The final test system was designed to provide as much I/O bandwidth as possible and was composed of:

Intel Core i7-990X Extreme Edition, 6x 3.46 GHz

12GB (3 * 4GB) DDR3 1600MHz, PC3-12800

ASRock X58 Extreme6 S1366 mainboard

4 * Intel 510 Series Elm Crest SSD 250GB

The mainboard and the SSDs were chosen to support SATA3 with a theoretical maximum I/O bandwidth of 6 Gbit/s. FreeBSD was used as the operating system.

In this post we’ve “prepared the battle ground” (as for the tools and hardware to be used) for the actual testing, in the next one we’ll discuss the results. Stay tuned & have a great day




Once more: hardening is better than patching

I can’t help myself. And I fully understand that some of you, dear readers, might get a bit annoyed by always hearing the same tune from our side. This post is, surprise!, about yesterday’s Microsoft Patch Tuesday which – as can be seen here and here – disclosed quite a number of vulnerabilities in various Microsoft components. To make the point evoked in this post’s title I’d like to draw your attention to two particular bulletins, both rated as critical.

Microsoft Security Bulletin MS11-028 – Critical, Vulnerability in .NET Framework Could Allow Remote Code Execution (2484015)

The advisory states that “this security update resolves a publicly disclosed vulnerability in Microsoft .NET Framework. The vulnerability could allow remote code execution on a client system if a user views a specially crafted Web page using a Web browser that can run XAML Browser Applications (XBAPs)”.

Looking at the “Workarounds” section, it turns out that the configuration of some specific parameters within Internet Explorer (those are: Loose XAML, XAML browser applications, XPS documents, Run components not signed with Authenticode, Run components signed with Authenticode) would prevent a successful attack,  including potentially future ones against the vulnerable components. Disabling those parameters (amongst others) is exactly what this document suggests.

Microsoft Security Bulletin MS11-029 – Critical, Vulnerability in GDI+ Could Allow Remote Code Execution (2489979)

To quote from the advisory itself: “this security update resolves a privately reported vulnerability in Microsoft Windows GDI+. The vulnerability could allow remote code execution if a user viewed a specially crafted image file using affected software or browsed a Web site that contains specially crafted content”.
Here, in the “Workarounds” section disabling metafile processing is listed as a potential one. Which, in turn, we’ve recommended here.

So, to cut to the chase: once more proper hardening could have been your friend, at least for those two “critical” ones. And yes, we’ve already taken the potential business impact of these measures into account. We can safely state that in many environments there’s practically none. But not having to worry about some of yesterday’s advisories and maybe even avoiding getting owned (for MS11-029 Microsoft estimates that it’s “likely to see [a] reliable exploit developed in [the] next 30 days”) might have some benefit in pretty much every organization. Think about it!






Sisters’ Act of MFD Security

Recently Micele and I were researching for our talk about the current state of security of Multifunction Devices (MFDs). Since we’re both seasoned pentesters who are quite familiar with MFDs, we were really surprised that very little new research is going on in the field of MFD security. While diving deeper into the topic, we found a very simple explanation for this: As in 2002, it is still possible to download print or scan jobs using PJL, many devices still offer default FTP or Telnet access, and, of course, stored files can be recovered from MFD hard drives – on an enterprise wide scale. To further strengthen our impression of the current state of MFD security, most devices crashed or went wild while we performed some scans – and we are not talking about fuzzing here.

This devastating result led to the question of how MFDs can be secured. Since there are a lot of MFD hardening resources out there, even from vendors, we decided to put together a comprehensive hardening guide for MFDs. To raise the level of awareness, we put together a lot of examples of attacks on MFDs and then focused on the development of our own MFD security guide, which is based on the seven sisters. The result of this approach can be found here. And of course, soon there will be an ERNW newsletter to cover this topic in a more academic and structured way 😉


RSA: Anatomy of an Attack

Lots of stuff has been written about this blog post from RSA describing the (potential) details of the attack, so I will refrain from detailed comments on this piece that Marsh Ray nicely called “some of the most egregious hyperbole I’ve read in infosec”.

Just one short note. Presumably the attack, in an early stage, used a “spreadsheet [that] contained a zero-day exploit that installs a backdoor through an Adobe Flash vulnerability (CVE-2011-0609)”.

I’ve written about Flash here.

nuff said, thanks


