Features

To help malware researchers, SOC investigators, CERT analysts, and others, Exalyze performs a number of automatic actions. The objective is to do almost everything a reverser would want to see at the start of an analysis. We also added the ability to pivot on this data and identify other potentially related samples. It’s a “toolbox” that helps analysts in many situations.

Malware analysis tools

Sequences Extraction

During fast analysis of malware samples, we often look for strings and external API usage, and how these are used in the sample functions or subfunctions.

The sequence extraction feature does this automatically by analyzing string cross-references and sensitive API usage.

The set of identified actions is then stored in a database and displayed for each sample.
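As a rough illustration of the idea, the sketch below groups string references and sensitive API calls by function. It assumes the cross-references have already been produced by a disassembler; the record format, the `SENSITIVE_APIS` list and the `build_sequences` helper are hypothetical and do not describe Exalyze's internals.

```python
from collections import defaultdict

# Hypothetical list of APIs considered "sensitive" for this sketch.
SENSITIVE_APIS = {"CreateFileW", "WriteFile", "CreateProcessW", "InternetOpenUrlW"}


def build_sequences(xrefs):
    """Group string references and sensitive API calls by function.

    `xrefs` is an iterable of (function_name, kind, value) tuples in code
    order, where kind is "string" or "api". This input format is an
    assumption made for the example.
    """
    sequences = defaultdict(list)
    for function, kind, value in xrefs:
        # Keep every string reference, but only the API calls we care about.
        if kind == "string" or (kind == "api" and value in SENSITIVE_APIS):
            sequences[function].append((kind, value))
    return dict(sequences)


example_xrefs = [
    ("sub_401000", "string", "cmd.exe /c"),
    ("sub_401000", "api", "CreateProcessW"),
    ("sub_401000", "api", "GetTickCount"),
    ("sub_402000", "api", "WriteFile"),
]
example_sequences = build_sequences(example_xrefs)
```

Here `sub_401000` ends up with a command-line string followed by a process creation call, which already hints at what the function does.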

This type of quick analysis can lead to a fast (albeit partial) understanding of the malware’s capabilities, which saves a lot of analysis time during reverse engineering!

In the example below, the sequence analysis helps quickly identify the sample’s capabilities.

Example of sequence analysis

Can you guess what this malware is doing just by looking at the sequences view?

Capabilities Extraction

When manually analyzing samples, we often proceed to identify their capabilities by recognizing global features of the binary.

The analysis is mostly based on personal expertise and subjective assessments. For example, when we see a call to CreateProcessW, we assess: “This sample probably creates processes!”

Such individual calls, functions or pattern clusters may indicate distinct sample capabilities, and may be missed during manual analysis.

Upon sample submission, we conduct thorough functional analysis to identify specific capabilities, then we map these capabilities to their corresponding Tactics, Techniques, and Procedures (TTPs).

A capability summary is then compiled and included in the analysis report.

This executive overview provides valuable intelligence to analysts without reverse engineering expertise, enabling them to quickly understand the malware’s potential functionality.
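To make the mapping step concrete, here is a minimal sketch that turns a list of imported API names into a capability summary keyed by MITRE ATT&CK technique ID. The mapping table is a small, hand-picked illustration and `summarize_capabilities` is a hypothetical helper; the actual rules used by Exalyze are not described here.

```python
# Illustrative mapping only: API name -> (capability, ATT&CK technique ID).
API_CAPABILITIES = {
    "CreateProcessW": ("creates processes", "T1106"),            # Native API
    "WriteProcessMemory": ("writes to another process", "T1055"),  # Process Injection
    "CreateRemoteThread": ("starts a remote thread", "T1055"),     # Process Injection
    "RegSetValueExW": ("modifies the registry", "T1112"),          # Modify Registry
}


def summarize_capabilities(imported_apis):
    """Return {technique_id: [capabilities]} for the APIs we recognize."""
    summary = {}
    for api in imported_apis:
        if api in API_CAPABILITIES:
            capability, technique = API_CAPABILITIES[api]
            summary.setdefault(technique, []).append(capability)
    return summary


example_summary = summarize_capabilities(
    ["CreateProcessW", "GetTickCount", "WriteProcessMemory", "CreateRemoteThread"]
)
```

Unrecognized APIs such as `GetTickCount` are simply ignored, so the summary only lists capabilities backed by a known indicator.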

Note

For some of those capabilities we also generate the corresponding MITRE ATT&CK TTPs and summarize these TTPs in a matrix such as seen below:

TTP table example

Entropy Map

By using a color-coded visualization of a malware sample, analysts can quickly recognize its structure. When this visualization is used as a preliminary triage tool, it significantly accelerates the initial assessment phase of malware analysis by highlighting anomalies that warrant deeper investigation.

Note

For each file, we generate an associated entropy bitmap based on the entropy of the sample, split into 256-byte chunks.
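The per-chunk computation can be sketched as follows, assuming the usual Shannon entropy measured in bits per byte (0.0 for constant data, 8.0 for uniformly distributed bytes). The function names are hypothetical, and the mapping from entropy values to colors is not covered here.

```python
import math
from collections import Counter

CHUNK_SIZE = 256  # chunk size stated in the note above


def shannon_entropy(chunk: bytes) -> float:
    """Return the Shannon entropy of a chunk in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_map(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[float]:
    """Compute one entropy value per fixed-size chunk of the file."""
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
```

Each value in the returned list would correspond to one pixel of the bitmap; the last chunk may be shorter than 256 bytes.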

The entropy map is represented with the following colors:

Entropy map color scale

For example, an analyst can quickly spot packed or encrypted samples by identifying large sections of the binary with high entropy (colored in red). The two following figures show such an example by highlighting the differences between a sample and its packed version:

Entropy map of a static binary

Entropy map of the same binary packed with UPX

YARA Generation

Exalyze’s YARA generation works in four steps:

  • Identification of interesting strings to process.

  • Exclusion of strings already seen in a database of precomputed trusted binaries.

  • Disassembly of the sample and extraction of interesting parts of bytecode.

  • Exclusion of patterns already seen in a database of precomputed trusted binaries.
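The string-based half of these steps can be sketched as below. This is a simplified illustration that only covers printable ASCII strings and a trusted set held in memory; the disassembly and bytecode steps are left out, and all function names are hypothetical rather than part of Exalyze.

```python
import re


def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Extract printable ASCII strings of at least `min_len` characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def filter_trusted(strings, trusted):
    """Drop strings seen in trusted binaries, and duplicates, keeping order."""
    seen = set()
    kept = []
    for s in strings:
        if s not in trusted and s not in seen:
            seen.add(s)
            kept.append(s)
    return kept


def build_rule(name: str, strings: list[str]) -> str:
    """Render a YARA rule that requires all remaining strings to match."""
    if not strings:
        raise ValueError("no candidate strings left after filtering")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, s in enumerate(strings):
        # Escape backslashes and double quotes for YARA text strings.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}" ascii')
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)


sample = b"\x00\x01kernel32.dll\x00evil-c2.example.com\x00ab\x00"
candidates = filter_trusted(extract_strings(sample), trusted={"kernel32.dll"})
rule_text = build_rule("example_sample", candidates)
```

The `all of them` condition is a deliberately strict choice for the sketch; a real generator would likely use a looser condition to tolerate variants.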

More than a hundred thousand samples populate the trusted database, but we can still miss some patterns, so we added a match check for cases where binaries share more than 60% of their patterns.

This process keeps YARA generation fast, and in our tests it works well in most cases :D

Similarity Analysis

Code similarity is a unique Exalyze capability that performs a comprehensive comparison between a sample and ALL other executables already analyzed.

Our similarity analysis engine is based on the Machoc hash which we published in 2016.

This methodology requires a full disassembly of each sample, followed by the generation of a Control Flow Graph (CFG) for each function. These CFGs are then hashed using Murmurhash, which creates a unique signature for each sample.
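The sketch below illustrates the general idea of hashing a function's CFG by its shape rather than its addresses. It is a simplified encoding and not necessarily the exact Machoc format. To stay within the Python standard library, it uses `hashlib.blake2b` truncated to 32 bits as a stand-in for Murmurhash. The CFG input format and function names are assumptions made for the example.

```python
import hashlib


def encode_cfg(cfg: dict[int, list[int]]) -> str:
    """Encode a CFG as text, numbering basic blocks by address order.

    `cfg` maps each basic block start address to its successor addresses
    (an assumed input format for this sketch).
    """
    order = {addr: idx for idx, addr in enumerate(sorted(cfg), start=1)}
    parts = []
    for addr in sorted(cfg):
        successors = ",".join(str(order[s]) for s in sorted(cfg[addr]) if s in order)
        parts.append(f"{order[addr]}:{successors}")
    return ";".join(parts)


def function_hash(cfg: dict[int, list[int]]) -> str:
    """Hash the encoded CFG to 32 bits (stand-in for Murmurhash)."""
    digest = hashlib.blake2b(encode_cfg(cfg).encode(), digest_size=4)
    return digest.hexdigest()


def sample_signature(functions: list[dict[int, list[int]]]) -> str:
    """Concatenate per-function hashes into one signature string."""
    return "".join(function_hash(cfg) for cfg in functions)


# Two functions with the same shape at different addresses.
cfg_a = {0x1000: [0x1010, 0x1020], 0x1010: [0x1020], 0x1020: []}
cfg_b = {0x5000: [0x5030, 0x5040], 0x5030: [0x5040], 0x5040: []}
```

Because blocks are renumbered before hashing, `cfg_a` and `cfg_b` produce the same hash even though the code was relocated, which is what makes this kind of signature robust to recompilation at a different base address.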

The code profiling approach is complementary to traditional hash searches such as imphash or richhash, because it enables analysts to find evolutions of malware families even when all other types of hashes don’t match.

For example, considering the code evolution below, we can see that most of the code is the same, but a few functions were added. Using our similarity analysis matching we can quickly identify this kind of match between a sample and thousands of others.

Illustration of a binary diff

Note

We use a similarity threshold of 75% code correspondence to classify executables as related.
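How the correspondence percentage is computed is not detailed here. As one plausible reading, the sketch below compares two samples by the Jaccard index of their per-function hashes and applies the 75% threshold. The metric and the function names are assumptions for illustration.

```python
SIMILARITY_THRESHOLD = 0.75  # threshold stated in the note above


def similarity(hashes_a, hashes_b) -> float:
    """Jaccard index of two collections of function hashes (assumed metric)."""
    a, b = set(hashes_a), set(hashes_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_related(hashes_a, hashes_b, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Classify two samples as related when similarity reaches the threshold."""
    return similarity(hashes_a, hashes_b) >= threshold


original = ["f1", "f2", "f3", "f4"]
variant = ["f1", "f2", "f3", "f4", "f5"]    # one function added
unrelated = ["f1", "x1", "x2", "x3"]
```

With these inputs, the variant shares 4 of 5 distinct functions with the original and is classified as related, while the unrelated sample shares only 1 of 7 and is not.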

Finding Similar Samples

When hunting for threats, we are often looking for variants of known malware families.

This unique capability is very useful for both threat hunters and SOC/CERT analysts, because if “the funny sample” you found is highly similar to known malware, it probably isn’t good news.