06/23/2020

Introducing the TypeRefHash (TRH)

We introduce the TypeRefHash (TRH), an alternative to the ImpHash for .NET binaries, for which the ImpHash does not work. Our evaluation shows that the TRH can effectively be used to identify .NET malware families.

Update (20.10.2022)

After publishing this post, we were notified that Joe Desimone (@dez_ on Twitter) had already used a very similar approach for hashing the TypeRef table in his ClrGuard (code on GitHub) in 2017. He also presented his work at DerbyCon 2017 (video on YouTube).

Introduction

The ImpHash was introduced by FireEye in 2014 [1]. Since then, many malware analysts have used it, and tools like VirusTotal have implemented it, to identify similar malware samples by their imports. The underlying assumption is that programs with the same imports are likely built from similar source code.

.NET samples usually import only mscoree.dll, so there are only a handful of distinct ImpHashes across all .NET binaries. The ImpHash therefore cannot be used to tell them apart. This motivated us to find an alternative, the TypeRefHash (TRH). We used the online tool penet.io to display the imported DLLs, the imported functions and the TypeRef table.

Imported DLLs of a .NET sample: the only import is mscoree.dll.

.NET files store the namespaces and names of the types they reference in a metadata table called TypeRef. We can use this table to construct an identifier similar to the ImpHash: just as the Import table lists combinations of DLL and function names, the TypeRef table lists type names together with their corresponding namespaces. For example, a .NET binary may import the type DebuggerBrowsableState from the namespace System.Diagnostics.

TypeRef table of a .NET binary, with the imported namespaces and types.
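The TypeRef table does not store the names as text. According to the ECMA-335 specification, each row holds a ResolutionScope plus TypeName and TypeNamespace columns, and the latter two are byte offsets into the #Strings heap, which contains null-terminated UTF-8 strings. Resolving an index therefore means reading up to the next null byte. A minimal sketch, assuming the raw heap bytes have already been located in the PE file (the heap contents and offsets below are hypothetical):

```python
def resolve_string(strings_heap: bytes, index: int) -> str:
    """Return the null-terminated UTF-8 string at byte offset `index` of a #Strings heap."""
    # Each string runs from `index` up to (not including) the next 0x00 byte.
    end = strings_heap.index(b"\x00", index)
    return strings_heap[index:end].decode("utf-8")


# Hypothetical heap bytes for illustration; offset 0 holds the empty string.
heap = b"\x00System.Diagnostics\x00DebuggerBrowsableState\x00"

# Hypothetical TypeRef row: heap offsets of TypeNamespace and TypeName.
namespace_idx, name_idx = 1, 20

print(resolve_string(heap, namespace_idx))  # System.Diagnostics
print(resolve_string(heap, name_idx))       # DebuggerBrowsableState
print(repr(resolve_string(heap, 0)))        # ''
```

An empty namespace, as used for nested types, simply resolves to the empty string at offset 0.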

Calculation

To calculate the TRH, we extract the TypeRef table, resolve the indices to the corresponding strings and then:

1. Order the entries by TypeNamespace and then by TypeName.
2. Concatenate each entry's TypeNamespace and TypeName with a dash. If the namespace is empty, the resulting string starts with the dash.
3. Join all strings with commas and calculate the SHA256 hash of the resulting UTF-8 byte string.
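The three steps above can be sketched in Python. This is a minimal illustration, not the PeNet reference implementation: it assumes the TypeRef rows are already resolved to (TypeNamespace, TypeName) string pairs, and it relies on Python's code-point string ordering, which may differ from the comparison used by the reference implementation for names that differ only in case or punctuation.

```python
import hashlib
from typing import Iterable, Tuple


def type_ref_hash(type_refs: Iterable[Tuple[str, str]]) -> str:
    """Compute a TRH from resolved (TypeNamespace, TypeName) pairs."""
    # Step 1: sorting the tuples orders by namespace first, then by type name.
    # Python compares strings by code point; the reference implementation may
    # use a different string comparison (assumption, not verified here).
    ordered = sorted(type_refs)
    # Step 2: namespace and name joined with a dash; an empty namespace
    # yields a string that starts with the dash.
    entries = [f"{namespace}-{name}" for namespace, name in ordered]
    # Step 3: comma-joined, UTF-8 encoded, hashed with SHA256.
    return hashlib.sha256(",".join(entries).encode("utf-8")).hexdigest()


# Hypothetical input; the order of the rows does not affect the result.
refs = [
    ("System.Runtime.Versioning", "TargetFrameworkAttribute"),
    ("", "DebuggingModes"),
]
print(type_ref_hash(refs) == type_ref_hash(list(reversed(refs))))  # True
```

Because the entries are sorted before hashing, reordering the rows of the table yields the same hash.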

We use SHA256 instead of MD5, which the ImpHash uses, because we already observe MD5 collisions in our data sets. We sort the entries to prevent attacks in which a different TypeRefHash is produced for a sample simply by reordering the table. Balles and Sharfuddin showed a similar attack against the ImpHash [2]. We chose a dash and a comma as separators because neither is valid in .NET namespaces or type names.
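The separators keep the final string unambiguous: without them, different tables can concatenate to the same string and therefore share a hash. Since neither a dash nor a comma occurs inside a namespace or type name, the boundaries stay recoverable. A small illustration with hypothetical names (the helper functions are illustrative only):

```python
# Hypothetical (namespace, name) pairs that differ only in where the
# boundary between namespace and type name falls.
a = [("System.IO", "File")]
b = [("System.I", "OFile")]


def join_without_separators(pairs):
    # Plain concatenation loses the namespace/name boundary.
    return "".join(namespace + name for namespace, name in pairs)


def join_with_separators(pairs):
    # Dash between namespace and name, comma between entries.
    return ",".join(f"{namespace}-{name}" for namespace, name in pairs)


print(join_without_separators(a) == join_without_separators(b))  # True
print(join_with_separators(a) == join_with_separators(b))        # False
```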

Imagine we have a .NET sample with the following simplified TypeRef table:

#	TypeName (Resolved)	TypeNamespace (Resolved)
0	CompilationRelaxationsAttribute	System.Runtime.CompilerServices
1	RuntimeCompatibilityAttribute	System.Runtime.CompilerServices
2	TargetFrameworkAttribute	System.Runtime.Versioning
3	DebuggingModes	(empty)
4	AssemblyFileVersionAttribute	System.Reflection

This results in the following ordered and concatenated strings. Note that TypeRefs with an empty namespace are sorted to the beginning of the list, since the empty string precedes any non-empty namespace.

-DebuggingModes

System.Reflection-AssemblyFileVersionAttribute

System.Runtime.CompilerServices-CompilationRelaxationsAttribute

System.Runtime.CompilerServices-RuntimeCompatibilityAttribute

System.Runtime.Versioning-TargetFrameworkAttribute

Joining these with commas yields the final string:

-DebuggingModes,System.Reflection-AssemblyFileVersionAttribute,System.Runtime.CompilerServices-CompilationRelaxationsAttribute,System.Runtime.CompilerServices-RuntimeCompatibilityAttribute,System.Runtime.Versioning-TargetFrameworkAttribute

The resulting TRH is the SHA256 hash of this string:

63AE8074B4C2EF8E36FE3272BE23B445CEAB495E14877935C457E75CFB5E5A1E
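The ordering and concatenation steps for this example table can be reproduced in a few lines. This is a minimal sketch that assumes the strings are already resolved and uses Python's default code-point ordering, which sorts this particular table in the order listed above:

```python
import hashlib

# The simplified TypeRef table above as (TypeNamespace, TypeName) pairs,
# in table order; row 3 has an empty namespace.
table = [
    ("System.Runtime.CompilerServices", "CompilationRelaxationsAttribute"),
    ("System.Runtime.CompilerServices", "RuntimeCompatibilityAttribute"),
    ("System.Runtime.Versioning", "TargetFrameworkAttribute"),
    ("", "DebuggingModes"),
    ("System.Reflection", "AssemblyFileVersionAttribute"),
]

# Order, concatenate each entry with a dash, and join the entries with commas.
final_string = ",".join(f"{namespace}-{name}" for namespace, name in sorted(table))
print(final_string)

# SHA256 over the UTF-8 bytes, printed as uppercase hex like the value above.
print(hashlib.sha256(final_string.encode("utf-8")).hexdigest().upper())
```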

You can find the TRH reference implementation in the PeNet library here.

Evaluation

How well can the TypeRefHash identify a given malware family? To answer this, we evaluated the .NET samples we received between mid-May and mid-June 2020 and examined the corresponding hashes for seven families. We chose these families because we had collected a sufficient number of samples for each of them to make a meaningful evaluation possible.

We looked at the following families:

Malware Family	# Samples
AsyncRAT	558
Blackshades	5035
Bladabindi	7793
DiscordTokenGrabber	159
Nanocore	1335
QuasarRAT	517
RevengeRAT	276

We inspected the distribution of TypeRefHashes within each of these families. In the following figures, the blue sections depict the most common TRH of each family. TRHs shared by five or fewer samples are aggregated in the shaded areas to keep the chart readable.

Percentage of different TRHs per malware family.

In most cases, one TypeRefHash dominates a family. Blackshades in particular can be identified very reliably: its two most common TRHs account for 97% of all analysed Blackshades samples.
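The per-family distribution described above, with TRHs of five or fewer samples merged into one bucket, can be computed from a list of labelled samples. A minimal sketch, assuming each sample is given as a hypothetical (family, trh) pair; the function name and input format are illustrative and not part of PeNet:

```python
from collections import Counter, defaultdict


def trh_distribution(samples, max_aggregated=5):
    """Per family, return [(trh, share), ...] sorted by frequency.

    `samples` is an iterable of (family, trh) pairs. TRHs seen in
    `max_aggregated` or fewer samples of a family are merged into "other".
    """
    counts = defaultdict(Counter)
    for family, trh in samples:
        counts[family][trh] += 1

    distribution = {}
    for family, trh_counts in counts.items():
        total = sum(trh_counts.values())
        shares, other = [], 0
        for trh, n in trh_counts.most_common():
            if n > max_aggregated:
                shares.append((trh, n / total))
            else:
                other += n  # rare TRH, goes into the aggregated bucket
        if other:
            shares.append(("other", other / total))
        distribution[family] = shares
    return distribution


# Hypothetical labels: 8 samples share one TRH, 2 others have a rare TRH.
samples = [("FamilyA", "trh1")] * 8 + [("FamilyA", "trh2")] * 2
print(trh_distribution(samples))  # {'FamilyA': [('trh1', 0.8), ('other', 0.2)]}
```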

Next, we looked at the reverse direction: to which families do the samples sharing a family's most common TRH belong? The most common TypeRefHash of each family is listed in the following table:

Malware Family	Most common TRH
AsyncRAT	4807b5cd7256fad54967dfe3c394c27d16bad1ac95b0306911a3546025bd6ccf
Blackshades	306db7dcdf4dd7bbf2eaa054a8c050fb97cbe84c0da87528c6e508ac5e11607b
Bladabindi	695409c18e59ff8a2c04f5572f61d35157ea1ce34e6f3db4975dfbaeb5d7e07f
DiscordTokenGrabber	6f917770f111b5e0f6bd7b1ccd3adf491fbc756bf031fe107233d3b19d4737d
Nanocore	31feea84c77a972ebe0bfc87ac90630ad824e91965b664c47d0d2b0761b30d16
QuasarRAT	03d72f6a261029edbd5028d814b27b075f5c3c62219dbfe8a349998909d07b9a
RevengeRAT	faaf850b8f9ce7eeed4c9d18b2fbd70ef1c9dde8d920c6e333829f3150d9ca08

The following figures show how the samples carrying each of these TRHs are distributed across malware families.

Percentage of malware families per most common TRH.

For five families, all samples sharing the family's most common TRH belong to that family. Among the samples with QuasarRAT's most common TypeRefHash, we also found one CardinalRAT sample. Only RevengeRAT's results are noticeably less accurate: its most common TRH was also shared by 15 Bladabindi samples and one AsyncRAT sample. We also found two samples known to be clean. The TypeRefHash therefore cannot be used as effectively for some malware families, such as RevengeRAT.
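This reverse view amounts to a precision measure for a family's most common TRH: the fraction of all samples carrying that hash that actually belong to the family. A minimal sketch with hypothetical sample labels (the function name and the "clean" label are illustrative assumptions):

```python
from collections import Counter


def top_trh_precision(samples, family):
    """For the most common TRH of `family`, return (trh, precision, others).

    `precision` is the fraction of all samples with that TRH that belong to
    `family`; `others` counts samples with any other label and that TRH.
    """
    samples = list(samples)
    family_trhs = Counter(trh for fam, trh in samples if fam == family)
    top_trh, _ = family_trhs.most_common(1)[0]
    sharing = Counter(fam for fam, trh in samples if trh == top_trh)
    precision = sharing[family] / sum(sharing.values())
    others = {fam: n for fam, n in sharing.items() if fam != family}
    return top_trh, precision, others


# Hypothetical labels; clean files can simply use a label such as "clean".
samples = [("FamilyA", "h1")] * 9 + [("clean", "h1")] + [("FamilyB", "h2")] * 3
print(top_trh_precision(samples, "FamilyA"))  # ('h1', 0.9, {'clean': 1})
```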

Summary

As the ImpHash cannot be used with .NET binaries, we developed a similar method called the TypeRefHash (TRH). The TRH is a SHA256 hash over the imported .NET namespaces and types, just as the ImpHash is an MD5 hash over the imported DLLs and their functions.

Our evaluation showed that the TRH can identify malware families with a precision similar to that of the ImpHash for non-.NET files. Depending on the family, a TRH can be unique to one malware family or shared by multiple families.

You can find the reference implementation in the PeNet library here.

You can find a list with the samples used for the evaluation with their corresponding family name and TRH here.

A command line tool to compute the TRH on Windows and Linux can be found here.

References

[1] FireEye, 2014. Tracking Malware with Import Hashing. https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html (accessed: 17.06.2020)

[2] Balles, C. and Sharfuddin, A., 2019. Breaking Imphash. https://arxiv.org/ftp/arxiv/papers/1909/1909.07630.pdf (accessed: 17.06.2020)

Disclaimer: The PeNet library and penet.io are both projects from one of the authors of this blog entry (Stefan Hausotte).


Phillip Kemkes

R&D Engineer


Stefan Hausotte
