Everyone who deals with malware will know this: Malware names are a convoluted mess. AV scanners will show different detection names for the same file. This confusion is also reflected in media coverage. Is there a way out of this mess?
Before we start our expedition into this muddled place, let's get the terminology right. "Malware name" might refer to one of the following:
The first part of our series examines Antivirus detection names. The second part is a dive into malware family names.
The first attempt to make malware naming consistent was in 1991, when a committee at CARO created A New Virus Naming Convention. This was a time when all or almost all existing malware was also a virus. The naming scheme has influenced today's detection names. Most AV vendors use the same or similar components that CARO suggested, but often with their own terminology and ordering.
The full name of a virus consists of up to four parts, delimited by points (‘.’). Any part may be missing, but at least one must be present. The general format is
(CARO, 1991, A New Virus Naming Convention)
This article will not describe all of these components in detail but highlight some points. The best description is in the conventions themselves on CARO's website.
The Family_Name portion of the detection name doesn't always denote an actual malware family. CARO's conventions provide four umbrella names for insignificant viruses:
The Group_Name represents a subgroup of malware within the same family. It is subject to the same rules and dos and don'ts as the Family_Name.
The Major_Variant and Minor_Variant create even more specific subgroups. Often the infective length of the virus or consecutive letters or numbers are used for these components.
The Modifier is applied to the detection name if the threat actor used a third-party component to conceal their virus, e.g., a polymorphic engine or a compression tool.
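To make the structure of these components more concrete, here is a minimal Python sketch that splits a CARO-style name into its parts. It assumes the general layout `Family_Name.Group_Name.Major_Variant.Minor_Variant[:Modifier]`, with the modifier separated by a colon. The function name, the class name and the sample names are made up for illustration, and the sketch ignores the finer rules of the convention.

```python
from typing import NamedTuple, Optional


class CaroName(NamedTuple):
    family: Optional[str]
    group: Optional[str]
    major_variant: Optional[str]
    minor_variant: Optional[str]
    modifier: Optional[str]


def parse_caro_name(name: str) -> CaroName:
    """Split a CARO-style name into its components.

    Assumed layout: Family.Group.Major.Minor[:Modifier]
    """
    # Everything after the first colon is treated as the modifier.
    body, _, modifier = name.partition(":")
    parts = body.split(".")
    # Up to four dot-separated parts; at least one must be non-empty.
    if len(parts) > 4 or not any(parts):
        raise ValueError(f"not a CARO-style name: {name!r}")
    # Pad missing trailing parts and turn empty parts into None.
    padded = parts + [""] * (4 - len(parts))
    family, group, major, minor = (p or None for p in padded)
    return CaroName(family, group, major, minor, modifier or None)


# Hypothetical names, not taken from a real scanner:
full = parse_caro_name("Family.Group.1234.A:Engine")
short = parse_caro_name("Family")
```

Because the format runs from generic to specific, missing trailing parts simply come back as `None`.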
CARO's naming conventions are well-conceived. The format pattern starts with the most generic information and becomes increasingly specific. The dos and don'ts listed for the Family_Name portion of the format make sure that the names are easy to remember, easy to read, don't interfere with each other, don't offend any third parties (companies, people), and don't use mutable characteristics to describe a family (e.g. activation dates). They were created with forethought and are still applicable today.
The malware landscape has changed since 1991, and so have the means of detecting malware, which might be one reason that the CARO conventions didn't work out in the long run.
Every AV vendor has their own detection name conventions, but most of them use similar components. They usually include: platform, malware type, malware family, a variant component and additional information that is prepended or appended to the name.
The platform, if it is part of the detection name, specifies the execution environment for the malware. This may be the operating system and architecture (e.g., Win32), a framework (e.g., MSIL for malware using the .NET Framework), or a programming language (e.g., PowerShell). Unspecific terms for unknown execution environments exist too, e.g., Script may denote any text-like file.
The malware type tells something about the behavior of the malware.
The malware family component in a detection name is either the actual family name of the malware or an umbrella name for a broad range of families if the family is not known.
The variant in a detection name is mostly just a counter to distinguish different signatures from each other. In the CARO conventions, by contrast, it was meant to denote an actual variant of the malware family. The variant may consist of numbers or letters or both. In automatically created detections the variant portion may also be generated from the file itself, e.g., via hashing.
The modifier component is optional. It adds information about the detection. It may further specify the malware's behavior or type or add a characteristic about the signature, e.g., it may state that the signature is heuristic or generic.
The table below lists some Antivirus vendors with links to their official detection naming conventions (if available) and their format using the components above.
|Avast||Platform:Type1-Modifier \[Type2\]||VBS:Downloader-ARK [Trj]|
Trojan horse Crypt8.BHVG
|ESET||[Modifier] Platform/[Type.]Family.Variant Type||a variant of MSIL/TrojanDropper.Agent.BPM trojan|
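As an illustration of how such a format string maps onto a concrete name, the following Python sketch parses names that follow the ESET-style pattern `[Modifier] Platform/[Type.]Family.Variant Type` from the table. The regular expression is derived only from that one pattern and its example, so it is an assumption rather than an official grammar. Real detection names may deviate from it, and the function and group names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed pattern: [Modifier] Platform/[Type.]Family.Variant [Type]
ESET_STYLE = re.compile(
    r"^(?:(?P<modifier>.+?)\s+)?"      # optional free-text modifier
    r"(?P<platform>[^\s/]+)/"          # platform, terminated by a slash
    r"(?:(?P<type>[^.\s]+)\.)?"        # optional type before the family
    r"(?P<family>[^.\s]+)\."           # family (or umbrella name)
    r"(?P<variant>[^.\s]+)"            # variant counter
    r"(?:\s+(?P<trailing_type>.+))?$"  # optional trailing type word
)


def parse_eset_style(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named components, or None if the name does not fit."""
    match = ESET_STYLE.match(name.strip())
    return match.groupdict() if match else None


parsed = parse_eset_style("a variant of MSIL/TrojanDropper.Agent.BPM trojan")
# parsed["platform"] is "MSIL" and parsed["family"] is "Agent"
```

A name that lacks the slash-separated platform, such as the Avast example in the table, does not match and yields `None`. Each vendor format would need its own pattern.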
AV detection names can contain useful information. However, everyone working with them must be aware that often they don't, and sometimes the information in them is wrong. So if you have a list of scan results from a service like VirusTotal, how do you know which detection names can be trusted? A rule of thumb:
The more specific detections there are that come from different scanning engines and are consistent with each other, the more likely it is that the information is correct.
The sections below examine how to differentiate specific from non-specific detections and explain how to interpret certain key words used in detection names.
The inconsistency and the shift in meaning of terms used by the antivirus industry make communication difficult. A well-known example is the use of the term "virus". Initially this was the most common or only malware type that existed. Peter Szor defines a virus as follows:
A virus recursively replicates itself by infecting or replacing other programs or modifying references to these programs to point to the virus code instead. A virus possibly mutates itself with new generations.
(cf. [Szor05, p.27, 36])
Antivirus products were true to their name because they protected systems from viruses. Nowadays, antivirus products mostly protect systems from malware regardless of whether it is a virus or not. Yet, the term "virus" has become synonymous with "malware" in mainstream media.
The word "trojan" is another example. The "trojan horse" describes a way malware can be delivered to the target system: it pretends to be a useful and benign program, or even provides useful functionality, and thereby tricks the user into executing it (cf. [Szor05, p.37]). Thus, it describes a specific infection vector. In most cases it is not an inherent characteristic of a malware family but a way third parties make use of the malware. However, in detection names and in mainstream media the term "trojan" is used synonymously with "malware". We can often see that "trojan" is used as the default value for the type component in unspecific detection names. Maybe that is why mainstream media picked up the term to describe any malware.
Apart from these quirks there are also certain keywords in the malware family component of detection names which have specific meanings. Umbrella names that include several malware families based on common characteristics will be explained in the second part of this series. However, some keywords in the malware family component don't have anything to do with the characteristics of the malware family itself. They are default values, name detection technologies, or describe a protection mechanism that was added to the malware by a third party. These are listed in the table below.
|Kryptik, Krypt, Cryptik, Crypt, Packed||Packed file|
|Obfus||Obfuscated file, mostly used for malicious script files|
|Injector, Inject||Packed file that injects into a process, usually via RunPE|
|AntiXY||Protection mechanism against XY, e.g. AntiAV means the file might incapacitate AV programmes, AntiVM means it might refuse to run in a Virtual Machine|
|FakeXY, XYFake||The file imitates XY, e.g. FakeAdobe imitates an Adobe product. This is often done via third party tools that change the icon and version information of the file|
|Corrupt, Corrupted, Malformed||The file is corrupt.|
|Patched||The file was modified which makes it suspicious.|
|Agent||Default name for unknown or insignificant malware family|
|Razy, Kazy, Zusy, Graftor||Unknown malware family, detected by certain Bitdefender technology|
|WisdomEyes||Unknown malware family, detected by certain Baidu technology|
|Artemis||Unknown malware family, detected by certain McAfee technology|
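The table can be turned into a small lookup, sketched below in Python. The keywords and their meanings are taken from the table; the function name, the exact-match strategy and the prefix/suffix heuristics for `AntiXY`, `FakeXY` and `XYFake` are assumptions made for illustration.

```python
from typing import Optional

# Keywords from the table, lower-cased, mapped to their meaning.
NON_FAMILY_KEYWORDS = {
    **dict.fromkeys(["kryptik", "krypt", "cryptik", "crypt", "packed"],
                    "packed file"),
    "obfus": "obfuscated file",
    **dict.fromkeys(["injector", "inject"],
                    "packed file that injects into a process"),
    **dict.fromkeys(["corrupt", "corrupted", "malformed"], "corrupt file"),
    "patched": "modified file",
    "agent": "default name for unknown or insignificant family",
    **dict.fromkeys(["razy", "kazy", "zusy", "graftor"],
                    "unknown family, Bitdefender technology"),
    "wisdomeyes": "unknown family, Baidu technology",
    "artemis": "unknown family, McAfee technology",
}


def explain_family_component(component: str) -> Optional[str]:
    """Return the keyword's meaning, or None if it may be a real family."""
    key = component.lower()
    if key in NON_FAMILY_KEYWORDS:
        return NON_FAMILY_KEYWORDS[key]
    # Heuristics for the AntiXY / FakeXY / XYFake patterns.
    if key.startswith("anti") and len(key) > 4:
        return "protection mechanism against " + component[4:]
    if key.startswith("fake") and len(key) > 4:
        return "imitates " + component[4:]
    if key.endswith("fake") and len(key) > 4:
        return "imitates " + component[:-4]
    return None
```

The lookup deliberately matches whole components only. Prefix matching would also catch names such as Crypt8, but it would wrongly flag real family names that merely start with one of the keywords.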
If you evaluate a sample on VirusTotal, you might see an image similar to the one below.
There are at least six scanners that have the exact same detection for this file, including the same variant portion which is usually a vendor specific value. The reason for this is simple: The results stem from one and the same engine -- in this case Bitdefender's engine.
AV products often incorporate engines of other AV vendors in addition to their own. AV-Comparatives has a list of AV vendors and the third-party engines that they use. Bitdefender, Kaspersky and Avira appear as third-party engines in this list.
That means if you evaluate multiscanner results and you see that several vendors have the exact same detection name, including the variant portion of the name, it is probably just one and the same scanner. When you consider how valuable and informative the results are, the detection in question should count as one detection instead of six (as in the sample above). E.g., a sample that has six detections from one and the same engine is more likely a false positive than a sample with six detections from six different engines.
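The counting rule above can be sketched in a few lines of Python. The input is assumed to be a simple mapping from engine name to detection name (or `None` for no detection); this is not the format of any particular multiscanner API, and the engine names in the example are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_identical_detections(
    results: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Group engines that report exactly the same detection name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for engine, name in results.items():
        if name:  # skip engines without a detection
            groups[name].append(engine)
    return dict(groups)


def effective_detection_count(results: Dict[str, Optional[str]]) -> int:
    """Count identical names once, as they likely share one engine."""
    return len(group_identical_detections(results))


# Placeholder engines; the second detection name is invented.
scan = {
    "EngineA": "Trojan.Agent.DGAT",
    "EngineB": "Trojan.Agent.DGAT",
    "EngineC": "Trojan.Agent.DGAT",
    "EngineD": "Win32/Example.A",
    "EngineE": None,
}
count = effective_detection_count(scan)
```

Exact string comparison is the simplest choice here. It may undercount shared engines if a vendor adds its own prefix or suffix to a licensed engine's detection name.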
The more specific a detection name is, the more information you can draw from it. Unspecific names are those with default components. Descriptive components and umbrella names are a bit more specific. Actual identification of a malware family is the most specific.
Let's take a look at this packed Ursnif sample.
The detection names that are marked green identify the malware family Ursnif aka Gozi. "Wastenif" used by Microsoft seems to be an alias for Ursnif as well. The blue detection names don't provide the malware family but other information.
All of the other detection names don't provide any information about the malware except the platform (Win32) in some cases.
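The three levels of specificity can be expressed as a simple ranking, sketched below in Python. The keyword sets are abbreviated from the keywords discussed in this article, the set of known family names has to be supplied by the analyst, and all names in the code are hypothetical. Treating every unrecognised name as unspecific is a simplifying assumption.

```python
from enum import IntEnum
from typing import Set


class Specificity(IntEnum):
    UNSPECIFIC = 0   # default components, nothing to learn
    DESCRIPTIVE = 1  # describes packing, obfuscation, etc.
    FAMILY = 2       # identifies an actual malware family


# Abbreviated keyword set (lower-case) for descriptive components.
DESCRIPTIVE_COMPONENTS = {"kryptik", "crypt", "packed", "obfus",
                          "injector", "inject", "patched"}


def classify_family_component(component: str,
                              known_families: Set[str]) -> Specificity:
    """Rank how much a family component reveals about the sample."""
    key = component.lower()
    if key in known_families:
        return Specificity.FAMILY
    if key in DESCRIPTIVE_COMPONENTS:
        return Specificity.DESCRIPTIVE
    # Defaults such as "Agent" and anything unrecognised end up here.
    return Specificity.UNSPECIFIC


families = {"ursnif", "gozi", "wastenif"}
best = max(classify_family_component(c, families)
           for c in ["Agent", "Kryptik", "Ursnif"])
```

Since the levels are ordered, `max` picks the most informative detection out of a scan result.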
Packed malware generally has fewer detections and less specific detection names than its non-packed counterparts. This makes sense because packing is used by threat actors to evade AV detection.
But even if the packed file is fully detected, results for the unpacked file usually provide more accurate information about the malware's identity. The scanners on VirusTotal don't use the full technology set that is available in the actual antivirus product. They mostly rely on detection signatures that hit on the file without executing it, whereas in-memory scanning technologies like DeepRay aren't involved. Identifying packed files is often difficult or impossible if they aren't executed.
Therefore it is recommended to unpack the malware before scanning to get better results. An example is in the pictures below.
The packed file is detected by only two engines. Both detection names indicate that it is a script file which is packed. The unpacked file in the second image is detected by ten engines and seven of them state it is a downloader written in VBScript. F-Secure claims it is a JScript which will download Teslacrypt ransomware. That means the unpacked file's detection names reveal its actual behavior -- downloading other malware. Small downloaders like this one often don't get their own malware family name.
| Packed Ursnif||Trojan.Agent.DGAT||7a8e75dd59b0c9633c9976bb3f07a53fe5fdbd3330ce8ba90153697edff4fff1|
| Packed VBScript malware||Script.Malware.KryptikPuf.A||c6e7f80c08b456f2786454c59dc0f3138c1c17b070d5f02b6acc814ac740d128|
| Unpacked VBScript malware||Script.Trojan-Downloader.VBS.TIOAOR||f43748608c9e904dd67ea5eb5650c15d0b62a1d3b1f917defef09ef946a0f553|
[Szor05] Peter Szor. The Art of Computer Virus Research and Defense. Addison-Wesley Professional, February 2005.