Show me who you call and I'll tell you what you are - how DGA extraction improves security

09/15/2017
G DATA Blog

Domain Generating Algorithms are not a new thing by any stretch. Malware authors use them to make their command-and-control infrastructure more resilient. Security researchers can also use features of those methods to get an edge over the bad guys. Come on in and catch a glimpse of some cutting-edge research performed at G DATA.

Many modern pieces of malware need to contact a command-and-control (C2) server on the web to receive instructions or to offload stolen data. This is true for botnets as well as banking malware like ZeuS or ZeuS Panda. The addresses of those control servers were often hard-coded into the malware, which meant that every installation of the malware contacted the same address. There is an obvious weak point in this strategy, at least from an attacker’s perspective: as soon as a control server or its domain is taken down or seized by authorities, the attacker has lost control over the infected machines.

What criminals do

Wouldn’t it be nice (for criminals) if there were a way to stop worrying about their control servers being seized? It turns out there is. If nobody knows which server a piece of malware is going to contact, that server is very difficult to shut down. This is where Domain Generating Algorithms (DGA) come in. In simplified terms, a DGA computes a more or less random-looking internet address. This can happen once a day or even several times an hour. The malware then contacts the generated domains at predetermined times.

All that a criminal needs to do is register those generated domains and link them to his control infrastructure, and he is golden. The domains only look random: the algorithm is deterministic, so the attacker knows which domains his creation is going to contact at any given time and can register them beforehand. He can pre-compute and register hundreds of domains and never worry about the impact of one of them being taken out. Researchers and authorities are essentially chasing a hare: they might take down one server, only to find that their target has already moved to the next one. The criminal can rest assured that the infected machines will shortly contact a new address that is still under his control.
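
To make the idea concrete, here is a minimal toy sketch of a date-seeded DGA in Python. It does not reproduce the algorithm of any real malware family: the seed string, the use of SHA-256, the label length and the reserved ".example" ending are all assumptions chosen purely for illustration.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5,
                     tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a seed and a date.

    Hypothetical scheme for illustration only: the seed, hash function,
    label length and TLD are assumptions, not taken from real malware.
    """
    domains = []
    for index in range(count):
        # The same seed, date and index always yield the same digest.
        material = f"{seed}|{day.isoformat()}|{index}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits (0-f) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Anyone who knows the seed can compute the list for any date.
    for name in generate_domains("demo-seed", date(2017, 9, 15)):
        print(name)
```

Because the output depends only on the seed and the date, the malware and its operator arrive at the same list independently, which is what allows the operator to register the domains ahead of time.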

Linguistics & domain URLs

Internet addresses (strictly speaking domain names, which form part of a URL, a Uniform Resource Locator), as most people use them every day, are usually easy to remember, because those who run a website have an interest in attracting visitors. For that reason, we have addresses like www.gdata.de instead of www.kfualh85gdgiwe.de. Since a C2 server is never knowingly going to be visited by regular internet users, a DGA has a lot more freedom in composing domain names.
Interestingly, linguistics plays a role in analyzing the generated domains: a meaningful word that may be part of an internet address is unlikely to contain certain combinations of letters. Every language in the world follows a certain set of rules when it comes to forming meaningful words. According to those rules, a sequence like "kfualh85gdgiwe" would never form a meaningful word in any language (except Klingon, perhaps). A domain like this is therefore highly unlikely to be legitimate, and the process spawning the connection warrants further inspection. This is just one criterion for determining whether shady activity is taking place; other methods include mechanisms to detect random word shuffling.
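
The letter-combination idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used at G DATA: the tiny reference vocabulary, the function names and the 0.5 threshold are assumptions, and a real system would learn its statistics from a large corpus of legitimate domains.

```python
# Tiny stand-in vocabulary (an assumption for this sketch); it only
# serves to collect letter pairs that occur in ordinary words.
REFERENCE_WORDS = [
    "security", "data", "online", "mail", "shop", "news", "cloud",
    "search", "weather", "travel", "google", "market", "update",
    "network", "banking", "research",
]


def bigrams(text: str) -> list[str]:
    """Return all pairs of adjacent characters in `text`."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Every letter pair that appears in at least one reference word.
KNOWN_BIGRAMS = {pair for word in REFERENCE_WORDS for pair in bigrams(word)}


def unseen_bigram_fraction(label: str) -> float:
    """Share of character pairs in `label` never seen in the vocabulary."""
    pairs = bigrams(label.lower())
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if pair not in KNOWN_BIGRAMS)
    return unseen / len(pairs)


def looks_generated(label: str, threshold: float = 0.5) -> bool:
    """Flag a label when most of its letter pairs are unfamiliar.

    The threshold is an arbitrary illustration value.
    """
    return unseen_bigram_fraction(label) > threshold


if __name__ == "__main__":
    for label in ("gdata", "kfualh85gdgiwe"):
        print(label, looks_generated(label))
```

The functions expect only the label, that is, the part of the domain in front of the ending such as ".de". With this toy vocabulary, most letter pairs in "gdata" are familiar while almost none in "kfualh85gdgiwe" are, so only the latter is flagged.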

What to do with this data

As soon as the exact way in which the domains are calculated has been figured out, one possible (and also controversial) course of action is for law enforcement to register the domains so that they are unavailable to the attacker. Experts also refer to this process as “sinkholing”. However, this tips the criminals off that someone is on their trail. The usual result is that the DGA gets modified and the race starts all over again.
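
A reimplemented DGA can also be put to use without registering anything, for example by pre-computing the upcoming domains and comparing them with observed DNS queries. The following Python sketch assumes that such a reimplementation is available as a function taking a date; `toy_dga`, the helper names and the sample queries are hypothetical stand-ins.

```python
from datetime import date, timedelta
from typing import Callable, Iterable


def upcoming_domains(dga: Callable[[date], Iterable[str]],
                     start: date, days: int) -> dict[str, date]:
    """Map each domain due within the window to the first day it is due."""
    schedule: dict[str, date] = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        for domain in dga(day):
            schedule.setdefault(domain.lower(), day)
    return schedule


def flag_queries(queries: Iterable[str],
                 schedule: dict[str, date]) -> list[str]:
    """Return the observed queries that match a pre-computed domain."""
    # Lower-case and drop a trailing dot so "Name.example." also matches.
    return [q for q in queries if q.lower().rstrip(".") in schedule]


def toy_dga(day: date) -> list[str]:
    """Hypothetical stand-in for an extracted, reimplemented DGA."""
    return [f"c2-{day.strftime('%Y%m%d')}-{i}.example" for i in range(3)]


if __name__ == "__main__":
    schedule = upcoming_domains(toy_dga, date(2017, 9, 15), days=7)
    observed = ["www.gdata.de", "C2-20170921-0.example."]
    print(flag_queries(observed, schedule))
```

A machine that queries one of the pre-computed names is a strong candidate for closer inspection, and the same list tells anyone planning a sinkhole which domains are due on which day.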

By the way: an attacker’s failure to register a domain also helped to dampen the impact of the WannaCry ransomware/havocware. Malware researcher Marcus Hutchins discovered that a domain contacted by the malware was not registered and took it upon himself to register it. In the process, he inadvertently tripped WannaCry’s infamous kill switch and became the hero of the day for thousands of businesses all over the world.

Further information

If you are interested in details on how DGAs work and how DGA extraction can be used to improve detections, you can read Emanuel Durmaz’s article on the G DATA Advanced Analytics blog.