Hunting with Artificial Intelligence: Detection of malicious domains (II)

This post and the full series have been written jointly with Ana Isabel Prieto, Sergio Villanueva and Luis Búrdalo.


In the previous article we discussed the difficulty Threat Hunting analysts face because of the large number of domains logged every day in an organization's traffic. This volume makes it hard to analyze and locate potentially malicious domains, which can go unnoticed among so much traffic. To ease the analyst's task, we therefore propose using alternative techniques based on Machine Learning. Before presenting the tests performed, this article introduces the algorithms used to detect anomalous domains.

First, a large and varied dataset is essential for a model to detect potentially malicious domains reliably, since the model's parameters are tuned on data that must resemble the real environment in which it will run.

However, identifying patterns in high-dimensional data is difficult, and representing such data graphically, in a way that highlights their similarities and differences, is harder still. This is where a powerful data analysis tool such as PCA (Principal Component Analysis) becomes necessary.
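To make the idea concrete, here is a minimal, dependency-free sketch of PCA in Python; it is not the authors' implementation. It mean-centers a matrix of per-domain feature vectors, builds the sample covariance matrix, extracts the top `k` eigenvectors (the principal components) by power iteration with deflation, and projects every domain onto them. The function names and the idea of feeding it per-domain lexical features are assumptions for illustration; in practice a library routine such as scikit-learn's `sklearn.decomposition.PCA` would normally be used instead.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    cov = [[0.0] * d for _ in range(d)]
    for r in centered:
        for i in range(d):
            for j in range(d):
                cov[i][j] += r[i] * r[j]
    return [[v / (n - 1) for v in row] for row in cov]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def power_iteration(m, iters=500, seed=0):
    """Approximate the dominant eigenvector/eigenvalue of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix is (numerically) zero along v
            return v, 0.0
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return v, eigval


def pca(rows, k=2):
    """Return (projected rows, top-k components, variance along each component)."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, lam = power_iteration(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found component so the next dominant one emerges.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centered]
    return projected, components, variances
```

Calling `pca(feature_rows, k=2)` on hypothetical rows such as `[length, digit_count, hyphen_count, entropy]` for each domain yields 2D coordinates that can be drawn as a scatter plot, where domains far from the main cluster stand out. Because PCA is sensitive to feature scale, features measured in different units would normally be standardized first. Power iteration is used only to keep the sketch self-contained; it converges slowly when two eigenvalues are close.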


Hunting with Artificial Intelligence: Detection of malicious domains (I)

This post and the full series have been written jointly with Ana Isabel Prieto, Sergio Villanueva and Luis Búrdalo.


The Internet opens up a world of possibilities for personal development and for carrying out many everyday activities, and it has become indispensable in today's society. There are hundreds of millions of domains on this network, but unfortunately not all of them are safe. Malicious domains are those used by cybercriminals to connect to command and control (C2) servers, steal credentials through phishing campaigns or distribute malware.

In many cases, these domains share certain lexical characteristics that can stand out at first glance. For example, in phishing campaigns, domains under TLDs such as .xyz, .top, .space, .info and .email are relatively common. Similarly, attackers use DGA (Domain Generation Algorithm) techniques to create random-looking domains used to exfiltrate information, such as istgmxdejdnxuyla[.]ru. Other telltale properties include an excessive number of hyphens, many subdomain levels, or names that attempt to impersonate legitimate organizations, such as amazon.ytjksb[.]com and amazon.getfreegiveaway[.]xyz.
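The lexical characteristics listed above can be turned into numeric features. The following is a minimal sketch, not the series' own feature set: the TLD list comes from the examples in the text, while the `BRANDS` watch list, the feature names and the use of Shannon entropy as a randomness measure for DGA-like names are assumptions made for illustration. Defanged input such as `istgmxdejdnxuyla[.]ru` is accepted.

```python
import math
from collections import Counter

# Illustrative TLDs taken from the examples in the text; not an exhaustive list.
SUSPICIOUS_TLDS = {"xyz", "top", "space", "info", "email"}
# Hypothetical brand watch list, for demonstration only.
BRANDS = {"amazon", "paypal", "microsoft"}


def shannon_entropy(s):
    """Shannon entropy in bits per character; random-looking strings score high."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def lexical_features(domain):
    """Compute a few simple lexical features of a (possibly defanged) domain."""
    domain = domain.lower().replace("[.]", ".").rstrip(".")
    labels = domain.split(".")
    tld = labels[-1]
    # Simplification: the label before the TLD is taken as the registered name and
    # everything to its left as subdomains (no Public Suffix List handling, e.g. co.uk).
    name = labels[-2] if len(labels) >= 2 else labels[0]
    subdomains = labels[:-2]
    return {
        "length": len(domain),
        "tld": tld,
        "suspicious_tld": tld in SUSPICIOUS_TLDS,
        "hyphens": domain.count("-"),
        "levels": len(labels),
        "name_entropy": shannon_entropy(name),
        "brand_in_subdomain": any(b in s for s in subdomains for b in BRANDS),
    }
```

For instance, the name `istgmxdejdnxuyla` has an entropy of 3.75 bits per character, whereas `google` scores about 1.92, and `amazon.getfreegiveaway[.]xyz` triggers both `suspicious_tld` and `brand_in_subdomain`. Vectors like these are the kind of high-dimensional input that techniques such as PCA and anomaly detection operate on.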

With digitization on the rise, users in an organization browse thousands of different domains, which makes it difficult to spot malicious ones among so much legitimate traffic. In a medium-sized organization, between 3,000 and 5,000 domains appear in the traffic logs every day, a volume that is unfeasible to analyze manually. Traditionally, part of this detection process is automated with pattern-matching rules, for example rules that flag domains whose TLD (Top Level Domain) is common in phishing campaigns, domains that contain the name of a large company but do not belong to it, or domains longer than X characters.
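The three example rules above can be sketched as a small filter over the daily domain list. This is an illustration under stated assumptions, not the rule set used by the authors: `PHISHING_TLDS`, the hypothetical `BRAND_DOMAINS` allowlist and the `MAX_LENGTH` value standing in for "X characters" are all made up for this example.

```python
# Illustrative values only; a real deployment would maintain these lists itself.
PHISHING_TLDS = {"xyz", "top", "space", "info", "email"}
# Hypothetical allowlist mapping each watched brand to the domains it legitimately owns.
BRAND_DOMAINS = {"amazon": {"amazon.com", "amazon.es"}}
MAX_LENGTH = 40  # the "X characters" threshold; arbitrary value for this sketch


def registered_domain(domain):
    # Simplification: last two labels (ignores multi-part suffixes such as co.uk).
    return ".".join(domain.split(".")[-2:])


def match_rules(domain):
    """Return the names of the rules a domain triggers (empty list if none)."""
    domain = domain.lower().replace("[.]", ".").rstrip(".")
    hits = []
    if domain.rsplit(".", 1)[-1] in PHISHING_TLDS:
        hits.append("phishing_tld")
    reg = registered_domain(domain)
    for brand, owned in BRAND_DOMAINS.items():
        # Brand name present, but the domain is not one the brand actually owns.
        if brand in domain and reg not in owned:
            hits.append("brand_impersonation:" + brand)
    if len(domain) > MAX_LENGTH:
        hits.append("too_long")
    return hits


def triage(domains):
    """Keep only the domains that trigger at least one rule."""
    alerts = {}
    for d in domains:
        hits = match_rules(d)
        if hits:
            alerts[d] = hits
    return alerts
```

Here `triage(["www.amazon.com", "amazon.ytjksb[.]com"])` keeps only the second domain, tagged `brand_impersonation:amazon`. Rules like these are cheap and easy to audit, but every list and threshold has to be maintained by hand, and anything that does not match an explicit pattern passes through unnoticed.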
