This post and the full series has been elaborated jointly with Ana Isabel Prieto, Sergio Villanueva and Luis Búrdalo.
In previous articles of this series (see part I and part II) we described the problem of detecting malicious domains and proposed a way to address this problem by combining various statistical and Machine Learning techniques and algorithms.
The set of variables from which these domains will be characterized for their subsequent analysis by the aforementioned Machine Learning algorithms was also described. In this last installment, the experiments carried out and the results obtained are described.
The tests have been carried out against a total of 78,661 domains extracted from the a priori legitimate traffic of an organization, from which 45 lexical features belonging to the categories described above have been calculated.
[Read more…]