Analysis of Linux.Haikai: inside the source code

A few days ago we obtained the source code of the Haikai malware, one of the many variants produced by the continuous recycling of source code from different IoT botnets. Although we have not identified anything new compared to previous IoT malware, the code has given us a lot of information on techniques, improvements and authors.

It should also be noted that, according to the records we have obtained, this botnet was in operation for most of this past June.

In the following lines we analyze the code, the possible attributions, and the implementations that are never referenced in the execution thread, which suggest that the code is evolving along several parallel branches that implement the same function.

So let’s start by analyzing the structure of the files.

Files in the Haikai source code

Files in Mirai source code

As we can see, all the files (with the exception of g.c) have the same names as in the Mirai malware. In spite of this, most of the code called in the execution thread is in g.c, and other very interesting features incorporated by Mirai in 2016 (like those included in the scanner.c file) are ignored by Haikai.

The main function is in the g.c file. Its first action is to check for the Python binary and, depending on the result, to use one length or another when assigning the name of the process, a feature not present in the Mirai source code.

Once this check is done, it verifies whether the size of the server list is less than or equal to 0 by means of SERVER_LIST_SIZE, defined earlier in the code as follows:

#define SERVER_LIST_SIZE (sizeof(commServer) / sizeof(unsigned char *))
unsigned char *commServer[] = {"185.47.62[.]197"};
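
As a minimal sketch of how this macro behaves, the element count is derived at compile time from the array itself. The addresses below are placeholders from the documentation range and the helper names are ours, not Haikai's. Because sizeof yields an unsigned value, a "less than or equal to 0" test can only be true when the count is exactly 0.

```c
#include <stddef.h>

/* Placeholder addresses (RFC 5737 documentation range), not the real C2. */
static const char *comm_server[] = { "192.0.2.10", "192.0.2.11" };

/* Same idiom as in the sample: total array size divided by element size. */
#define SERVER_LIST_SIZE (sizeof(comm_server) / sizeof(comm_server[0]))

/* Hypothetical helper: number of configured servers. */
size_t server_count(void)
{
    return SERVER_LIST_SIZE;
}

/* Hypothetical helper: the count is unsigned, so "<= 0" reduces to "== 0". */
int server_list_is_empty(void)
{
    return SERVER_LIST_SIZE == 0;
}
```

With a statically initialized list such as the one above, the emptiness check never fires, so in practice it acts as a guard against a build with no servers configured.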

Next, the following debug string is printed, and then the functions table_init() and killer_init() are called:

\x1b[31m[Hakai] \x1b[32mConnected\x1b[0m\r\n

Both functions are present in the Mirai source code and incorporate new features that we will analyze later. First, however, we will stop to analyze the StartTheLelz() function.

As we can see in the image below, this function is imported from the BashLite malware, one of the first IoT botnets known to make use of shellshock-type vulnerabilities. However, this new implementation incorporates a number of new features.

BashLite source code

While the BashLite function has fewer than 300 lines of code, the one implemented in Haikai has around 600.

In both cases, the telnet response is handled through a switch-case structure. In Haikai, the commented-out code and the disorder of the new cases point to a different author.

Of note is the implementation of case 70, which performs an anti-honeypot check by verifying the architecture strings against the detectbox string list.

char *detectbox[] = {"powerpc","mips", "arc", "x86_64", "armv7l", "armv6l", "armv5l", "armv4l", "sh4", "mipsel", "arm4", "arm5", "arm6", "ARMv6", "ARMv7", "amd64", 0};
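
The source does not show how case 70 compares the device response, so the following is only a sketch that assumes a plain substring search over the NULL-terminated list. The function name is hypothetical. It may help when checking how a honeypot banner would be classified.

```c
#include <stddef.h>
#include <string.h>

/* Architecture strings as listed in the sample. */
static const char *detectbox[] = {
    "powerpc", "mips", "arc", "x86_64", "armv7l", "armv6l", "armv5l",
    "armv4l", "sh4", "mipsel", "arm4", "arm5", "arm6", "ARMv6", "ARMv7",
    "amd64", NULL
};

/* Hypothetical helper: returns 1 if the response contains any listed
 * architecture string, 0 otherwise. Assumes a case-sensitive substring
 * match, which the original code may or may not use. */
int response_mentions_known_arch(const char *response)
{
    for (size_t i = 0; detectbox[i] != NULL; i++) {
        if (strstr(response, detectbox[i]) != NULL)
            return 1;
    }
    return 0;
}
```

Under this assumption, a honeypot whose reply contains none of these strings would be discarded by the scanner.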

The following cases determine the most suitable method for deploying the sample.

Finally, it handles the ARMv7, ARMv4 and ARC architectures in a special way, although the code also contemplates this kind of handling for the MIPS and MPSL architectures.

This handling consists of writing out, through the echo command, the hexadecimal encoding of a binary built for the architecture, which is selected according to lists defined at the beginning of the code. These lists are the following:

char *a4[] = {"armv4l", "arm4", "ARMv5", "armv5l", 0};
char *a7[] = {"armv7l", "arm7", "ARMv7", "ARMv6", "armv6l",  0};
char *mipsel[] = {"mipsel", 0};
char *ArCc[] = {"arc", 0};

If we reconstruct the binary, we obtain a sample whose only function is to download, through an HTTP request, the malware that we are analyzing, compiled for the corresponding architecture.

This technique was first implemented by the Hajime malware.
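
For analysts who want to repeat the reconstruction mentioned above, the following sketch turns an echo payload back into raw bytes. It assumes the payload uses \xNN escapes, which is the usual form for this technique but is not confirmed by the fragments shown here. The function names are ours.

```c
#include <stddef.h>

/* Returns the value of one hexadecimal digit, or -1 if c is not one. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes every complete \xNN escape found in `in`
 * into `out` (at most `cap` bytes) and returns the number of bytes
 * written. Any other character is skipped. */
size_t decode_hex_escapes(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*in != '\0' && n < cap) {
        if (in[0] == '\\' && in[1] == 'x') {
            int hi = hex_val(in[2]);
            int lo = (hi >= 0) ? hex_val(in[3]) : -1;
            if (hi >= 0 && lo >= 0) {
                out[n++] = (unsigned char)(hi * 16 + lo);
                in += 4;
                continue;
            }
        }
        in++; /* not a complete escape: move on one character */
    }
    return n;
}
```

If the first four decoded bytes are 0x7f, 'E', 'L', 'F', the payload is an ELF binary and can be written to disk for static analysis.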

As for the other architectures, a bash script is downloaded which will presumably download the binaries for the different architectures, a technique inherited from the Gafgyt malware.

On the other hand, we have found that the number of credential combinations used in the search for potential victims is lower than in Mirai (only nine combinations). In addition, these credentials are not loaded through the scanner_init() function but are defined as global variables at the beginning of the code, although they also appear in that function.

char *usernames[] = {"root\0", "root\0", "admin\0", "root\0", "adm\0", "default\0", "default\0", "root\0", "root\0"};
char *passwords[] = {"root\0", "admin\0", "admin\0", "vizxv\0", "\0", " OxhlwSG8\0", "S2fGqNFs\0", "zlxx.\0", "LsiuY7pOmZG2s\0"};

As for communication with the Command & Control server, it was found to be governed by the following commands:

  • QTELNET → Activates the botnet
  • OFF → Stops network scanning for vulnerable devices
  • ON → Enables network scanning for vulnerable devices

Interestingly enough, network scanning is carried out with the StartTheLelz() function, already discussed above, even though the scanner.c file, and therefore the scanner_init() function, is present in the code but never referenced.

As for the instructions for the execution of denial of service attacks, they are as follows:

  • H → Executes attacks via HTTP
  • U → Executes attacks via UDP
  • S → Executes attacks via STD
  • T → Executes attacks via TCP
  • KT → Kills malware-related processes

None of these attacks is new, as we have seen and analyzed them in previous articles, so we will only focus on the implementation of User-Agents in the HTTP attack, which allows the addition of a large number of them with great ease.

As for the table_init() function, implemented in the table.c file, it handles the encoding of strings in the malware and is imported directly from Mirai.
The comments in the code give us the key information in plaintext and, in this way, help us go deeper into the attribution of the sample.
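
For reference, Mirai's public table.c obfuscates each string by XORing every byte with the four bytes of a 32-bit key. Assuming Haikai keeps that scheme unchanged (its table.c is not reproduced here, and the key below is Mirai's public default, which may differ in this sample), an analyst-side decoder can be sketched as follows. The function and macro names are ours.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: Mirai's public default key; the sample may use another. */
#define ASSUMED_TABLE_KEY 0xdeadbeefu

/* XORing a byte with four key bytes in sequence is equivalent to
 * XORing it once with the XOR of those four bytes. */
uint8_t effective_key(uint32_t key)
{
    return (uint8_t)((key & 0xff) ^ ((key >> 8) & 0xff) ^
                     ((key >> 16) & 0xff) ^ ((key >> 24) & 0xff));
}

/* Hypothetical helper: XOR is its own inverse, so the same call both
 * obfuscates and deobfuscates the buffer in place. */
void toggle_obfuscation(uint8_t *buf, size_t len, uint32_t key)
{
    uint8_t k = effective_key(key);

    for (size_t i = 0; i < len; i++)
        buf[i] ^= k;
}
```

Because the scheme collapses to a single-byte XOR, encoded strings can also be recovered without knowing the key by trying all 256 values.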

First, we obtain the C2 domain, which is hakaiboatnet[.]pw and which, while it was in operation, displayed the following page:

And if you followed the steps indicated on the page, the result was the following:

To round off the trolling, once Ankit Anubhav (@ankit_anubhav), a well-known IoT malware researcher, reported this on his Twitter account, the web portal began displaying an image of him.

Another piece of hard-coded information is the safe string. Where Mirai had the link to the well-known Rick Astley video clip “Never Gonna Give You Up”, this sample has: “gosh that chinese family at the other table sure ate alot”.

We already reported the existence of this string in the report “Mirai Year One: Evolution and Adaptation of a botnet”, and attributed it to the Lizard Squad group, authors of the MASUTA, MEMES and FREEPEIN campaigns of Mirai, to whom we also attributed the incorporation of the hexadecimal code injection of the downloader and UPX packing.
We have seen all of these developments in this code.

Another point to note is the large amount of code not referenced in the main thread, which gives us information, perhaps not about this particular botnet, but about the evolution of the code shared by the community.

First of all, it is interesting to highlight the new addresses that the malware avoids through get_random_ip(). While Mirai avoided a small number of networks, Haikai excludes the networks belonging to the following owners and categories:

Amazon + Microsoft
Blazingfast & Nforce
Choopa & Vultr
Department of Defense
Department of the Navy, Space and Naval Warfare System Command, Washington DC - SPAWAR
Digital Ocean
FBI controlled Linux servers & IPs/IP-Ranges
General Electric Company
Hewlett-Packard Company
IANA NAT reserved
IANA Special use
Internal network
Invalid address space
Ministry of Education Computer Science
NASA Kennedy Space Center
Naval Air Systems Command, VA
Some more
Total Server Solutions
U.S. Department of State
US Postal Service

Capture of the get_random_ip () function from Mirai

Screenshots of the Haikai scanner.c file
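
The exclusion logic can be illustrated with a simple CIDR membership test. The sketch below is not Haikai's get_random_ip(): it only lists well-known reserved ranges (corresponding to categories such as "Internal network" and "Invalid address space"), and all names in it are hypothetical. It can be useful for checking whether a given sensor address would be skipped by a list of this kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

struct cidr {
    uint32_t network;     /* host byte order */
    unsigned prefix_len;  /* 0..32 */
};

/* Illustrative ranges only; the sample's real list is much longer. */
static const struct cidr excluded[] = {
    { IPV4(0, 0, 0, 0), 8 },       /* invalid address space */
    { IPV4(10, 0, 0, 0), 8 },      /* internal network */
    { IPV4(127, 0, 0, 0), 8 },     /* loopback */
    { IPV4(172, 16, 0, 0), 12 },   /* internal network */
    { IPV4(192, 168, 0, 0), 16 },  /* internal network */
    { IPV4(224, 0, 0, 0), 4 },     /* multicast */
};

/* Returns 1 if ip falls inside the given range, 0 otherwise. */
int in_cidr(uint32_t ip, struct cidr range)
{
    uint32_t mask = range.prefix_len == 0
        ? 0
        : ~(uint32_t)0 << (32 - range.prefix_len);

    return (ip & mask) == (range.network & mask);
}

/* Returns 1 if ip matches any excluded range, 0 otherwise. */
int is_excluded(uint32_t ip)
{
    for (size_t i = 0; i < sizeof(excluded) / sizeof(excluded[0]); i++) {
        if (in_cidr(ip, excluded[i]))
            return 1;
    }
    return 0;
}
```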

Finally, we found evidence that the most interesting part of the code had been deleted. That part was linked to the infection of devices through exploits, a feature of the newest versions of IoT malware, such as IoTReaper, Okiru or Omni.

Note that a domain used by Okiru is network[.]bigbotpein[.]com, also used by the Mirai FREEPEIN variant.

Haikai exploit.h file code captures

Code captures from Haikai feg.h file

None of the referenced functions are implemented in the malware execution thread.

After analyzing the source code, we can conclude that the code has a complex origin, with techniques from up to five different IoT malware families (BashLite, Gafgyt, Mirai, Hajime and IoTReaper). However, strings and domains allow us to attribute a large part of the code to the Lizard Squad group (or at least to some of its members).

On the other hand, it is also important to highlight the strong sense of community among those who operate IoT botnets. With the exception of exploits, which provide an edge in such a competitive sector, they have no qualms about sharing the improvements they implement in the code.
