Taking apart office automation documents with OfficeMalScanner

One of the main routes of malware infection is through office automation documents. They are a very potent infection vector, especially in targeted attacks and phishing campaigns.

These documents are crafted to carry hidden macros, OLE objects, executables, etc. Once the user opens the document, these components carry out a series of malicious actions, either to obtain information that can be turned into profit or simply to damage the system. Typically, this kind of malware downloads other malware from the Internet (droppers), exploits system vulnerabilities, replicates itself to ensure its persistence on the system, exfiltrates user information, etc.

A very useful tool for analyzing and detecting anomalous patterns in office automation documents is the “OfficeMalScanner” suite, which you can download from the author’s website, http://www.reconstructer.org/.

The suite is made up of the following tools:

  • OfficeMalScanner: Analyzes “Microsoft Office” documents (doc, xls, ppt) looking for embedded files, OLE objects, shellcodes, VBA macros. It also has a function capable of deciphering simple obfuscation methods like ROR and XOR.
  • RTFScan: Scans RTF files and extracts embedded objects that can then be analyzed by “OfficeMalScanner”.
  • MalHost-Setup: Extracts a document’s shellcode and packages it in a PE file so that it is easier to analyze with a debugger.
  • DisView: A disassembly viewer that shows the disassembled code at the indicated offset. Useful for locating the shellcode detected by the above-mentioned tools.

Let’s analyze several malicious documents with these tools to show how they work and the results we can obtain.

The first is a “Microsoft Office Word” document with malicious macros. Let’s extract the macros and the embedded OLE objects using “OfficeMalScanner” and the info parameter.

You can see how it launches an alert warning you that there are macros in the document. It stores the extracted macros in the specified directory. This allows us to see them with any text editor without having to open the malicious document and then the “Visual Basic” editor to analyze the macro. If we open the extracted macro we can see it contains the source code.
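
As a rough illustration of what a "this document contains macros" alert can be based on, here is a minimal Python sketch. It is an assumption-laden heuristic of our own, not the logic “OfficeMalScanner” actually uses: it only checks for the OLE compound file magic and for the UTF-16LE name `_VBA_PROJECT`, the stream that normally accompanies a VBA project. The function names are hypothetical.

```python
# Crude heuristic (our own assumption, not OfficeMalScanner's method):
# flag OLE files whose directory contains a "_VBA_PROJECT" entry name.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Directory entry names in OLE compound files are stored as UTF-16LE.
VBA_MARKER = "_VBA_PROJECT".encode("utf-16-le")


def looks_like_ole(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE compound file magic."""
    return data.startswith(OLE_MAGIC)


def has_vba_marker(data: bytes) -> bool:
    """Return True if the buffer is an OLE file that mentions a VBA project."""
    return looks_like_ole(data) and VBA_MARKER in data
```

A positive result only suggests that macros may be present. Extracting the macro source still requires a proper parser such as the one in “OfficeMalScanner”.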

The next file is a “Microsoft Office PowerPoint” document. This document has no macros but it does have embedded objects. We can see this using “OfficeMalScanner” again and the scan, debug and brute parameters. With these parameters, the tool will scan the document searching for embedded OLE and PE objects and, if they are encrypted with simple techniques like XOR, ROR, ROL, ADD or SUB, it will try to find the key to decipher them and extract them for analysis.

A summary of the analysis results is then displayed. “OfficeMalScanner” has found suspicious patterns based on its heuristic search rules.

FS:[30h] signature found at offset: 0x506e 
API-Hashing signature found at offset: 0x52fb 
PUSH DWORD[]/CALL[] signature found at offset: 0x50ab 
PUSH DWORD[]/CALL[] signature found at offset: 0x5137 
PUSH DWORD[]/CALL[] signature found at offset: 0x518a 
PUSH DWORD[]/CALL[] signature found at offset: 0x51c5 
PUSH DWORD[]/CALL[] signature found at offset: 0x51d6 
PUSH DWORD[]/CALL[] signature found at offset: 0x5250 
PUSH DWORD[]/CALL[] signature found at offset: 0x528b 
PUSH DWORD[]/CALL[] signature found at offset: 0x52bb 
PUSH DWORD[]/CALL[] signature found at offset: 0x52c1 
PUSH DWORD[]/CALL[] signature found at offset: 0x52cd 
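
To give an idea of how this kind of heuristic scan can work, here is a minimal Python sketch that searches a buffer for byte patterns typical of x86 shellcode. The patterns are our own interpretation of two of the signature names in the report: a direct read of `FS:[30h]` (the PEB pointer) and a `PUSH DWORD PTR [addr]` immediately followed by a `CALL DWORD PTR [addr]`. The real rules used by “OfficeMalScanner” are not shown in this post and are likely broader, so treat `SIGNATURES` and `scan_signatures` as hypothetical.

```python
import re

# Hypothetical byte patterns; the tool's real rules are an unknown here.
SIGNATURES = {
    # mov eax, fs:[30h]  or  mov ebx, fs:[30h]
    "FS:[30h]": re.compile(
        rb"\x64\xa1\x30\x00\x00\x00|\x64\x8b\x1d\x30\x00\x00\x00"
    ),
    # push dword ptr [addr] followed directly by call dword ptr [addr]
    "PUSH DWORD[]/CALL[]": re.compile(rb"\xff\x35.{4}\xff\x15.{4}", re.DOTALL),
}


def scan_signatures(data: bytes):
    """Return a sorted list of (offset, signature name) for every match."""
    hits = []
    for name, pattern in SIGNATURES.items():
        for match in pattern.finditer(data):
            hits.append((match.start(), name))
    return sorted(hits)
```

The offsets returned are file offsets, like the ones in the report above, so they can be used as a starting point when looking at the surrounding code in a disassembler.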

Then it tries to brute-force the encrypted blocks identified in the scan. In the next report we can see that it has found 1 OLE object and 3 embedded PEs. Once it has found the decryption key, it exports them. We can now carry out the corresponding analyses on the decrypted objects.

+++++ decryption loop detected at offset: 0x00005190 +++++
+++++ decryption loop detected at offset: 0x00005192 +++++
+++++ decryption loop detected at offset: 0x00005256 +++++
+++++ decryption loop detected at offset: 0x00005258 +++++ 
Brute-forcing for encrypted PE- and embedded OLE-files now... 

XOR encrypted embedded OLE signature found at offset: 0x10b00 - encryption KEY: 0x85
 Dumping Memory to disk as filename: embebidos__EMBEDDED_OLE__OFFSET=0x10b00__XOR-KEY=0x85.bin
 [ OLE File (after decryption) - 256 bytes ]
 d0 cf 11 e0 a1 b1 1a e1  00 00 00 00 00 00 00 00  | ................ 
XOR encrypted MZ/PE signature found at offset: 0x5b00 - encryption KEY: 0x85
 Dumping Memory to disk as filename: embebidos__PEFILE__OFFSET=0x5b00__XOR-KEY=0x85.bin
 [ PE-File (after decryption) - 256 bytes ]
 4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  | MZ.............. 
XOR encrypted MZ/PE signature found at offset: 0x26700 - encryption KEY: 0x85
 Dumping Memory to disk as filename: embebidos__PEFILE__OFFSET=0x26700__XOR-KEY=0x85.bin
 [ PE-File (after decryption) - 256 bytes ]
 4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  | MZ.............. 
XOR encrypted MZ/PE signature found at offset: 0x2e8fc - encryption KEY: 0x85
 Dumping Memory to disk as filename: embebidos__PEFILE__OFFSET=0x2e8fc__XOR-KEY=0x85.bin
 [ PE-File (after decryption) - 256 bytes ]
 4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  | MZ.............. 
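
The idea behind the brute-force step reported above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the tool's implementation: it only tries single-byte XOR keys (ROR, ROL, ADD and SUB are left out), it looks for the OLE magic `d0 cf 11 e0 a1 b1 1a e1`, and for PE files it adds a sanity check that the `e_lfanew` field of the decoded `MZ` header points at a `PE\0\0` signature. All function names are hypothetical.

```python
import struct

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of the buffer with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_ole(data: bytes):
    """Return sorted (offset, key) pairs where an XOR-encoded OLE magic is found."""
    hits = []
    for key in range(1, 256):  # key 0 would mean "not encoded at all"
        needle = xor_bytes(OLE_MAGIC, key)
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, key))
            pos = data.find(needle, pos + 1)
    return sorted(hits)


def find_xor_pe(data: bytes):
    """Return sorted (offset, key) pairs where an XOR-encoded PE file starts."""
    hits = []
    for key in range(1, 256):
        needle = xor_bytes(b"MZ", key)
        pos = data.find(needle)
        while pos != -1:
            header = xor_bytes(data[pos:pos + 0x40], key)
            if len(header) == 0x40:
                # e_lfanew (offset 0x3C) holds the offset of the PE signature.
                (e_lfanew,) = struct.unpack_from("<I", header, 0x3C)
                start = pos + e_lfanew
                if xor_bytes(data[start:start + 4], key) == b"PE\x00\x00":
                    hits.append((pos, key))
            pos = data.find(needle, pos + 1)
    return sorted(hits)
```

Once an offset and key are known, `xor_bytes(data[offset:], key)` gives the decoded object, which can be written to disk for further analysis, much like the `.bin` files dumped in the report.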

The process for RTF is similar, but we will have to use “RTFScan” instead of “OfficeMalScanner”.
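
The reason RTF needs its own tool is that embedded objects are stored as hexadecimal text inside the document, typically after an `\objdata` control word. The following Python sketch shows the basic idea of recovering those bytes. It is a simplification under our own assumptions: the regular expression and the function name are hypothetical, it does not handle the obfuscation tricks malicious RTF files often use (control words or other groups mixed into the hex data), and “RTFScan” does considerably more than this.

```python
import re

# Hex digits (possibly split by whitespace) following an \objdata control word.
OBJDATA_RE = re.compile(rb"\\objdata\s*([0-9A-Fa-f\s]+)")


def extract_rtf_objects(rtf: bytes):
    """Return the decoded bytes of every \\objdata hex blob found in the RTF."""
    objects = []
    for match in OBJDATA_RE.finditer(rtf):
        hex_text = re.sub(rb"\s+", b"", match.group(1))
        if len(hex_text) % 2:
            # Drop a trailing half byte so the hex string can be decoded.
            hex_text = hex_text[:-1]
        if hex_text:
            objects.append(bytes.fromhex(hex_text.decode("ascii")))
    return objects
```

Each recovered blob can then be saved to disk and searched for OLE or PE signatures in the same way as the objects found in binary documents.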

If any shellcode is detected in these objects we can extract it with “MalHost-Setup”. You only have to tell it the offset where the shellcode starts and it will wrap it in a new PE. We can then analyze it in a debugger more easily, since debugging the whole process of the office automation software together with the malicious document can be an arduous task. This approach has one limitation: it is not always possible to carry out the exploitation outside the office automation software environment. In those cases, we will need to use other techniques.

This suite is really useful and gives us a lot of information when we are facing suspicious documents. Recently, the forensic analyst Didier Stevens extracted the anomaly search strings this suite uses and generated a set of YARA rules for identifying malicious documents quickly.

In future blog posts we will dig further into document analysis.