Exploiting APT data for fun and (no) profit (I): acquisition and processing

Posts in the series “Exploiting APT data for fun and (no) profit”:
=> I: acquisition and processing
=> II: simple analysis
=> III: not so simple analysis
=> IV: conclusions

When attending talks about APT, or when giving them, you sometimes hear sentences like “most threat actors are focused on information theft” or “Russia is one of the most active actors in the APT landscape”. But where do all those sentences come from? We have spent a whole night exploiting APT data for fun and (no) profit, in order to provide you with some curiosities, facts and data you can use from now on in your APT talks!! :)

Since 2019 the folks at ThaiCERT have published the free PDF book “Threat Group Cards: A Threat Actor Encyclopedia”, and they maintain an online portal (https://apt.thaicert.or.th/cgi-bin/aptgroups.cgi) with all the information regarding APT groups acquired from public sources. In this portal, apart from browsing threat groups and their tools, they present some statistics about threat group activities (source countries, target countries and sectors, most used tools…). Most of these threat groups are considered APT (at the time of this writing, 250 out of 329, with the last database change made on 20 October 2020).

But what happens when you need specific statistics or correlations? You can download a JSON file and exploit it yourself:

$ curl -o out.json 'https://apt.thaicert.or.th/cgi-bin/getmisp.cgi?o=g'

But JSON is a modern thing and is hard to handle with awk, one of the Tools from the Gods; so we also download JSON.sh to convert it to a pipeable format:

$ curl -o JSON.sh https://raw.githubusercontent.com/dominictarr/JSON.sh/master/JSON.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4809  100  4809    0     0  15512      0 --:--:-- --:--:-- --:--:-- 15512
$ chmod +x JSON.sh
$

Now, we parse the JSON file with JSON.sh:

$ cat out.json |./JSON.sh -l > work.txt

Et voilà, we have a file to feel comfortable with. But to feel more comfortable, we split the file into many files, one for each threat actor identified by ThaiCERT (in our main file, by the “values” key):

$ n=`awk -F, 'index($1,"values")>0 {print $2}' work.txt |grep -v value| sort -n|uniq|tail -1`; export n
$ for i in $(seq 1 $n);do grep "values\",$i," work.txt >$i.txt;done
$

Please don’t blame us for the efficiency of this one-liner; it will be executed only once. By the time you read this line, we have one text file for each threat actor:

$ ls [0-9]*.txt |wc -l
327
$
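If the efficiency of the grep loop above ever matters, the same split can be sketched as a single awk pass over work.txt. This is only an alternative under stated assumptions, not the method used in this post: the function name split_actors is made up for illustration, and the sketch assumes that all entries of one actor are contiguous in the file (each output file is opened once and closed when the index changes). Note that it also writes 0.txt if an entry with index 0 exists, which the `seq 1 $n` loop does not.

```shell
# split_actors: hypothetical helper, not part of JSON.sh or of the original post.
# $1: JSON.sh leaf output (work.txt in this post).
# Writes <index>.txt in the current directory for every ["values",<index>,...] entry.
split_actors() {
    awk -F, '
        # Keep only entries whose path starts with "values" and a numeric index.
        $1 == "[\"values\"" && $2 ~ /^[0-9]+$/ {
            out = $2 ".txt"
            if (out != prev) {
                # Assumption: entries of one actor are contiguous, so the
                # previous file can be closed as soon as the index changes.
                if (prev != "") close(prev)
                prev = out
            }
            print > out
        }
    ' "$1"
}
```

It would be called as `split_actors work.txt` from an empty working directory.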

Each of the text files is composed of entries of the form “[key] value”; here is an example:

$ cat 98.txt
["values",98,"value"] "DustSquad, Golden Falcon"
["values",98,"description"] "(Kaspersky) For the last two years we have been monitoring a Russian-language cyberespionage
actor that focuses on Central Asian users and diplomatic entities. We named the actor DustSquad and have provided private
intelligence reports to our customers on four of their campaigns involving custom Android and Windows malware. In this
blogpost we cover a malicious program for Windows called Octopus that mostly targets diplomatic entities.\n\nThe name
was originally coined by ESET in 2017 after the 0ct0pus3.php script used by the actor on their old C2 servers. We also
started monitoring the malware and, using Kaspersky Attribution Engine based on similarity algorithms, discovered that
Octopus is related to DustSquad, something we reported in April 2018. In our telemetry we tracked this campaign back to
2014 in the former Soviet republics of Central Asia (still mostly Russian-speaking), plus Afghanistan."
["values",98,"meta","synonyms",0] "DustSquad"
["values",98,"meta","synonyms",1] "Golden Falcon"
["values",98,"meta","synonyms",2] "APT-C-34"
["values",98,"meta","synonyms",3] "Nomadic Octopus"
["values",98,"meta","attribution-confidence"] "50"
["values",98,"meta","country"] "RU"
["values",98,"meta","motivation",0] "Information theft and espionage"
["values",98,"meta","date"] "2014"
["values",98,"meta","cfr-target-category",0] "Defense"
["values",98,"meta","cfr-target-category",1] "Government"
["values",98,"meta","cfr-target-category",2] "Media"
["values",98,"meta","cfr-suspected-victims",0] "Afghanistan"
["values",98,"meta","cfr-suspected-victims",1] "Kazakhstan"
["values",98,"meta","refs",0] "https://apt.thaicert.or.th/cgi-bin/showcard.cgi?u=982ea477-0c28-490e-87d6-3f43da257cae"
["values",98,"meta","refs",1] "https://securelist.com/octopus-infested-seas-of-central-asia/88200/"
["values",98,"meta","refs",2] "https://www.zdnet.com/article/extensive-hacking-operation-discovered-in-kazakhstan/"
["values",98,"related",0,"dest-uuid"] "e74394ee-e4ab-4642-aca4-fa84d0dcabbf"
["values",98,"related",0,"tags",0] "estimative-language:likelihood-probability=\"almost-certain\""
["values",98,"related",0,"type"] "uses"
["values",98,"related",1,"dest-uuid"] "3d3bf55f-402e-4122-a52b-196aed8e6507"
["values",98,"related",1,"tags",0] "estimative-language:likelihood-probability=\"almost-certain\""
["values",98,"related",1,"type"] "uses"
["values",98,"related",2,"dest-uuid"] "7ff6da6a-d13a-42db-91ac-ac6c3915f3d0"
["values",98,"related",2,"tags",0] "estimative-language:likelihood-probability=\"almost-certain\""
["values",98,"related",2,"type"] "uses"
["values",98,"uuid"] "982ea477-0c28-490e-87d6-3f43da257cae"
$
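To illustrate how this “[key] value” layout can be queried, here is a minimal sketch that prints the values stored under one "meta" key of a per-actor file. The function name actor_meta is hypothetical, and the sketch assumes that each entry sits on a single line and that the key and the value are separated by whitespace (a tab or a space).

```shell
# actor_meta: hypothetical helper, not part of JSON.sh or of the original post.
# $1: per-actor file (for example 98.txt)
# $2: key under "meta" (for example country, synonyms, cfr-target-category)
# Prints one value per line, without the surrounding double quotes.
actor_meta() {
    # The optional \(,[0-9]*\) group covers list entries such as
    # ["values",98,"meta","synonyms",0]; scalar keys have no index.
    sed -n 's/^\["values",[0-9]*,"meta","'"$2"'"\(,[0-9]*\)\{0,1\}][[:space:]]*"\(.*\)"$/\2/p' "$1"
}
```

With the file shown above, `actor_meta 98.txt country` would print RU, and `actor_meta 98.txt synonyms` would print the four synonyms, one per line.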

Now everything is ready to start parsing the files and getting results. Let’s go!
