How To: Use Thug Honeyclient to Investigate a Malicious Website

One of my favorite ways to quickly and safely investigate a potentially malicious website is the Thug low-interaction honeyclient project. It is a Python script that visits a site and presents itself as an exploitable browser (using any of several built-in user agents), but instead of getting compromised, it displays and saves every file that is thrown at it and graphs all of the interactions for you.

About Thug:

Thug has a wide range of options that can be exercised in order to present the exact personality you'd like to a malicious website. According to the documentation:

Currently 8 Internet Explorer (Windows XP, Windows 2000, Windows 7), 15 Chrome (Windows XP, Windows 7, MacOS X, Android 4.0.3, Android 4.0.4, Android 4.1.2, Linux, iOS 7.1, iOS 7.1.1, iOS 7.1.2, iOS 8.0.2, iOS 8.1.1), 3 Firefox (Windows XP, Windows 7, Linux) and 5 Safari (Windows XP, Windows 7, MacOS X, iOS 7.0.4, iOS 8.0.2) personalities are emulated and about 90 vulnerability modules (ActiveX controls, core browser functionalities, browser plugins) are provided.

These can be used with ease by just specifying the right -u [useragent] switch on the command line. Thug can also emulate shellcode, Adobe Reader, Shockwave Flash, use the HoneyAgent Java sandbox, and submit URLs and samples to VirusTotal with optional configurations.

Installing:

There are a lot of dependencies, so if you want to experiment with Thug quickly, I recommend using either a Docker container or, my favorite, the excellent REMnux Linux distro for malware analysis, which recently reached version 6 (thanks to Lenny Zeltser for the awesome resource and great SANS class on malware RE). REMnux comes with Thug set up and ready to go, along with all the other tools you might want to use afterwards.

Finding Sources:

First you need to find a site you think is serving malicious content or is hosting a redirect to an exploit kit. The easiest way I know to find active exploit kits and infected sites is to look at places that list current spam and suspected malicious links. Some of my favorites are:

Find a recent article or submission that seems like something Thug would pick up on and run with it.

For this example, I'm going to use the current top hit from ThreatGlass - "mp3li.net", which, according to their analysis, leads to downloading a binary file.

Usage:

Now it's almost time for the fun part - pointing Thug at your malicious link and seeing what happens. One quick thing I found that might need changing first, to prevent errors while running: go into src/Logging/logging.conf and change the hpFeeds enabled setting from True to False.
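If you would rather script that change than edit the file by hand, here is a minimal sketch. It assumes the setting lives in a section named [hpfeeds] with a key spelled either "enable" or "enabled"; both names are my assumptions, so check your copy of logging.conf before trusting it. The function only rewrites text, which keeps the rest of the file (comments included) untouched.

```python
import re


def disable_hpfeeds(conf_text, section="hpfeeds"):
    """Return conf_text with enable/enabled set to False inside [section].

    The section name and key spelling are assumptions about logging.conf,
    not something confirmed by the Thug documentation.
    """
    out = []
    in_section = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are inside the target section.
            in_section = stripped[1:-1].strip().lower() == section.lower()
        elif in_section:
            # Match "enable: True" or "enabled = True", keeping the separator.
            m = re.match(r"^(\s*enabled?\s*[:=]\s*)True(\s*)$", line, re.IGNORECASE)
            if m:
                line = m.group(1) + "False" + m.group(2)
        out.append(line)
    return "".join(out)
```

To apply it, read src/Logging/logging.conf into a string, pass it through disable_hpfeeds(), and write the result back (after keeping a backup copy of the original).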

OK, time to start. The most basic usage is just thug.py -FZM "[url]". The FZM switches turn on File, MAEC 1.1, and JSON logging output, which is what I use by default for a one-off analysis to make sure everything is recorded. If you don't use "FZM", nothing will be saved and you will just see the normal Thug output. If you start using Thug a lot, it can also log to MongoDB, but that's overkill for just one sample.
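If you end up running Thug from a script, it helps to build the command line in one place. This is only a sketch: it assumes thug.py is on your PATH, the helper name build_thug_command is mine, and the "win7ie80" personality name in the example is an assumption you should check against the personality list in the Thug documentation.

```python
import shlex


def build_thug_command(url, useragent=None, script="thug.py"):
    """Build the argument list for a one-off Thug run with -F, -Z and -M logging."""
    cmd = [script, "-F", "-Z", "-M"]
    if useragent:
        # -u selects the browser personality to present to the site.
        cmd += ["-u", useragent]
    cmd.append(url)
    return cmd


# "win7ie80" is an assumed personality name used for illustration only.
cmd = build_thug_command("http://example.invalid/", useragent="win7ie80")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to subprocess.run() avoids shell quoting problems, which matters when the URL contains characters like & or ?.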

Here is what it looks like when you run Thug.

Thug Output Demo

When the analysis is finished, Thug saves the output into a folder at /var/log/thug/logs/[md5 of site name input]/[date time]/, or you can use the -n switch to specify an output folder. Note: the shellcode.py errors are OK. They are caused by the imperfect emulation of the DOM, and the author of Thug says they do not stop the rest of the analysis.
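To find the log folder for a given run without hunting through /var/log/thug/logs by hand, something like the following can work. It is a sketch under two assumptions: that the MD5 is taken over the URL string exactly as you typed it on the command line, and that the date-time folder names sort chronologically as plain text. The function names are mine.

```python
import hashlib
from pathlib import Path


def thug_log_root(url, base="/var/log/thug/logs"):
    """Return base/<md5 of url>, assuming the hash covers the URL exactly as typed."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return Path(base) / digest


def latest_run(url, base="/var/log/thug/logs"):
    """Return the newest run folder for url, or None if there is none.

    Assumes the date-time folder names sort chronologically as strings.
    """
    root = thug_log_root(url, base)
    if not root.is_dir():
        return None
    runs = sorted(p for p in root.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```

If latest_run() returns None for a site you just analyzed, the hash assumption is probably wrong for your input (for example a missing or extra trailing slash), so compare against the folder names that actually exist.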

Folder Structure

As you can see below, after Thug completes, the log folder will hold a collection of the analysis output, sorted by type, in the analysis/ folder, as well as a folder for every type of file downloaded.

$ ls -Rl
.:
total 16
drwxrwxr-x 4 remnux remnux 4096 Jul 26 16:01 analysis
drwxrwxr-x 6 remnux remnux 4096 Jul 26 16:01 application
drwxrwxr-x 3 remnux remnux 4096 Jul 26 16:01 image
drwxrwxr-x 5 remnux remnux 4096 Jul 26 16:01 text

./analysis:
total 28
-rw-rw-r-- 1 remnux remnux 17154 Jul 26 16:01 graph.svg
drwxrwxr-x 2 remnux remnux  4096 Jul 26 16:01 json
drwxrwxr-x 2 remnux remnux  4096 Jul 26 16:01 maec11

./analysis/json:
total 1024
-rw-rw-r-- 1 remnux remnux 1048093 Jul 26 16:01 analysis.json

./analysis/maec11:
total 20
-rw-rw-r-- 1 remnux remnux 19123 Jul 26 16:01 analysis.xml

./application:
total 16
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 javascript
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 javascript; charset=utf-8
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 octet-stream
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 x-javascript; charset=utf-8

./application/javascript:
total 80
-rw-rw-r-- 1 remnux remnux 19933 Jul 26 16:01 5605d117e30801fd99712289f8d3ecf4
-rw-rw-r-- 1 remnux remnux 18732 Jul 26 16:01 6ec7a9c81a048c66149d251dd5e5828d
-rw-rw-r-- 1 remnux remnux 26330 Jul 26 16:01 a171cd8dabc6c3da2fd0bf37a99e6546
-rw-rw-r-- 1 remnux remnux  4687 Jul 26 16:01 d9a22e0e560ddc12cad2ab5f9a70c0d2
-rw-rw-r-- 1 remnux remnux  1865 Jul 26 16:01 f0362aa842d15762300b7a26af2c519b

./application/javascript; charset=utf-8:
total 40
-rw-rw-r-- 1 remnux remnux 37313 Jul 26 16:01 70dea909d5df37a8a8d7dc0ca86b9627

./application/octet-stream:
total 372
-rw-rw-r-- 1 remnux remnux 377976 Jul 26 16:01 ee61d1b67ba6e91641e25c88dcda4489

./application/x-javascript; charset=utf-8:
total 168
-rw-rw-r-- 1 remnux remnux 170707 Jul 26 16:01 40ba55d77b5e0addfbe621948858a82a

./image:
total 4
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 png

./image/png:
total 8
-rw-rw-r-- 1 remnux remnux 4624 Jul 26 16:01 36c176df13dbf37d86db736038e3c146

./text:
total 12
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 css
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 html
drwxrwxr-x 2 remnux remnux 4096 Jul 26 16:01 javascript; charset=UTF-8

./text/css:
total 16
-rw-rw-r-- 1 remnux remnux 15537 Jul 26 16:01 6bcb811da7755283789057d3ab8bad8d

./text/html:
total 48
-rw-rw-r-- 1 remnux remnux   564 Jul 26 16:01 8e325dc2fea7c8900fc6c4b8c6c394fe
-rw-rw-r-- 1 remnux remnux 44126 Jul 26 16:01 93475d26575f858699a9e9c235d422ae

./text/javascript; charset=UTF-8:
total 196
-rw-rw-r-- 1 remnux remnux  94020 Jul 26 16:01 25721ced154b3a99e818431446d7506d
-rw-rw-r-- 1 remnux remnux 104346 Jul 26 16:01 37426efd0b6784736946bc99aa62af13

In the case of this site, there was an application, text, and image folder. Going into the application folder shows several sub-folders, most of which hold javascript, and one called "octet-stream". Inside that is a single file. Analyzing it shows:

 $ file ee61d1b67ba6e91641e25c88dcda4489 
 ee61d1b67ba6e91641e25c88dcda4489: PE32 executable (GUI) Intel 80386, for MS Windows

This file, not surprisingly, is listed on VirusTotal as adware/malware known as "MediaDrug".
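When a site drops a lot of files, checking each one by hand gets tedious. Here is a rough sketch that walks a Thug log folder, hashes every saved file, and flags anything starting with the "MZ" magic bytes that Windows executables begin with. That check is far cruder than the file command and only points out candidates worth a closer look. The function name scan_downloads is mine.

```python
import hashlib
from pathlib import Path


def scan_downloads(log_dir):
    """Yield (relative_path, md5, looks_like_pe) for every file under log_dir."""
    log_dir = Path(log_dir)
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        yield (
            path.relative_to(log_dir).as_posix(),
            hashlib.md5(data).hexdigest(),
            data[:2] == b"MZ",  # DOS/PE header magic; a hint, not a verdict
        )
```

Looping over scan_downloads(".") from inside the log folder and printing only the rows where the last field is True gives a short list of files to run through file or look up on VirusTotal by hash.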

Graph Output from Thug

This is the image created by Thug in analysis/graph.svg (open it with Firefox). The MediaDrug download interaction can be seen at the very bottom.
Thug Graph
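If you can't open the graph in a browser (on a headless box, for example), you can still pull the labels out of the SVG as text. This sketch assumes the interesting strings, such as URLs and content types, are stored in the SVG's text elements, which I have not confirmed for every Thug version. The function name is mine.

```python
import xml.etree.ElementTree as ET


def svg_text_labels(svg_source):
    """Return the stripped contents of every <text> element in an SVG document."""
    root = ET.fromstring(svg_source)
    labels = []
    for elem in root.iter():
        # Tags are namespaced, e.g. "{http://www.w3.org/2000/svg}text".
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            labels.append(elem.text.strip())
    return labels
```

Reading analysis/graph.svg as bytes and passing it to svg_text_labels() gives a flat list you can grep or diff between runs.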

The download location can be confirmed by grepping the analysis/json/analysis.json file for the file's MD5, which shows that this was indeed where the binary came from.

    {
        "timestamp": "2015-07-26 16:01:28.127578",
        "cve": "None",
        "description": "[HTTP] URL: http://setup.mediadrug.com/partnership.main.redirect/?advert_key=ZWMwMDAzMDAwYjAwMDI4ODAwMDAwMjkwMDAwMjkwMDAwMjkwMzYxZmQxYTlhMg==&name= (Content-type: application/octet-stream, MD5: ee61d1b67ba6e91641e25c88dcda4489)",
        "method": "Dynamic Analysis"
    },
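Grep works fine for a single hash, but if you want to know where in the JSON a match sits, a small recursive search is handy. This sketch makes no assumptions about the layout of analysis.json. It simply walks every dict and list and reports the path to each string containing the hash. The function name and the path notation are mine.

```python
import json


def find_strings(node, needle, path="$"):
    """Yield (path, value) for every string in parsed JSON that contains needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_strings(value, needle, "%s.%s" % (path, key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_strings(value, needle, "%s[%d]" % (path, index))
    elif isinstance(node, str) and needle in node:
        yield path, node
```

Load the file with json.load() and call find_strings(data, "ee61d1b67ba6e91641e25c88dcda4489") to list every entry that mentions the binary, along with where it appears in the document.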

So there you have it, a way to collect everything from a likely malicious website and have it categorized, hashed and graphed for you automatically. Happy malware hunting!