MALWARE USES MULTIPLE WEB SERVERS TO HAVE A PERIODIC HTTP C&C CONNECTION WHILE ITS NETFLOWS ARE NOT PERIODIC

This blog post was originally published on 10 November 2014, by Sebastian Garcia, at https://mcfp.weebly.com/analysis/archives/11-2014.

While analyzing our capture CTU-Malware-Capture-Botnet-89-1, we found something odd about the periodicity of the C&C channels. The capture contains a lot of HTTP connections, but few of them were periodic. When we analyze a network capture we usually start with the NetFlows and then move on to the payload data. In this case, several HTTP connections that were periodic at the request level had non-periodic NetFlows. This was strange, so we took a deeper look.

The traffic of this malware looks something like this in our monitoring server:

[Figure: Some hours of traffic in the CTU-89-1 capture]

We first converted the pcap file to a web log file (using justniffer) to see the HTTP requests more clearly. Some example requests:

TimeStamp Method URL
1339.609 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
1632.742 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
1933.323 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0

To measure the periodicity of these requests we compute the difference between consecutive timestamps and print it in the first column. The differences are around 300 seconds, i.e. 5 minutes:

293.133 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
300.581 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
293.941 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
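
The difference computation can be sketched in a few lines of Python. This is only an illustration, not the script used for the original analysis: the function name request_deltas and the three-column layout (timestamp, method, URL) are assumptions, and the input is assumed to be already filtered down to a single URL.

```python
# The URL and timestamps are the ones shown in the web log excerpt of this post.
URL = ("http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc"
       "&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY"
       "&version=4.0.0.72&p2p=0&http=0&ts=0&up=0")

log_lines = [
    f"1339.609 GET {URL}",
    f"1632.742 GET {URL}",
    f"1933.323 GET {URL}",
]


def request_deltas(lines):
    """Return (delta, method, url) for every request after the first one.

    Hypothetical helper: assumes each line is 'timestamp method url' and
    that the lines are already sorted by time.
    """
    rows = []
    previous = None
    for line in lines:
        timestamp, method, url = line.split(maxsplit=2)
        current = float(timestamp)
        if previous is not None:
            # Time elapsed since the previous request, in seconds.
            rows.append((round(current - previous, 3), method, url))
        previous = current
    return rows


deltas = request_deltas(log_lines)
for delta, method, url in deltas:
    print(f"{delta:.3f} {method} {url}")
```

With the three requests above this yields the first two differences of the listing (293.133 and 300.581 seconds).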

This confirmed that these HTTP requests were periodic, but what about their NetFlows? To find the corresponding NetFlows we extracted the server IP addresses used in these requests and sorted them by number of requests. The results are:

Amount, IP

  • 16, 202.108.14.236
  • 18, 202.108.14.235
  • 18, 220.181.184.199
  • 18, 220.181.184.75
  • 19, 220.181.184.74
  • 20, 111.206.22.76
  • 21, 202.108.14.19
  • 25, 111.206.22.77
  • 25, 220.181.109.16
  • 26, 202.108.14.221
  • 29, 220.181.184.166
  • 30, 220.181.109.15
  • 31, 202.108.14.219
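
A ranking like this can be produced with a simple counter. The sketch below is a hedged illustration: the function name requests_per_server is hypothetical, and the input is a toy list of placeholder addresses (192.0.2.0/24 is reserved for documentation), not data from the capture.

```python
from collections import Counter


def requests_per_server(server_ips):
    """Return (count, ip) pairs sorted by ascending count, then by IP string."""
    counts = Counter(server_ips)
    return sorted((count, ip) for ip, count in counts.items())


# Toy input: the destination IP of each HTTP request, one entry per request.
toy_ips = [
    "192.0.2.1", "192.0.2.2", "192.0.2.1",
    "192.0.2.3", "192.0.2.1", "192.0.2.3",
]

ranking = requests_per_server(toy_ips)
print("Amount, IP")
for count, ip in ranking:
    print(f"{count}, {ip}")
```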

This was the first interesting finding: the same URL was being requested alternately from different IP addresses. Once the IP addresses were extracted, we generated their 4-tuples (source IP, destination IP, destination port and protocol) to look at their periodicity (described in the HackLu 2014 presentation). To get the 4-tuples we first convert the pcap file to a bidirectional Argus file:

argus -F argus.conf -r 2014-09-15_capture-win2.pcap -w 2014-09-15_capture-win2.biargus

Then we extract the NetFlows (the ra.conf file has specific fields):

ra -r 2014-09-15_capture-win2.biargus -n -Z b -F ra.conf > 2014-09-15_capture-win2.binetflow

Finally we run our CCDetector.py program (to be released soon), which implements the state-based behavioral model (also described in the HackLu 2014 presentation):

CCDetector.py -f 2014-09-15_capture-win2.binetflow -P oneline > 2014-09-15_capture-win2.3model

From this .3model file we can see the characteristics of the 4-tuples related to these IP addresses:

  • 10.0.2.102-202.108.14.221-80-tcp State:220s0ssss0ssss0ssss0ss0ssssssst0s
  • 10.0.2.102-220.181.184.74-80-tcp State:120ss0s0sss0s0s0s0s0s0s0ss0s0ss0s
  • 10.0.2.102-220.181.109.158-80-tcp State:110r
  • 10.0.2.102-220.181.109.16-80-tcp State:22sssss0s0s0ssb0ss0sss0s0sssss0ss
  • 10.0.2.102-220.181.184.199-80-tcp State:220s0s0s0s0ss0s0s0s0s0ss0s0sss
  • 10.0.2.102-220.181.184.166-80-tcp State:220s0sssbbss0sss0ss0ss0ss0ssssssssss
  • 10.0.2.102-220.181.109.15-80-tcp State:220ssbs0ss0s0ss0sssssssssbB0ss0s0ssss0s0s
  • 10.0.2.102-202.108.14.236-80-tcp State:22ssss0s0s0sss0sssss
  • 10.0.2.102-220.181.109.159-80-tcp State:110r
  • 10.0.2.102-202.108.14.219-80-tcp State:22sss0sssssss0ssss0s0ss0ssssss0ssbsb0s
  • 10.0.2.102-202.108.14.19-80-tcp State:22ssss0s0st0ss0sst0s0ss0s0s0ss
  • 10.0.2.102-111.206.22.77-80-tcp State:22Bbsbssssss0s0sss0ssB0ss0ss0ss
  • 10.0.2.102-220.181.184.75-80-tcp State:220ts0sss0ssss0s0ssss0ss
  • 10.0.2.102-111.206.22.76-80-tcp State:22ssts0s0ss0ss0sss0s0ss0s0ss
  • 10.0.2.102-202.108.14.235-80-tcp State:23b0sss0s0ssss0s0sss0s0s0s


In our state-based behavioral model, the letters for periodic flows are 'a' to 'f' and 'A' to 'F'. Since the letters in the states above are mostly 's' and '0', with only an occasional 'b' or 'B', we conclude that there are almost NO periodic flows in these connections. However, we know that the HTTP requests are periodic. So what happened?
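
This check can be automated by measuring how many symbols of a state string fall in the periodic ranges. The sketch below relies only on the rule stated above ('a' to 'f' and 'A' to 'F' mean periodic); it makes no assumption about what the other symbols mean, and the function name periodic_share is hypothetical, not part of CCDetector.py.

```python
# Letters that the post describes as marking periodic flows.
PERIODIC_LETTERS = set("abcdefABCDEF")


def periodic_share(state):
    """Return the fraction of symbols in a state string that are periodic letters."""
    if not state:
        return 0.0
    hits = sum(1 for symbol in state if symbol in PERIODIC_LETTERS)
    return hits / len(state)


# Two of the state strings listed in this post.
states = {
    "10.0.2.102-202.108.14.221-80-tcp": "220s0ssss0ssss0ssss0ss0ssssssst0s",
    "10.0.2.102-220.181.109.15-80-tcp": "220ssbs0ss0s0ss0sssssssssbB0ss0s0ssss0s0s",
}

shares = {tuple_id: periodic_share(state) for tuple_id, state in states.items()}
for tuple_id, share in shares.items():
    print(f"{tuple_id} periodic share: {share:.2f}")
```

A share close to zero, as for these two 4-tuples, means that the model saw almost no periodic flows in that connection.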

To confirm that these 4-tuples are not periodic we can 'open' a 4-tuple and look at the flow-by-flow analysis. This is the information for the first 4-tuple:

        1970-01-01 02:37:09.810596      T1=-1  T2=-1  TD=   0.0
        1970-01-01 02:42:09.781309      T1=-1  T2=299.970713  TD=   0.0
        1970-01-01 05:27:11.154282      T1=299.970713  T2=9901.372973  TD=9601.4
        1970-01-01 07:17:11.504023      T1=9901.372973  T2=6600.349741  TD=-3301.0
        1970-01-01 08:07:12.292119      T1=6600.349741  T2=3000.788096  TD=-3599.6
        1970-01-01 08:37:12.330020      T1=3000.788096  T2=1800.037901  TD=-1200.8
         (...)

Here T2 is the time difference between the current flow and the previous one, and T1 is the time difference between the previous flow and the one before it. The values show that the gaps between these flows were 299s, 9901s, 6600s, 3000s, etc., which are not periodic. This confirms that the flows to the IP 202.108.14.221 were not periodic.
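
The T1 and T2 columns can be recomputed from the flow start times alone. The sketch below is an illustration, not the CCDetector.py code. It assumes that -1 marks a difference that is not yet defined and that TD is T2 minus T1 (set to 0.0 while either is undefined); TD is not defined in the text, but this reading matches the values in the listing.

```python
from datetime import datetime

# Flow start times of the first 4-tuple, as listed in this post.
flow_starts = [
    "1970-01-01 02:37:09.810596",
    "1970-01-01 02:42:09.781309",
    "1970-01-01 05:27:11.154282",
    "1970-01-01 07:17:11.504023",
    "1970-01-01 08:07:12.292119",
    "1970-01-01 08:37:12.330020",
]


def time_differences(timestamps):
    """Return one (T1, T2, TD) tuple per flow.

    T2: seconds between this flow and the previous one (-1 if undefined).
    T1: the T2 of the previous flow (-1 if undefined).
    TD: T2 - T1 when both are defined, else 0.0 (assumed definition).
    """
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    rows = []
    t2 = -1.0
    for index, current in enumerate(times):
        t1 = t2
        if index > 0:
            t2 = (current - times[index - 1]).total_seconds()
        else:
            t2 = -1.0
        td = t2 - t1 if t1 >= 0 and t2 >= 0 else 0.0
        rows.append((t1, t2, td))
    return rows


rows = time_differences(flow_starts)
for start, (t1, t2, td) in zip(flow_starts, rows):
    print(f"{start}  T1={t1}  T2={t2}  TD={td:8.1f}")
```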

The answer is that the bot was sending HTTP requests to a specific URL, but the IP address serving that URL kept changing in some sort of load-balancing scheme. This is very common in normal applications, but in this case the malware is using a complex load balancing to keep a periodic C&C HTTP connection. Since each 4-tuple only sees the requests sent to one server IP, the periodicity that is visible at the HTTP level is spread across many 4-tuples, and none of them looks periodic on its own.

IMPLICATIONS

The implications of this load-balancing scheme are that:

  • When researchers analyze network traffic, we tend to consider each connection separately. If the detection method uses NetFlows, it will most probably miss this periodicity.
  • If the web log analysis uses the IP address of the web server as an index, the researcher may miss the connections to the rest of the IP addresses.
  • Finally, we think that the owner of the malware is not aware of these complications, because the load balancing seems to be designed to give more resilience to the botnet and not to hide the network patterns.
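
The first two implications can be illustrated with a small synthetic example. Everything in the sketch below is made up for illustration: the host name cc.example, the placeholder server addresses, the 300-second beacon and the assignment of requests to servers are assumptions, not data from the capture. It only shows that grouping the same requests by server IP hides a periodicity that grouping by host name keeps.

```python
from collections import defaultdict


def gaps(timestamps):
    """Return the differences between consecutive timestamps, in order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def gaps_by(requests, key):
    """Group (time, host, ip) records with key(record) and return gaps per group."""
    groups = defaultdict(list)
    for record in requests:
        groups[key(record)].append(record[0])
    return {name: gaps(times) for name, times in groups.items()}


# Synthetic load balancing: which server answers each 300-second beacon.
servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
assignment = [0, 1, 0, 2, 1, 1, 0, 2, 2, 0, 1, 2]
requests = [
    (300 * i, "cc.example", servers[server])
    for i, server in enumerate(assignment)
]

by_host = gaps_by(requests, key=lambda record: record[1])
by_ip = gaps_by(requests, key=lambda record: record[2])

print("per host:", by_host)
print("per IP:  ", by_ip)
```

Indexed by host name, every gap is 300 seconds. Indexed by server IP, the gaps become irregular multiples of 300 seconds, which is the kind of pattern a per-4-tuple detector would see.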