This blog post was originally published on 26 March 2015, by Sebastian Garcia, at https://stratosphereips.org/differences-on-the-behavioral-patterns-of-malware-and-normal-dns-connections.html.
This blog post compares and analyzes the differences in the behavioral patterns found in the DNS traffic of malware and normal connections. We captured malware and normal traffic in the MCFP project and extracted the DNS behavior with the stf tool. The captures correspond to the DNS traffic of a SPAM malware, a DGA-based malware and a normal computer. The idea is to analyze the differences in the behaviors as they are shown by the stf program. For an explanation of how the stf program generates this data see this explanation.
SPAM-related DNS Capture
The first capture was generated by executing a variant of the Pushdo malware (MD5 9926B031C7E7DCD2A35786AA78534BE8). The capture can be downloaded from the MCFP folder CTU-Malware-Capture-Botnet-35-1.
The connection that is requesting the DNS data in this capture is identified by the stf program as 10.0.2.107-184.108.40.206-53-udp and its behavioral chain of states starts as follows:
11*R,R.r.R.r.r.r.R.r.r.r.r.v.v.B.B.r.A.r.A.R.R.r.R.a.R.r.a.r.R.a.R.r.r.r.r.r.A.R.r.r.R.R.A.A.R.r.r.r.R.R.R.r.r.r.A.A.A.A.r.r.r.A.A.A.A.B.v.R.R.a.A.R.R.A.R.r.r.R.R.R.R.R.R.R.A.A.r.r.a.R.r.R.R.R.R.r.r.R.R.R.A.A.r.r.r. a.a.R.R.r.r.r.s.s.B.A.R.R.R.r.r.r.R.R.R.R.R.R.r.r.r.r.r.r.r.r.R.R.r.r.R.R.A.R.r.A.A.r.r.R.R.R.R.r.r.r. r.A.r.R.R.r.r.r.A.R.R.R.R.r.r.r.A.A.u.U.r.A.R.R.r.r.s.A.r.r.A.r.r.r.r.R.R.r.R.R.r.r.r.r.a.r.r.A.R.R.R.r.R.R.R.R.A.r.R.r.r.R.r.r.r.R.R.R.R.R.r.r.r.R.r.r.r.r.r.r.r.r.R.R.r.A.R.R.r.R.r.R.R.r.r.r.R.r.r.a.R.R.A.s .A.A.A.a.R.R.r.A.s.R.R.r.R.r.r.R.R.r.a.R.r.A.r.A.r.R.r.R.R.r.r.A.A.r.r.R.R.r.r.r.r.r.A.R.R.r.r.r.r.R.R.s.(…)
According to the letter assignment strategy we can say that this behavior is mostly not periodic, because it uses letters such as RrUu, and that it only has some periodic flows from time to time (most probably due to chance), as seen by the letters BAa. It can also be said that the flows are sent very quickly, because the symbol . means that there are less than 5 seconds between flows. It is also worth noticing that the letters r and R mean that the flows were small in size and short in duration. This is going to be used later to differentiate the traffic.
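The reading of the letters used above can be sketched as a small lookup. This is only a minimal sketch and not the stf code: it assumes that the letter table assigns nine consecutive letters to each periodicity class (a-i, A-I, r-z, R-Z) over a 3x3 grid of flow size and duration, which agrees with the readings given in this post (r and R small and short, s and S small with medium duration, v and V medium in size and duration). The class names and the function decode_letter are hypothetical.

```python
# Hypothetical decoder for stf state letters; the table layout is an assumption.
SIZES = ("small", "medium", "large")
DURATIONS = ("short", "medium", "long")

# First letter of each assumed block of nine letters and its periodicity class.
CLASSES = {
    "a": "strongly periodic",
    "A": "weakly periodic",
    "r": "weakly non-periodic",
    "R": "strongly non-periodic",
}


def decode_letter(letter):
    """Return (periodicity, size, duration) for one state letter, or None."""
    for first, periodicity in CLASSES.items():
        offset = ord(letter) - ord(first)
        if 0 <= offset < 9:
            # Rows of the grid are sizes, columns are durations.
            return periodicity, SIZES[offset // 3], DURATIONS[offset % 3]
    return None  # digits, time symbols, or letters outside the assumed table


for letter in "rRsvB":
    print(letter, decode_letter(letter))
```

Under these assumptions the chain above is dominated by states that decode to small, short and non-periodic flows.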
Payload of the SPAM-related Traffic
A deeper analysis of the behavior of this DNS connection shows which domains were actually requested and the real times between flows. An example of the payloads is:
State: “1” TD: -1.0 T2: False dns.msftncsi.com
State: “1*” TD: -1.0 T2: 0:06:04.725089 smtp.live.com
State: “R,” TD: 44.538416 T2: 0:00:08.189000 wsipowerontheweb.com
State: “R.” TD: 955.319645 T2: 0:00:00.008572 skaner.com.pl
State: “r.” TD: 2.032243 T2: 0:00:00.004218 adultlivechat.us
State: “R.” TD: 8.590631 T2: 0:00:00.000491 timeturkey.com
State: “r.” TD: 3.069246 T2: 0:00:00.001507 nataliecurtiiss.com
State: “r.” TD: 2.903661 T2: 0:00:00.000519 le-mariage.com
State: “r.” TD: 3.296724 T2: 0:00:00.001711 shipeliteexpress.com
State: “R.” TD: 5.002924 T2: 0:00:00.000342 vanguardpkg.com
State: “r.” TD: 3.804094 T2: 0:00:00.001301 redconeretreat.com
State: “r.” TD: 3.212346 T2: 0:00:00.000405 glmghotels.com
State: “r.” TD: 1.782716 T2: 0:00:00.000722 yamamoto-sr.com
State: “r.” TD: 1.714964 T2: 0:00:00.000421 meubles-jacquelin.com
State: “v.” TD: 1.39905 T2: 0:00:00.000589 phototype.com
As can be seen from the listing above, the connections are not periodic and are very quick, with some time differences between flows as small as 0.0004 seconds. The flows are sent very fast, probably to send as many mails as possible in a short time. This capture can also be used to show one of the new features in the stf behavioral models. The stf program does not consider these flows periodic because the ratio between time differences is too large (TD values ranging from 1.3 to 955). Therefore, stf can differentiate between very fast flows, which usually have time differences in the order of milliseconds, and real periodic flows coming from malware. The idea is that periodicity is not only considered when three flows have very similar time differences, but also when these time differences are stable enough. For example, if the time differences are very small, they are more prone to variance, which leads to bigger ratios. This is the case in the present capture. So faster flows usually have greater relative variance in their time differences (although the differences themselves are small) and are therefore more likely to be considered not periodic.
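The TD values in the listing are consistent with the ratio between two consecutive time differences (T2), larger over smaller. For instance, 364.725089 s divided by 8.189 s gives about 44.54, the TD shown for wsipowerontheweb.com. A minimal sketch of that computation follows. The cutoffs 1.05, 1.3 and 5 are assumptions chosen to agree with the listings in this post, not values taken from the stf source, and the function names are hypothetical.

```python
def td_ratio(prev_gap, gap):
    """Ratio between two consecutive inter-flow gaps (seconds), always >= 1."""
    longer, shorter = max(prev_gap, gap), min(prev_gap, gap)
    if shorter <= 0:
        return float("inf")  # avoid dividing by zero for identical timestamps
    return longer / shorter


def periodicity(ratio, strong=1.05, weak=1.3, limit=5.0):
    """Bucket a TD ratio; the three cutoffs are assumed, not taken from stf."""
    if ratio <= strong:
        return "strongly periodic"
    if ratio <= weak:
        return "weakly periodic"
    if ratio <= limit:
        return "weakly non-periodic"
    return "strongly non-periodic"


# Gaps taken from the T2 column of the listing above, in seconds.
ratio = td_ratio(364.725089, 8.189)
print(round(ratio, 2), periodicity(ratio))
```

Because the ratio is relative, two gaps of 0.0004 and 0.0017 seconds are as far apart as gaps of 1 and 4 minutes, which is why millisecond-scale flows rarely end up in the periodic buckets.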
DGA-Related DNS Capture
The second capture was generated by executing a variant of the Yakes malware (MD5 43ecaeb983683f57af842c8993e242e6) and is called CTU-Malware-Capture-Botnet-100 in the Malware Capture Facility Project.
This malware uses a DGA to generate the domains that it resolves over the DNS protocol. It uses the newer type of DGA, whose domains are built from words and not only from letters. The connection that requested the DNS data was identified as 10.0.2.105-220.127.116.11-53-udp by the stf program and its behavioral chain of states starts as follows:
0000000000SR+a+a+a+B+b+b+b+b+a+A+b+A+a+a+000RR+a+RR+a+a+a+b+B+b+b+b+ a+a+b+a+a+b+RR+a+000000SS+b+b+B+a+a+b+a+a+b+00000000000000000000000V R+B+e+E+a+a+e+A+B+a+a+a+a+a+a+a+a+a+e+v+v+e+a+v+r+A+a+a+a+a+a+a+a+a+a +E+A+B+E+e+A+a+e+A+A+a+a+a+a+a+a+a+a+a+e+s+v+E+a+v+r+A+a+a+a+a+a+a+a+ a+a+e+A+B+E+e+a+A+e+a+a+a+a+a+a+a+a+a+a+a+e+s+v+e+A+v+r+A+b+a+a+a+a+b +a+a+a+e+B+B+e+e+A+a+E+A+A+a+a+a+a+a+A+a+a+a+e+s+v+e+a+v+r+A+a+a+a+a+ a+a+a+a+a+e+B+B+e+e+a+A+e+A+A+a+a+a+a+a+a+a+a+a+e+s+v+E+a+v+r+A+a+a+a +a+a+a+a+a+a+e+E+b+e+e+a+a+e+A+B+a+a+a+a+a+a+a+a+A+e+a+b+e+e+a+v+r+A+ a+a+a+a+a+a+a+a+a+e+s+v+e+a+A+E+A+A+a+a+a+a+a+a+a+a+a+e+A+B+e+E+a+v+r +a+a+a+b+a+a+a+a+a+a+e+s+v+E+a+A+e+a+a+a+a+a+a+a+a+a+a+b+E+A+E+R.R.V+
It can be seen that there are a lot of the letters aAbBeE, which according to the letter assignment strategy means that the behavior is more periodic than the previous one. The + symbol means that the time difference between the flows is between 1 minute and 5 minutes, which is a good indicator that the behavior is malicious, since the period is not short. So we have a highly periodic signal with a period between 1 and 5 minutes. As with other malicious behaviors, it is interesting to see that the connection is not perfectly periodic. From time to time there are letters such as RS0svrV. However, this is normal for a malicious behavior because the malware may be updating the binaries, changing the C&C server, or receiving orders. Let’s analyze the content now.
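The time symbol attached to each letter can be sketched as a simple threshold function. The post states that . means less than 5 seconds and + means between 1 and 5 minutes. The ranges used here for , (5 to 60 seconds) and * (5 minutes to 1 hour), as well as one 0 per elapsed hour for timeouts, are assumptions inferred from the payload listings in this post. The function name time_symbol is hypothetical.

```python
def time_symbol(gap_seconds):
    """Map the gap to the previous flow (seconds) to an stf-style time symbol."""
    if gap_seconds < 5:
        return "."  # stated in the post: less than 5 seconds
    if gap_seconds < 60:
        return ","  # assumed range: 5 to 60 seconds
    if gap_seconds < 300:
        return "+"  # stated in the post: between 1 and 5 minutes
    if gap_seconds < 3600:
        return "*"  # assumed range: 5 minutes to 1 hour
    # Assumed timeout marker: one "0" for every full hour without flows.
    return "0" * int(gap_seconds // 3600)


for gap in (0.008572, 8.189, 65.180709, 364.725089, 10466.006739):
    print(gap, repr(time_symbol(gap)))
```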
Payload of the DGA-related Malware Capture
The payload of this connection can be seen in stf and starts with the following data:
State: “1” TD: -1.0 T2: False dns.msftncsi.com
State: “2*” TD: -1.0 T2: 0:06:28.920575 silverresolve.com
State: “S+” TD: 5.966805 T2: 0:01:05.180709 treeproducealarm.com
State: “b+” TD: 1.000067 T2: 0:01:05.185083 worddustballprocess.com
State: “b+” TD: 1.031732 T2: 0:01:03.180260 requestpressure.com
State: “b+” TD: 1.015835 T2: 0:01:04.180718 trainingpursue.com
State: “a+” TD: 1.015635 T2: 0:01:05.184163 musclecuphospital.com
State: “b+” TD: 1.0338 T2: 0:01:03.052979 taskshowerreaction.com
State: “b+” TD: 1.004841 T2: 0:01:02.749220 substanceissue.com
State: “a+” TD: 1.022871 T2: 0:01:04.184343 nailadaptbank.com
State: “a+” TD: 1.01366 T2: 0:01:05.061071 toweldependequipment.com
State: “a+” TD: 1.015737 T2: 0:01:04.053084 clockpunchposition.com
State: “00R” TD: 163.395829 T2: 2:54:26.006739 pianoremovebill.com
State: “R+” TD: 168.675111 T2: 0:01:02.048317 quantitybitebed.com
State: “A+” TD: 1.064587 T2: 0:01:06.055812 loanapologize
State: “00000000000S” TD: 644.606275 T2: 11:49:39.990931 silverresolve.com
State: “S+” TD: 673.935788 T2: 0:01:03.181080 treeproducealarm.com
State: “b+” TD: 1.000021 T2: 0:01:03.179723 worddustballprocess.com
State: “b+” TD: 1.000024 T2: 0:01:03.181219 requestpressure.com
State: “b+” TD: 1.047578 T2: 0:01:06.187224 trainingpursue.com
State: “a+” TD: 1.031275 T2: 0:01:04.179979 musclecuphospital.com
State: “a+” TD: 1.034002 T2: 0:01:02.069499 taskshowerreaction.com
State: “b+” TD: 1.037091 T2: 0:01:04.371688 substanceissue.com
State: “a+” TD: 1.018849 T2: 0:01:03.180817 nailadaptbank.com
As can be seen from the flows, the time between connections is actually close to 1 minute (T2 value). There are some timeouts, including one of more than 11 hours, and some non-periodic flows. This type of connection is a good representative of a DGA DNS behavior.
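To work with listings like the one above programmatically, each line can be split into its fields. This is a minimal sketch that assumes the line layout shown in this post (State, TD, T2, domain) and that T2 is printed either as H:MM:SS.ffffff or as False for the first flow. The helper parse_flow is hypothetical and not part of stf.

```python
import re

# Accept both straight and typographic quotes around the state.
LINE = re.compile(
    r'State:\s*[\u201c"](?P<state>[^\u201d"]*)[\u201d"]\s+'
    r'TD:\s*(?P<td>-?[\d.]+)\s+'
    r'T2:\s*(?P<t2>\S+)\s+'
    r'(?P<domain>\S+)'
)


def parse_flow(line):
    """Parse one listing line into a dict, or return None if it does not match."""
    match = LINE.search(line)
    if match is None:
        return None
    t2 = match.group("t2")
    if t2 == "False":
        gap = None  # first flow: there is no previous flow to measure against
    else:
        hours, minutes, seconds = t2.split(":")
        gap = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return {
        "state": match.group("state"),
        "td": float(match.group("td")),
        "gap_seconds": gap,
        "domain": match.group("domain"),
    }


print(parse_flow('State: "S+" TD: 5.966805 T2: 0:01:05.180709 treeproducealarm.com'))
```

With the lines parsed this way, the gaps of the periodic states can be collected and compared directly against the value of roughly one minute noted above.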
Normal DNS Capture
This normal capture was generated by working on a Linux computer for several hours. The tasks consisted of accessing common websites such as Twitter, Google Mail, Google Search, etc.
The connection is identified by stf as 10.0.0.34-18.104.22.168-53-udp and its chain of behavioral states starts as follows:
It can be seen, according to the letter assignment strategy, that these flows are not periodic because most of the letters are like sSvVrR, with some occasional periodic letters such as bB. Also, and more importantly, the time between flows is less than 5 seconds, as described by the appearance of the symbol . between the letters. This indicates that they are being sent very quickly, probably because the user is accessing a lot of different sites simultaneously.
The difference with the DGA malware is big because of the periodicity, but what exactly is the difference with the SPAM malware? A comparison between this normal capture and the SPAM malware capture shows that:
- The normal capture has flows that are small in size but longer in duration than the SPAM flows, as indicated by the letters sS instead of the letters rR.
- The normal capture has flows that are medium in size and also longer in duration than the SPAM flows, as indicated by the letters vV.
The differences in the behavioral patterns are large enough to guarantee that the detection algorithm will be able to differentiate these two behaviors.
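The comparison can be made measurable by counting how often each group of letters appears in a chain. The following is a minimal sketch: letter_share is a hypothetical helper, and the two short chains are simply the states of the first flows shown in the SPAM and normal payload listings of this post, not the complete captures.

```python
def letter_share(chain, letters):
    """Fraction of the state letters in `chain` that belong to `letters`."""
    states = [c for c in chain if c.isalpha()]  # drop digits and time symbols
    if not states:
        return 0.0
    return sum(c in letters for c in states) / len(states)


# States taken from the payload listings in this post (first flows only).
spam = "R,R.r.R.r.r.r.R.r.r.r.r.v."
normal = "S,S.v.S.S.S.S.S.S.S.S.V."

for name, chain in (("spam", spam), ("normal", normal)):
    print(name, letter_share(chain, "rR"), letter_share(chain, "sSvV"))
```

On these short samples almost all SPAM states fall in the rR group while the normal states fall entirely in the sSvV group, which is the separation described in the two points above.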
Payload of the Normal DNS Capture
Using the stf program we can extract the actual payload from the flows and the exact time differences between flows. The following are the first flows of the capture:
State: “2” TD: -1.0 T2: False aus4.mozilla.org
State: “2.” TD: -1.0 T2: 0:00:00.135054 aus4.mozilla.org
State: “S,” TD: 224.123573 T2: 0:00:30.268785 mail.google.com
State: “S.” TD: 212.004882 T2: 0:00:00.142774 mail.google.com
State: “v.” TD: 3.748136 T2: 0:00:00.038092 clients1.google.com
State: “S.” TD: 5.102746 T2: 0:00:00.007465 clients1.google.com
State: “S.” TD: 624.359411 T2: 0:00:04.660843 mail-attachment.googleusercontent.com
State: “S.” TD: 99.921599 T2: 0:00:00.046645 www.gstatic.com
State: “S.” TD: 590.443038 T2: 0:00:00.000079 www.gstatic.com
State: “S.” TD: 22.253165 T2: 0:00:00.001758 ssl.gstatic.com
State: “S.” TD: 42.878049 T2: 0:00:00.000041 ssl.gstatic.com
State: “S.” TD: 8.219512 T2: 0:00:00.000337 lh5.googleusercontent.com
State: “S.” TD: 9.108108 T2: 0:00:00.000037 lh5.googleusercontent.com
State: “V.” TD: 2502.027027 T2: 0:00:00.092575 mail-attachment.googleusercontent.com
State: “S.” TD: 8.754965 T2: 0:00:00.010574 safebrowsing.google.com
State: “S.” TD: 264.35 T2: 0:00:00.000040 safebrowsing.google.com
State: “S.” TD: 12063.15 T2: 0:00:00.482526 www.google.com
State: “S.” TD: 5026.3125 T2: 0:00:00.000096 www.google.com
State: “V.” TD: 8.770833 T2: 0:00:00.000842 clients2.google.com
State: “S.” TD: 17.914894 T2: 0:00:00.000047 clients2.google.com
State: “S.” TD: 49.659574 T2: 0:00:00.002334 mail.google.com
State: “S.” TD: 33.342857 T2: 0:00:00.000070 mail.google.com
State: “V.” TD: 7028.985714 T2: 0:00:00.492029 apis.google.com
State: “S.” TD: 43.747577 T2: 0:00:00.011247 lh3.googleusercontent.com
State: “S.” TD: 193.913793 T2: 0:00:00.000058 lh3.googleusercontent.com
State: “s.” TD: 2.189655 T2: 0:00:00.000127 lh4.googleusercontent.com
State: “s.” TD: 2.886364 T2: 0:00:00.000044 lh4.googleusercontent.com
State: “V.” TD: 55.977273 T2: 0:00:00.002463 clients5.google.com
State: “S.” TD: 28.310345 T2: 0:00:00.000087 clients5.google.com
State: “S.” TD: 1255.137931 T2: 0:00:00.109197 apis.google.com
State: “w.” TD: 3.303094 T2: 0:00:00.033059 plus.google.com
It can be seen that the time between the flows is very short, sometimes even around 0.00009 seconds. This behavior seems to be constant within most normal captures, where each accessed website triggers the resolution of domain names according to its own behavior.
This analysis showed that the long-term DNS behavioral patterns of different types of malware actions, as well as of normal actions, are very different. These differences are large enough to guarantee a very good detection in the later stages of the Stratosphere IPS. The SPAM malware behavior is not periodic, with short and small flows; the DGA malware sent highly periodic flows; and the normal connection sent non-periodic flows that are small in size and longer in duration.