Q200 : Botnet detection using Deep learning
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2019
Authors:
[Author], Hoda Mashayekhi [Supervisor], Mohsen Rezvani [Advisor]
Abstract: The necessity of the Internet and its important role in people's lives are undeniable. At the same time, with the growing number of Internet users and the dramatic growth of computer networks and infrastructure, securing and monitoring network traffic has become one of the most urgent needs of cyberspace. In recent years, the botnet has become known as one of the most dangerous forms of malware, capable of compromising healthy computers and converting them into bots that transmit viruses, spam, etc. Various methods have been developed to identify botnets, among which classifying network traffic with learning approaches is one of the best-known security solutions because of its performance and extensibility. In recent years, traditional methods have been replaced by newer learning methods such as deep learning, which have shown significant performance in various domains and have also been used in malware detection. However, using learning methods for botnet detection raises several challenges. One problem with botnet detection by traffic classification is the lack of a valid and reliable feature set to describe network behavior. Moreover, the selected features must preserve the privacy of network communications. In addition, the volume of network traffic is very high in some attacks, and checking all of the traffic content is time-consuming. The faster an intrusion detection system detects attacks, the lower the potential for irreversible damage. To address these challenges, two deep learning-based botnet detection approaches are proposed in this study. In the first method, a suitable feature set that best describes network traffic behavior is selected from several different feature sets. The method then generates a new feature set by calculating the correlation between the features.
The deep learning algorithm is used to learn the features and classify unseen test data. The results demonstrate that the proposed method improves accuracy by around 12% over previous work in this field. In the second method, the features are extracted from raw data automatically by deep learning algorithms. In contrast to existing methods, the data is fed to the learning algorithm at the bit level, without any initial feature extraction. A Long Short-Term Memory (LSTM) network is used to extract the features. The raw data is passed to the learning algorithm only after a series of preprocessing steps. Also, the data is extracted only from the headers of the network packets; the payloads are not used. This protects the privacy of network communications. Moreover, only a limited number of network packets are used for testing. As a result, detection is much faster than when all traffic data is used, so attacks are detected sooner. The Random Forest algorithm is used to classify flows based on the features extracted by the deep learning algorithm. The results show that the deep learning algorithm is able to extract and learn the essential features properly. The classification results show that the proposed method increases classification accuracy by about 22% compared with some state-of-the-art studies. Also, the true-positive and false-positive alert rates improve from 92% to 98.2% and from 15% to 14%, respectively.
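As an illustration of the bit-level, header-only input described for the second method, the following is a minimal sketch and not the thesis implementation. It assumes each packet's header bytes have already been separated from the payload, and the names `HEADER_BYTES`, `MAX_PACKETS`, `bytes_to_bits`, and `flow_to_bit_sequence`, as well as the zero-padding scheme, are hypothetical choices made for this example.

```python
from typing import List, Sequence

# Hypothetical settings: the abstract does not state these values.
HEADER_BYTES = 40   # header bytes kept per packet (assumed)
MAX_PACKETS = 8     # packets kept per flow (assumed)


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand each byte into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def flow_to_bit_sequence(
    headers: Sequence[bytes],
    header_bytes: int = HEADER_BYTES,
    max_packets: int = MAX_PACKETS,
) -> List[List[int]]:
    """Turn a flow's packet headers into a fixed-shape sequence of bit vectors.

    Only the first `max_packets` headers are used. Each header is truncated
    or zero-padded to `header_bytes` bytes, and short flows are padded with
    all-zero steps, so every flow has shape (max_packets, header_bytes * 8).
    """
    sequence = []
    for header in headers[:max_packets]:
        fixed = header[:header_bytes].ljust(header_bytes, b"\x00")
        sequence.append(bytes_to_bits(fixed))
    while len(sequence) < max_packets:
        sequence.append([0] * (header_bytes * 8))
    return sequence
```

Under these assumptions, each flow becomes a sequence of equal-length bit vectors, one per packet, which is the kind of input a sequence model such as an LSTM could consume. The features it learns could then be handed to a separate classifier such as Random Forest, as the abstract describes.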
Keywords:
#Deep learning #malware #botnet #network traffic classification #information security #long short-term memory network
Keeping place: Central Library of Shahrood University