Dark Web Analysis using BERT's Language Model
Thesis, MSc in Computer Engineering, Central Library of Shahrood University, 2024
Authors:
Baratali Akhtariyan [Author], Mohsen Rezvani [Supervisor]
Abstract: The hidden nature and restricted access of the dark web have fostered a wide range of criminal activity, including cyber threats and the sale of weapons, drugs, and illegal tools. The emergence of large language models has raised hopes that dark web content can be analyzed with reasonable accuracy. In this regard, using the massive amounts of cyber data available on the dark web to train language models can be highly effective in preventing cyber threats. However, large language models require large volumes of high-quality data to train well and reach sufficient accuracy, and the contaminated nature of dark web data makes this a challenge for cybersecurity researchers. Most prior work in this area was trained on the full, low-quality dark web dataset and consequently failed to achieve high accuracy. In this thesis, we present a new language model based on BERT, trained on data extracted from the dark web. The proposed model is a transformer-based textual model that uses the transformers' bidirectional encoder for learning. We evaluated it on a high-quality dataset: free of repetitive data and unknown words, entirely in English, and focused specifically on hacking and security content. Finally, comparing the evaluation results of the proposed model with previous models shows that, owing to the injection of high-quality data, the proposed model achieves better accuracy in data classification than previous models.
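As an illustration of the kind of pipeline the abstract describes, the sketch below fine-tunes a BERT sequence classifier on a text sample using the Hugging Face transformers library. This is a minimal sketch under stated assumptions: the checkpoint name (bert-base-uncased), the two-class label set, and the sample sentence are illustrative, not details taken from the thesis.

    # Minimal sketch: fine-tuning a BERT encoder to classify curated
    # dark-web text. Checkpoint, labels, and sample are assumptions.
    import torch
    from transformers import BertTokenizerFast, BertForSequenceClassification

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # e.g. hacking-related vs. other

    # Illustrative training example (not from the thesis dataset)
    texts = ["zero-day exploit for sale, contact via PGP"]
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    labels = torch.tensor([1])

    # One gradient step: cross-entropy loss over the batch
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()

    # Inference: predicted class index per input text
    model.eval()
    with torch.no_grad():
        pred = model(**batch).logits.argmax(dim=-1)
    print(pred)

In practice the same loop would run over the curated, deduplicated English corpus the abstract mentions; the single hand-written example here only stands in for that data.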
Keywords:
#Dark Web #Large Language Models #Transformers #BERT

Keeping place: Central Library of Shahrood University