Bot detection using unsupervised machine learning

Wu, Wei; Alvarez, Jaime; Liu, Chengcheng; Sun, Hung-Min
Microsystem Technologies

This research focuses on bot detection through traffic analysis, unsupervised machine learning, and similarity analysis between benign traffic data and bot traffic data. We tested several clustering algorithms and recorded their accuracy on our prepared datasets. The best-performing clustering algorithm was then used for the remaining steps of the methodology: determining the majority cluster (the cluster with the most flows), removing duplicate flows, and computing the similarity analysis. For the duplicate-removal stage, the results indicate how many flows each majority cluster contains and how many duplicate flows were removed from it. For the similarity analysis, the results give the value of the similarity coefficient for the comparisons between all datasets (the bot datasets and the benign dataset). From these results we present heuristic conclusions for determining possible bot infection in a given host.
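
The post-clustering steps described above (majority cluster, duplicate removal, similarity coefficient) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm, the flow features, or the similarity coefficient, so the sketch takes cluster labels as given, represents each flow as a hypothetical (protocol, port, bytes) tuple, and assumes the Jaccard index as the coefficient. All function names and sample values are made up for illustration.

```python
from collections import Counter


def majority_cluster(flows, labels):
    """Return the flows whose cluster label is the most frequent one."""
    top_label, _ = Counter(labels).most_common(1)[0]
    return [flow for flow, label in zip(flows, labels) if label == top_label]


def remove_duplicates(flows):
    """Drop repeated flows, keeping first occurrences.

    Returns the unique flows and the number of duplicates removed.
    """
    unique = list(dict.fromkeys(flows))
    return unique, len(flows) - len(unique)


def jaccard(flows_a, flows_b):
    """Jaccard index of two flow collections (assumed coefficient)."""
    a, b = set(flows_a), set(flows_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical flows: (protocol, destination port, bytes). Illustration only.
bot_flows = [("tcp", 80, 512), ("tcp", 80, 512), ("udp", 53, 64), ("tcp", 443, 1500)]
bot_labels = [0, 0, 1, 0]  # cluster labels, e.g. from any clustering algorithm
benign_flows = [("tcp", 443, 1500), ("udp", 53, 64)]

majority = majority_cluster(bot_flows, bot_labels)
unique, removed = remove_duplicates(majority)
similarity = jaccard(unique, benign_flows)
print(len(majority), removed, round(similarity, 3))
```

Under these assumptions, a low similarity between a host's deduplicated majority cluster and the benign dataset, together with a high similarity to a bot dataset, would be the kind of signal the heuristic conclusions rely on. The thresholds are not given in the abstract and are left open here.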