Distributed Denial of Service (DDOS) Attack Detection on Zigbee Protocol Using Naive Bayes Algorithm



I. Introduction
Distributed Denial of Service, better known as DDoS, is an attack in which several computer systems target a server so that the volume of traffic becomes too high for the server to handle requests. DDoS is usually carried out by using several computer systems as sources of the attack. Because one server is attacked through several computers, the amount of traffic can be much higher. A DDoS attack is like a traffic jam that prevents a driver from reaching their desired destination on time. According to data, 33% of businesses in the world have fallen victim to DDoS attacks. DDoS is hard to trace. Some types of DDoS attacks can be very powerful and even reach rates of 1.35 Tbps. Additionally, DDoS attacks can cause losses of $40,000 per hour when they occur.
ZigBee is a standard based on IEEE 802.15.4 for data communication on personal consumer devices as well as at business scale. ZigBee is designed for low power consumption and works on low-level personal networks. ZigBee devices are commonly used to control another device or as wireless sensors. ZigBee is able to manage its own network and the data exchange on that network [1]. Another advantage of ZigBee is that it requires little power, so it can be used as a wireless control device that only needs to be installed once, because a single battery can keep a ZigBee device running for up to a year. In addition, ZigBee supports a "mesh" network topology, so it can form a wider network with more reliable data delivery.
In previous research, Muhammad Aziz, Rusydi Umar, and Faizin Ridho concluded from their analysis that attack information detected by a signature-based IDS needs to be reviewed for accuracy using classification with statistical calculations. Based on analysis and testing with the artificial neural network method, an accuracy of 95.2381% was obtained. The neural network method can be applied in the field of network forensics to determine accurate results and to help strengthen evidence at trial.
In previous research, Jodi Chris Jordan Sihombing, Dany Primanita Kartikasari, and Adhitya Bhawiyuga found, based on the tests carried out, that SDMD's performance in detecting DDoS attacks is very good. The accuracy obtained in detecting DDoS attacks is 96.08%, 95.66%, and 98.76% for SYN flooding, UDP flooding, and ICMP flooding, respectively. The system can also cope with and minimize the impact of DDoS attacks. This is shown by the decrease in the number of attack packets reaching the victim host when SDMD is activated.
In previous research, Nadila Sugianti, Yayang Galuh, Salma Fatia, and Khadijah Fahmi Hayati Holle (2020) addressed the problem of detecting HTTP-based DDoS attacks based on the number of users, the number of packets, the number of packets per user, and the length of the captured data. Based on their discussion and test results, fuzzy logic using the Sugeno method can be used as a detector for HTTP-based DDoS attacks with an accuracy of up to 90%.
In previous research, Kurniabudi, Abdul Harris, and Abdul Rahim (2020) showed experimentally that the Information Gain feature selection technique was able to improve the performance of classification methods, especially Random Forest, which performed better than Naïve Bayes, Bayes Network, OneR, AdaBoost, and Random Tree, with 99.99% accuracy when tested on all training data and 99.95% when tested using 10-fold cross validation. On the other hand, Random Forest takes longer to build and train its model than Naïve Bayes, Bayes Network, OneR, AdaBoost, and Random Tree. In those experiments, the researchers used Information Gain as the feature selection technique for the CICIDS-2017 dataset in detecting DDoS attacks. For further research, other feature selection techniques that might improve DDoS attack detection performance can be used. In addition, other classification techniques need to be considered in future research, especially those with better performance and lower computation time.
In previous research, Arif Wirawan Muhammad, Cik Feresa Mohd Foozy, and Ahmad Azhari (2020) reported, based on experimental results, that the combination of seven selected key dataset features, used as input to an artificial neural network classifier, gave the highest accuracy value of 97.76%.
Lila Dini Utami and Romi Satria Wahono (2015) noted that Naïve Bayes is a classifier that can classify text, one example being restaurant reviews. Naïve Bayes is very simple and efficient, is very popular for text classification, and performs well in many domains. Three configurations were tested: Naïve Bayes alone; Naïve Bayes with Information Gain; and Naïve Bayes with Information Gain and AdaBoost. With Naïve Bayes alone, the accuracy reached only 70% with AUC = 0.500. Likewise, with Naïve Bayes and Information Gain, the accuracy was only 70% with AUC = 0.500, which indicates that Information Gain did not affect the accuracy of Naïve Bayes. However, when Naïve Bayes and Information Gain were combined with AdaBoost, the accuracy increased by 29.5 percentage points to 99.5%, with AUC = 0.995.
Al Riza Khadafy and Romi Satria Wahono concluded, based on their experiments and evaluations, that applying the NB classification algorithm can reduce noisy data in large datasets that have many classes (multi-class), so that the classification accuracy of the DT algorithm can be increased. The accuracy results indicate that the proposed DT + NB method is superior to the DT method, with the following accuracy values for each test dataset: Breast Cancer 96.59% (increased 21.06%), Diabetes 92.32% (increased 18.49%), Glass 87.50% (increased 20.68%), Iris 97.22% (increased 1.22%), Soybean 95.28% (increased 3.77%), Vote 98.98% (increased 2.66%), Image Segmentation 99.10% (increased 3.36%), and Tic-tac-toe 93.85% (increased 9.30%). The accuracy values of the DT method and the proposed DT + NB method were compared using a t-test to determine whether the difference in accuracy between the two methods is significant. From the comparison results, the P value (T <= t) is 0.01321, which is smaller than the alpha value (0.01321 < 0.05), so the difference is statistically significant.
The Naive Bayes algorithm is a classification method based on probability and statistics, proposed by the English scientist Thomas Bayes. The Naive Bayes algorithm predicts future probabilities based on past experience, an approach known as Bayes' Theorem. The main characteristic of the Naïve Bayes classifier is a very strong (naive) assumption that each condition or event is independent of the others. The Naive Bayes classifier performs very well and has been reported to achieve a better level of accuracy than other classifier models. An advantage of this method is that it requires only a small amount of training data to estimate the parameters needed for classification. Because the variables are assumed to be independent, only the variance of each variable within a class is needed to determine the classification, not the entire covariance matrix.
Many earlier studies have reported Naive Bayes as the best model compared with other models such as logistic regression, neural network, random forest, decision tree, support vector machine, and k-nearest neighbor. Naive Bayes is a classification algorithm that is simple and easy to implement, so it is very effective when tested with the right dataset; in particular, Naive Bayes combined with feature selection can reduce redundancy in the data (Witten, Frank, & Hall, 2011). The Naive Bayes algorithm belongs to supervised learning and is one of the fastest learning algorithms that can handle a large number of features or classes (Lee, 2015).
In this study, an Intrusion Detection System for the ZigBee protocol will be implemented using the Naïve Bayes algorithm. The Naïve Bayes algorithm is a machine learning method that uses probability calculations. This algorithm makes use of probability and statistical methods to predict future probabilities based on past experience.
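The point that only per-class variances are needed, rather than a full covariance matrix, can be illustrated with a minimal Gaussian Naive Bayes sketch. It is written in plain Python for illustration and is not the implementation used in this study; the function names `fit_gaussian_nb` and `predict`, the two features, and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Only one variance per feature is stored; no covariance matrix is
        # needed because features are assumed independent within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Log of the Gaussian density for this feature under this class.
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical flows: (packets per second, mean packet size in bytes).
X = [[10, 500], [12, 480], [11, 520], [900, 60], [950, 64], [870, 58]]
y = ["NORMAL", "NORMAL", "NORMAL", "ATTACK", "ATTACK", "ATTACK"]

model = fit_gaussian_nb(X, y)
print(predict(model, [15, 490]))
print(predict(model, [880, 62]))
```

With six training rows the model stores just two means and two variances per class, which is why the method can be estimated from relatively little data.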

II. Detection Approach
The DDoS attack detection approach implemented in this study is divided into several stages namely:

Retrieving Dataset
CICDDoS2019 contains benign traffic and recent DDoS attacks, resembling real-world data (PCAPs). It also includes the results of network traffic analysis using CICFlowMeter-V32 [51], with labelled flows. The B-Profile system [47] was used to profile the abstract behaviour of human interactions and generate naturalistic benign background traffic. For this dataset, the abstract behaviour of 25 users was constructed based on the HTTP, HTTPS, FTP, SSH, and email protocols [47]. The dataset includes different modern reflective DDoS attacks such as Port Map, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS, and SNMP. The capturing period for the training day on January 12th started at 10:30 and ended at 17:15, and for the testing day on March 11th started at 09:40

III. Research Method
The naïve Bayes classifier is built on Bayes' Theorem, where event independence is assumed. In statistics, two events are said to be independent if the likelihood of one does not affect the other [54]. Table 9 presents the algorithm of the Bayesian classifier used to calculate probability. For two events A and B, let P(B|A) be the conditional probability of B given A. Then, let P(B) be the probability of B, and P(A) be the probability of A. Furthermore, let P(A|B) be the probability of A given B. As such, the theorem is formally presented as:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
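For classification, the theorem is applied with A as a class label and B as the observed feature vector. The following is a sketch of the standard naïve Bayes decision rule; the symbols y (class label), x_1, ..., x_n (features), and n (number of features) are notation introduced here for illustration and do not appear in the original text.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product form follows from the independence assumption, and the denominator can be dropped because it is the same for every candidate class.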

IV. Results and Discussion
This section outlines the details of the design and implementation of the proposed solution. The solution is implemented in Python 3. Firstly, an overview of the solution is presented, briefly describing the phases of this implementation. The data preparation process is then described, including details on data cleaning and transformation, and dataset splitting. Next, the modelling process is presented, with a detailed account of the training, validation and testing processes. The section concludes with an overview of the evaluation procedure, including a summary of the performance metrics used to analyse the intrusion detection performance of the DDoS datasets.

A. Data Cleaning and Transformation
Missing data. Handling missing data is vital in machine learning, as it could lead to incorrect predictions for any model. Accordingly, null values are eliminated by propagating the last valid observation forward. This is implemented using the fillna method from the pandas library [52], as shown below.
data.fillna(method='ffill', inplace=True)
Undefined Data. The elimination of null values can result in undefined data. A null field with no valid observation before it remains NaN after propagation, since there are no cells to provide a value. Consequently, these values are replaced with 0. This is also done using the fillna method [52].

data = data.fillna(0)
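The two-step cleaning can be traced on a single column with a small plain-Python sketch. It mirrors the intent of the two fillna calls above without using pandas; `None` stands in for NaN, and the helper names `forward_fill` and `fill_remaining` as well as the sample values are hypothetical.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_remaining(values, default=0):
    """Replace any None that survived forward filling with a default."""
    return [default if value is None else value for value in values]


# Hypothetical column of flow durations with gaps.
column = [None, 3.0, None, None, 7.5, None]

step1 = forward_fill(column)    # leading gap has nothing before it, so it stays None
step2 = fill_remaining(step1)   # the surviving gap is set to 0

print(step1)  # [None, 3.0, 3.0, 3.0, 7.5, 7.5]
print(step2)  # [0, 3.0, 3.0, 3.0, 7.5, 7.5]
```

The leading gap shows why the second step is needed: forward propagation alone cannot fill a value that has no predecessor.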
Transformation. The format of the collected data might not be suitable for modelling. In such cases, data and data types need to be transformed so that the data can be fed into the models, as described by the CRISP-DM method. Accordingly, some data features were transformed into numeric or float types, since models perform poorly with strings, or do not work at all.
Class Labels. Each dataset instance represents a snapshot of the network traffic at a given point in time. These instances are labelled according to the nature of the traffic, that is, whether the traffic is benign or malicious. The labels vary across the four datasets, so they are encoded to make the class labelling system homogeneous. Classification is binary, where benign traffic is labelled as NORMAL, and malicious traffic is labelled as ATTACK. Table 5 summarises the classification system.
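The binary encoding described above can be sketched as a simple mapping. This is an illustration only: the function name `encode_label` is hypothetical, and the raw label strings (for example "BENIGN" for benign traffic) are assumptions about how the source datasets tag their flows.

```python
def encode_label(raw_label):
    """Map a dataset-specific traffic label to NORMAL or ATTACK."""
    # Assumption: benign traffic is tagged "BENIGN" or "NORMAL" in any letter case;
    # every other label is treated as some form of attack.
    benign_names = {"benign", "normal"}
    return "NORMAL" if raw_label.strip().lower() in benign_names else "ATTACK"


# Hypothetical raw labels as they might appear across datasets.
raw_labels = ["BENIGN", "Syn", "UDP-lag", " benign ", "DrDoS_DNS"]
encoded = [encode_label(label) for label in raw_labels]
print(encoded)  # ['NORMAL', 'ATTACK', 'ATTACK', 'NORMAL', 'ATTACK']
```

Treating everything that is not explicitly benign as ATTACK keeps the mapping robust to attack names that differ between datasets.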

B. Splitting Datasets
A key characteristic of a good learning model is its ability to generalise to new, or unseen, data. A model which is fitted too closely to a particular set of data is described as overfit, and therefore will not perform well with unseen data. A generalised model requires exposure to multiple variations of input samples. Primarily, models require two sets of data, one to train and another to test. The training data is the set of instances that the model trains on, while the testing data is used to evaluate the generalisability of the model, that is, the performance of the model with unseen data. The train/test split can yield good results; however, this approach has some drawbacks. Although splitting is random, the split can create an imbalance between the training and the testing set, where the training set has a large number of instances from only one class. In such cases, the model fails to generalise and overfits. To mitigate this, the datasets are split into three subsets: training, validation and testing. This split is done in a 60:20:20 ratio, for training, validation and testing respectively. The train_test_split helper function from the scikit-learn library [53] is used for the split.
With this approach, training is done in two phases, with the training and the validation sets. Firstly, the training set is used to train the model. Then, the validation set is used to estimate the performance of the model on unseen data (data that the model is not trained on). For the purpose of this study, validation is done using a stratified k-fold approach. The k-fold validation method is described in Section 6.
During the training process, the selected algorithms are provided with training data to learn from, to eventually create machine learning models. Accordingly, the training set is used. At this point in the process, the input data source needs to be provided and should contain the target attribute (class label). The training process involves finding patterns in the training set that map the input features to the target attribute. Based on the observed patterns, a model is produced.
In this study, four DDoS datasets are used as the input data source, where the target attribute is the type of network traffic, that is, attack or normal. Six algorithms are trained with each of the four sets. Training is conducted using several methods from the scikit-learn library. A separate table provides a breakdown of the methods used for each algorithm. Appendix B contains the source code for the models that were built to analyse the intrusion detection capacity of each dataset.
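The 60:20:20 split can be sketched as follows. This is a plain-Python stand-in written for illustration, not the scikit-learn call used in the study; it additionally stratifies by class so that each subset keeps the original class proportions, which is an assumption about how the imbalance problem described above would be avoided. The function name `stratified_split`, the seed, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Return (train, validation, test) index lists that keep class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test


# Toy label column: 40 normal flows and 60 attack flows.
labels = ["NORMAL"] * 40 + ["ATTACK"] * 60
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))  # 60 20 20
```

Because the split is done per class, the training subset here contains 24 NORMAL and 36 ATTACK instances, matching the 40:60 proportion of the full set.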
Following the training process, the model is validated using k-fold cross validation. Cross validation is applied to assess the generalisability of a model. This method aims to reduce the errors of overfitting that occur when a model is fitted too closely to a range of data instances. Cross validation is done in iterations: the dataset is split into k subsets, referred to as folds, and in each iteration the model is trained on k-1 folds while the remaining fold is held back for testing, as illustrated in Figure 11. This process is repeated until all folds have served as a test fold. Once the process is completed, the evaluation metric is summarised by calculating the average value [54]. In this study, a stratified k-fold approach is applied to the validation dataset (20% of the global set). Stratified k-fold is a variation of k-fold cross validation that ensures that the distribution of classes is the same across all folds. This is implemented using the StratifiedKFold class from the scikit-learn library [64], with k=5, where the n_splits parameter specifies the number of folds.
From initial observations on the CICDDoS2019 dataset [47], it is clear that the naïve Bayes model performs poorly in comparison to the rest, with an accuracy rate of 45% (see Table 15). The F-measure of the same model is also low. A more granular look at this metric shows that both the precision and recall of the model are problematic, at 66% and 54% respectively. For this dataset, the best performing model was the random forest, achieving an accuracy of 99%, with 99% precision and 99% recall. That model also took the longest to train, with a computation time of 84.2 seconds. Meanwhile, the other models took under 10 seconds to train.
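The stratified k-fold procedure described in this section can be sketched in plain Python as follows. It is an illustrative stand-in, not scikit-learn's implementation: the function name `stratified_k_fold` is hypothetical, no shuffling is performed, and indices are dealt to folds in round-robin order per class.

```python
from collections import defaultdict


def stratified_k_fold(labels, n_splits=5):
    """Yield (train_indices, test_indices) pairs with class balance kept per fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices round-robin so every fold gets a similar share.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    # Each fold serves exactly once as the held-out test fold.
    for held_out in range(n_splits):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


# Toy validation set: 10 normal flows and 15 attack flows.
labels = ["NORMAL"] * 10 + ["ATTACK"] * 15
for train, test in stratified_k_fold(labels, n_splits=5):
    attack_share = sum(labels[i] == "ATTACK" for i in test) / len(test)
    print(len(train), len(test), round(attack_share, 2))  # 20 5 0.6 on every fold
```

In a full evaluation, a model would be trained on `train` and scored on `test` inside the loop, and the k scores would then be averaged.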

V. Conclusion
The Naïve Bayes model performed relatively poorly overall and produced the lowest accuracy score of this study (45%) when trained with the CICDDoS2019 dataset [47]. For the same model, precision was 66% and recall was 54%, meaning that almost half the time, the model fails to identify threats.
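The reported precision and recall can be related to the miss rate and the F-measure with a short worked calculation. This assumes that the F-measure is the balanced F1 score (the harmonic mean of precision and recall) and that the two reported figures refer to the same class; the symbols P, R, TP, and FN (precision, recall, true positives, false negatives) are notation introduced here, and the resulting F1 value is derived, not quoted from the study.

```latex
R = \frac{TP}{TP + FN} = 0.54
\quad\Longrightarrow\quad
\text{miss rate} = 1 - R = 0.46,
\qquad
F_1 = \frac{2\,P\,R}{P + R}
    = \frac{2 \times 0.66 \times 0.54}{0.66 + 0.54}
    = \frac{0.7128}{1.20}
    \approx 0.594
```

Under these assumptions the miss rate of 46% corresponds to the statement that threats go undetected almost half the time.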

Fig. 1. Bar chart showing the distribution of traffic types in the CICDDoS2019 dataset.
Fig. 12 illustrates a comparative bar graph of the accuracy rates achieved by the models that were trained with the CICDDoS2019 dataset [47].

Table 1 OS Specification and Machine IPs for CICDDoS2019. Adapted from DDoS Evaluation Set [47].
and ended at 17:35. Attacks were subsequently executed during this period.
Ibnu Mas'ud et.al (Distributed Denial of Service (DDOS) Attack Detection on Zigbee Protocol Using Naive Bayes Algorithm)

Table 4 Labelling system for binary classification.
Volume and Class Distribution. In the CICDDoS2019 dataset, there were 121,980 (41.4%) records classified as normal traffic and 172,647 (58.6%) classified as attack traffic.