Predicting the Spread of the Corona Virus (COVID-19) in Indonesia: Approach Visual Data Analysis and Prophet Forecasting

Coronavirus disease (COVID-19) is caused by a virus transmitted through respiratory infection that can lead to death. The virus was first detected in Wuhan City, Hubei Province, China, in December 2019, and on January 30, 2020, the COVID-19 outbreak was declared a Health Emergency by WHO [1]. As of April 2020, as many as 120 countries had reported approximately 2 million cases, with 195,755 deaths and more than 781,109 recoveries. In Indonesia, the first case appeared in March 2020; more than 1,000 confirmed cases were subsequently recorded across 34 provinces, with a mortality rate of up to 8.8%, and travel restrictions and school closures were enacted to break the chain of transmission. The National Disaster Management Agency, through the Task Force for the Acceleration of Handling COVID-19 of the Republic of Indonesia, reported that as of April 12, 2020, at 16.00 WIB, Indonesia had 4,241 confirmed cases, of which 3,509 were under treatment, 359 had recovered, and 373 had died. The outbreak will have a severe economic impact, so stricter policies are required, together with plans for predicting confirmed cases in the coming days, to limit the factors driving the increase in the number of cases.

Previous studies have applied a machine learning approach to predict outbreak activity in China, and [5] proposed an SIR model to study epidemic development in India. The Susceptible-Infectious-Quarantined-Recovered (SIQR) model was proposed in [6] for the analysis of data from the Brazilian Ministry of Health, and Zhou et al. [7] proposed a logistic model and an SEIR model. Gupta & Pal [8] proposed an ARIMA model based on exploratory data analysis to predict outbreak trends in India, whereas [9] proposed timely short-term forecasts and [10] proposed a correlation model for the power-law growth of COVID-19 on four continents and the inefficiency of quarantine strategies. Recently, [1] proposed a neural network (NN) approach based on the long short-term memory (LSTM) model to predict the parameters, risks, and effects of an epidemic, whereas [12] applied a modified ANFIS model. Several studies have concluded that in countries that do not implement quarantine, the number of cases will grow exponentially [13], but predicting a pandemic requires an analysis of contributing factors, such as the available dataset, to make relatively accurate estimates. The current number of confirmed cases in Indonesia still needs to be analyzed because the available testing data are limited and, most likely, more cases remain undiagnosed than have been diagnosed.

In this paper, we aim to analyze the near-term trend of the outbreak in Indonesia by applying an Exploratory Data Analysis (EDA) approach and then building a time series forecasting model using the Prophet method. EDA is a data analysis approach that uses various techniques to maximize insight into a dataset, as a step in forming hypotheses and understanding the data before the main analysis [14]-[17]. EDA is applied to estimate the various parameters in the model, and simulations are then carried out to see what will happen under various scenarios. The results of the analysis are applied to a time series prediction model that groups individuals in the population into confirmed cases, deaths, and recoveries.

A. Prophet Method
The forecasting method proposed in this paper is the Prophet model [18], a time series model based on an additive procedure in which a non-linear trend is fitted together with yearly, weekly, and daily seasonality, plus holiday effects. Prophet models perform best on time series that have strong seasonal effects and several seasons of historical data; they are robust to missing data and trend shifts, and usually handle outliers well. Prophet provides a practical approach to "forecasting at scale", which aims to automate the common features of time series forecasting by providing simple and customizable methods. The approach begins by modeling the time series using the specified parameters, generates a forecast, and then evaluates it. In general, the Prophet model is formulated as follows:

y(t) = g(t) + s(t) + h(t) + ϵₜ

where g(t) is the trend model, which describes the long-term increase or decrease in the data; s(t) models seasonality with a Fourier series, which describes how the data are affected by seasonal factors such as the time of year; h(t) models the effect of holidays or major events affecting the time series; and ϵₜ represents the irreducible error term.
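As an illustration of the additive structure described above, the following minimal sketch evaluates a toy g(t) + s(t) + h(t) in plain Python. It is not Prophet's implementation: the linear trend, the Fourier coefficients, the weekly period, and the holiday effect are all hypothetical values chosen only to show how the components combine.

```python
import math


def trend(t, k=2.0, m=10.0):
    """g(t): a simple linear trend with hypothetical slope k and offset m."""
    return k * t + m


def fourier_seasonality(t, period, coeffs):
    """s(t): truncated Fourier series; coeffs holds one (a_n, b_n) pair per harmonic."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total


def holiday_effect(t, holidays, effect=5.0):
    """h(t): adds a fixed hypothetical effect on days listed as holidays."""
    return effect if t in holidays else 0.0


def additive_model(t, period=7.0, coeffs=((1.0, 0.5),), holidays=frozenset({3})):
    """y(t) without the error term: g(t) + s(t) + h(t)."""
    return (
        trend(t)
        + fourier_seasonality(t, period, coeffs)
        + holiday_effect(t, holidays)
    )
```

Because the seasonal term repeats every `period` days, two days one week apart differ only by the trend (and by any holiday effect), which is what makes the components separately interpretable.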

B. Data
We use open-source data (https://github.com/CSSEGISandData/COVID-19) to estimate the various parameters in the model, and simulations are then carried out to see what will happen under various scenarios. We collected the cases reported each day up to April 29, 2020, and estimated the overall trend of the outbreak across the world, starting from China. We then plotted time series of confirmed cases, deaths, and recoveries for the ten countries with the most confirmed cases. In addition, we evaluated the trend of outbreak development in Southeast Asian countries, focusing on Indonesia, the Philippines, Brunei, Malaysia, and Singapore. The results of the analysis are applied to the Prophet model to predict confirmed COVID-19 cases and deaths in Indonesia.
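The time series files in this repository use a wide layout, with one row per region and one column per date. The sketch below shows one way such a file could be collapsed into a per-country daily series using only the standard library. The inline sample and its counts are made up for illustration, and the helper name `country_series` is hypothetical; it is not the authors' preprocessing code.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Toy rows in the wide layout (one column per date); the counts are made up.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Indonesia,-0.7893,113.9213,0,1,3
Hubei,China,30.9756,112.2707,10,20,40
Guangdong,China,23.3417,113.4244,1,2,4
"""

# Columns that describe the region rather than a reporting date.
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_series(csv_text, country):
    """Sum all province rows of one country into a sorted (date, count) list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = defaultdict(int)
    for row in reader:
        if row["Country/Region"] != country:
            continue
        for column, value in row.items():
            if column in META_COLUMNS:
                continue
            day = datetime.strptime(column, "%m/%d/%y").date()
            totals[day] += int(value)
    return sorted(totals.items())
```

For a real download, the text of the CSV file would be passed in place of `SAMPLE_CSV`, for example `country_series(text, "Indonesia")`.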

III. RESULTS AND DISCUSSION

A. Exploratory Data Analysis
In this paper, an Exploratory Data Analysis (EDA) approach is used to visualize the 2019-nCoV open dataset provided by Johns Hopkins University and to provide insight into the outbreak on various continents, especially in Southeast Asia. We used the dataset for January to April 2020. The data analysis was divided into several phases: analysis of trends in China and the world, analysis of global data, and analysis of trends in the number of cases in Indonesia, followed by applying the Prophet model to predict outbreak trends in Indonesia for confirmed cases, recoveries, and deaths within the next 30 days. The coronavirus was first discovered in Hubei Province, China, so analyzing how the virus developed there and spread to countries around the world is useful for understanding the global trend of increasing case numbers over time. There are always patterns in any data; what is of concern is how strongly the data follow a pattern of exponential spread.
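One simple way to quantify how strongly a case series follows an exponential pattern is to fit a straight line to the logarithm of the counts and inspect the goodness of fit. The sketch below does this with ordinary least squares in plain Python. It is an illustrative assumption about how such a check could be done, not a step reported by the authors, and the function names are hypothetical.

```python
import math


def exponential_growth_fit(counts):
    """Fit log(count) = a + r * day by least squares; return (r, R^2).

    Days with a zero count are skipped because their logarithm is undefined.
    """
    points = [(day, math.log(c)) for day, c in enumerate(counts) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    rate = sxy / sxx
    # R^2 of a simple linear regression equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return rate, r_squared


def doubling_time(rate):
    """Days needed for the count to double at a constant daily growth rate."""
    return math.log(2) / rate
```

An R^2 close to 1 on the log scale indicates that the series is well described by exponential growth over the chosen window; a lower value suggests the growth has slowed or changed regime.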

1) Outbreak Trends in China and the World
Hubei province, where the virus is believed to have originated, recorded the highest number of cases in China (67k). In Hubei, the city of Wuhan recorded the highest number of confirmed cases, followed by Zhejiang, Guangdong, and Hunan provinces. The province recorded a surge of 30%, with 14,840 newly confirmed cases on February 13, 2020. Cases began to be reported in countries outside China from February 8, and since then cases have increased drastically in other countries. By the end of February, cases had stabilized in China, but the situation had worsened across the world, as seen in Figure 1(a) for confirmed cases, while the number of deaths continued to increase worldwide, as seen in Figure 1(b) for deaths.

2) Trends in the Top 10 Countries
The next stage analyzes the trend of outbreak development in the ten countries with the highest numbers of confirmed cases, deaths, and recoveries. Figure 2 presents a visual display of the 10 countries with the highest number of cases outside China. The USA ranks highest for confirmed cases, deaths, and active cases, while Germany ranks first for recovered cases, with the USA in fourth position among the 10 countries.

3) Outbreak Trends in Southeast Asia
Further analysis concerns the development of the outbreak in Southeast Asia. Here we focus on five countries, namely Indonesia, the Philippines, Brunei, Malaysia, and Singapore, for confirmed cases, deaths, and recoveries. For confirmed cases, shown in Figure 5(a), as of April 29, 2020, Indonesia had 9,771 cases, the Philippines 8,212, Brunei 138, Malaysia 5,945, and Singapore the most with 15,641, placing Indonesia second among the five countries. Indonesia ranked first for deaths, with 784, and also first for recoveries, with 1,391, followed by Singapore, as presented in Table 1.

We used predictive analysis to estimate how many confirmed cases and deaths could be expected in the near future. For any epidemic, the most important measure to evaluate is the death rate, which is a measure of the number of deaths in a given population over a specified interval. Figure 6(a) shows how the death rate varied worldwide from January 22, 2020, to April 2020, and Figure 6(b) shows the variation in mortality rates across different continents over time.

The plot above shows the number of confirmed cases reported per day; we now turn to the cumulative number. The number of cases has increased exponentially in Southeast Asian countries since the first cases in China. The increase in deaths in Indonesia peaked on April 14, 2020, at 60 people, and April 2020 was the month with the most confirmed cases, which shows that cases in Indonesia increased in April.
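The two quantities discussed above, the death rate and the relationship between daily and cumulative counts, can be sketched in a few lines of Python. This sketch assumes the death rate is computed as deaths divided by confirmed cases, which the text does not state explicitly; the function names are hypothetical. The Indonesian figures are the ones quoted in the text for April 29, 2020.

```python
def death_rate(deaths, confirmed):
    """Deaths as a percentage of confirmed cases (0.0 when there are no cases)."""
    return 100.0 * deaths / confirmed if confirmed else 0.0


def daily_new(cumulative):
    """Convert a cumulative series into per-day increments."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Figures quoted in the text for Indonesia as of April 29, 2020.
indonesia_rate = death_rate(784, 9771)
print(f"{indonesia_rate:.2f}%")  # prints 8.02%
```

Applying `daily_new` to a cumulative series recovers the per-day counts, so the same data can be plotted either way.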

B. Trend Analysis and Forecasting
In this study, we built a model to estimate the number of confirmed cases and deaths in Indonesia within the next 30 days using the Prophet algorithm, based on data available up to April 29, 2020. We applied a time series forecasting model to a dataset detailing the number and location of confirmed cases, including people who recovered and those who died. The Prophet algorithm is used to predict future values based on previously observed values. We also considered parameter settings to optimize the predicted results, which are evaluated by Mean Absolute Percentage (MAP), while the Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) are used to evaluate prediction errors. Table 3 shows part of the dataset used for prediction. The predictive analysis framework holds out the last month of historical data for prediction and then compares the predicted results with the actual data. By default, the Prophet algorithm requires the variables ds and y to build a prediction model, so the first step is to rename the Date variable to ds and Confirmed to y for the prediction of confirmed cases, while for death cases the deaths variable becomes y. The model then produces an estimated value ('yhat') with a lower limit ('yhat_lower') and an upper limit ('yhat_upper'). The results of the prediction of confirmed cases and deaths are presented in Figure 10, while Figure 11 visualizes the trend and weekly components for both cases.
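The renaming and forecasting steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the `prophet` and `pandas` packages are installed, uses Prophet's default settings because the tuned parameters are not reported, and the helper names `to_prophet_rows` and `forecast_30_days` are hypothetical.

```python
def to_prophet_rows(records, value_key="Confirmed"):
    """Rename Date -> ds and the chosen column -> y, as Prophet expects."""
    return [{"ds": r["Date"], "y": r[value_key]} for r in records]


def forecast_30_days(rows, periods=30):
    """Fit Prophet on the rows and return the forecast columns for the horizon."""
    # Imported lazily so the helper above works without these packages installed.
    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    frame = pd.DataFrame(rows)
    frame["ds"] = pd.to_datetime(frame["ds"])
    model = Prophet()  # default settings; the paper's tuned parameters are not given
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    # Keep only the point estimate and its uncertainty interval for the new days.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(periods)
```

For death cases, the same helper can be called with `value_key="Deaths"`, assuming that is the column name in the dataset.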
Based on the comparison of the predicted results with the actual data presented in Table 3, the Prophet algorithm performs well: for confirmed cases, the estimated relative error rate (MAPE) is about 6.52% and, on average, the model is off by 52.7 (MAE), whereas for death cases the MAPE is about 1.3% and the MAE about 236.6%. Our experiments using the Prophet algorithm to predict confirmed cases and deaths in Indonesia within the next 30 days yielded a good error rate by adjusting for the acceleration of the increase in the number of confirmed cases and deaths.
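For reference, the two error measures reported above can be computed as in the sketch below, using their standard definitions. The function names are hypothetical, and skipping zero actual values in MAPE is an assumption made here to avoid division by zero; the paper does not say how such values were handled.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference, in the data's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Pairs whose actual value is zero are skipped to avoid division by zero.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

MAE is expressed in the same units as the series (cases or deaths), while MAPE is a relative measure, so the two are not directly comparable.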

IV. Conclusion
In this paper, we present a data visualization of the trend of the coronavirus outbreak in confirmed cases, deaths, and recoveries based on data reported from January to April 29, 2020. The Exploratory Data Analysis (EDA) approach is applied to provide an understanding of the trend of the outbreak that started in China, the ten countries with the most cases, and trends in Southeast Asian countries, together with time series experiments using the Prophet algorithm for the next 30 days in Indonesia. Based on the comparison of predicted and actual data on confirmed cases and deaths in Indonesia, the Prophet algorithm performs well: for confirmed cases, the estimated error rate (MAPE) is around 6.52% and, on average, the model is off by 52.7 (MAE), while for death cases the MAPE is around 1.3% and the MAE around 236.6%. The experiments also lead to the conclusion that there will be a significant increase in the spread of this pandemic in Indonesia, in both confirmed cases and deaths, over the next month. This can be used as input to the policy-making process to limit the spread of the virus. In future research, we still need to consider government policy conditions, such as limiting access to communication.