Fuzzy C-Means Clustering Algorithm For Grouping Health Care Centers On Diarrhea Disease

ABSTRACT In Indonesia, public health services at the city or district level are provided by regional public hospitals or “puskesmas” (health care centers). Banyuwangi Regency, East Java, has 45 health care centers spread throughout its villages. This research focuses on infant deaths caused by diarrheal diseases, which are the second leading cause of death among children younger than 5 years globally. The health care centers need to be divided into 3 groups to find out which have the fewest, a moderate number of, and the most diarrhea sufferers. The Fuzzy C-Means algorithm is used to address this problem. The results show that 2 health care centers have the smallest number of diarrhea sufferers, 14 health care centers have a medium number, and the rest have a large number. These results can serve as a reference for the health office in dealing with diarrheal diseases, so that the infant mortality rate due to diarrheal diseases can be lowered at the health care centers that have many diarrhea sufferers.

This study groups the health care centers using diarrhea disease data per year obtained from the health office of Banyuwangi Regency. The purpose of this grouping is to identify which groups of health care centers have a low diarrhea rate, which groups have succeeded in reducing the diarrhea rate, and which groups must improve their services so that the number of diarrhea sufferers can be reduced.
Banyuwangi Regency has 45 health care centers, consisting of 18 Nursing health care centers and 27 Non-Nursing health care centers, along with 105 Helper health care centers spread across 24 districts and 217 villages [3]. A health care center must formulate strategic policies, including internal efficiency (organization, management, and human resources), and must be able to make decisions quickly and accurately to improve community service, so that it can be responsive, innovative, effective, efficient and profitable [13].
Simhachalam and Ganesan [2] classified patients into several different groups of thyroid disease using the Possibilistic Fuzzy C-Means (PFCM) and Fuzzy C-Means (FCM) methods. In a medical diagnostic system, fuzzy clustering uses membership degrees to allow the same object to belong to multiple clusters simultaneously with different degrees. This is important for increasing sensitivity in diagnosing thyroid disease. Over repeated runs, the results show that Possibilistic Fuzzy C-Means clustering gives better grouping results than Fuzzy C-Means. That is why this method can help to diagnose the disease better, based on the clusters it produces. Ramya [16] predicted disease from the symptoms experienced by patients using the Fuzzy C-Means method, so that preventive action could be taken to reduce the risk of death. The system analyzes the patient's health condition, and the result of the analysis is used to decide on a course of action, which helps practitioners or doctors to assess patient health. The results were obtained by calculating a numerical value based on the severity of the symptoms, so that the patient's disease could be predicted more accurately from the patient's medical history.
Chetty, Vaisla, and Patil [10] analyzed an improved method for predicting disease using a fuzzy approach. Their study predicted diabetes and liver disorders with two different approaches: classification with the K-Nearest Neighbor (KNN) algorithm alone, and clustering the data with the Fuzzy C-Means (FCM) algorithm followed by classification with KNN. The results show that the proposed method works well and is more accurate than the first approach. Finally, Imianvan A. A., Anosike U. F., and Obi J. C. [4] used a Fuzzy C-Means clustering system to diagnose HIV. Based on the symptoms suffered by the patient, a series of methodological and analytical decision steps is applied to the resulting clusters. By determining the membership level of each individual symptom when assessing diagnostic patterns, the system can provide more precise results than traditional systems.
Building on these studies, this study focuses on grouping health care centers using the Fuzzy C-Means clustering algorithm, so that the groups indicate which centers have few diarrhea patients, which have succeeded in reducing the number of diarrhea sufferers, and which must improve their services to the community so that the number of diarrhea sufferers can be reduced. Based on the grouping, the health office can give additional assignments to the health care center groups that have high rates of diarrhea sufferers, such as providing counseling to the community, so that the number of diarrhea sufferers can be reduced.
The approach is not limited to cases of infant mortality due to diarrhea in Banyuwangi; it can be applied nationally so that the number of infant deaths can be reduced and Indonesia's health index can increase. The remaining sections are organized as follows. The theoretical analysis explains the methods used in this study. The algorithm analysis describes the flow of the clustering system. The discussion and results summarize the experiment carried out with the proposed system flow. The last section is the conclusion of this research.

A. Clustering
Clustering, also called classification, refers to partitioning a dataset of objects into groups of the most similar objects. These objects can be numerical, categorical or both [2]. Clustering algorithms can reveal the underlying structure of data, which can be exploited in a variety of applications, including classification, numerical taxonomy, image processing, pattern recognition, medicine, economics, ecology, artificial intelligence, data mining, modeling and identification. Fuzzy clustering uses membership degrees that allow an object to belong to multiple clusters simultaneously, with various levels of membership.

B. Fuzzy Logic
Fuzzy logic is used to map problems from an input to an expected output. Fuzzy clustering is one method for determining optimal clusters in a vector space, based on the normal Euclidean form of the distance between vectors [12]. The fuzzy logic method known as Fuzzy C-Means is often used to cluster data.

C. Fuzzy C-Means
Fuzzy C-Means (FCM) is a soft clustering technique in which each point has a degree of membership in each cluster, based on fuzzy logic [12]. FCM uses fuzzy partitions, so a data point can belong to more than one group with different membership values (weights) ranging from 0 to 1 [2]. The closer a data point is to a cluster center, the higher its membership in that cluster. The sum of the membership values of each object must be equal to one [10]. The basic steps of the Fuzzy C-Means algorithm are as follows:
• First step: enter the data to be clustered as a matrix of size n x m, then specify the number of clusters c (1 ≤ c ≤ N), the partition matrix, the power w, the maximum number of iterations, the smallest expected error ɛ > 0, the initial objective function P0 = 0, and the first iteration t = 1.
• Second step: generate random numbers (µik) as the elements of the initial partition matrix U, then calculate the sum of each column.
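The second step can be illustrated with a minimal Python sketch. It assumes that the sums are used to rescale the random numbers so that the memberships of each object add up to one, as required above. The function name `init_partition`, the `seed` parameter, and the row-per-object layout are illustrative assumptions, not taken from the original implementation.

```python
import random

def init_partition(n_points, n_clusters, seed=0):
    """Return an n_points x n_clusters matrix of random memberships.

    Each row holds the memberships of one object and is rescaled
    so that it sums to 1 (assumed normalisation).
    """
    rng = random.Random(seed)
    u = []
    for _ in range(n_points):
        # Small offset avoids an all-zero row.
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)  # sum used to normalise the memberships
        u.append([value / total for value in row])
    return u

# 45 health care centers and 3 clusters, as in this study.
U = init_partition(n_points=45, n_clusters=3)
```

Fixing the seed makes a run repeatable, which matters because the final clusters can depend on the random starting matrix.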

A. Data Preparation and Pre-Process
Data on infants suffering from diarrheal disease were obtained from the Banyuwangi District health office for 2016 to 2019 and are used as trial data in this study. The data are shown in Table I. They cover the 45 health care centers spread across Banyuwangi District. The diarrhea patient data in Table I are normalized to make the Fuzzy C-Means calculation easier. The normalization process scales the data values into a specified range [5]. Clustering is the process of grouping similar data. The most widely used clustering algorithm is K-Means. However, the K-Means algorithm requires the number of clusters to be specified in advance, cannot resolve data that overlap between two clusters, and cannot handle noisy data, categorical data, or non-linear datasets [5].
Min-max normalization is a simple technique that fits the data within a pre-defined boundary [15]. The min-max formulation is shown in the following equation [18]:
$$X_n = \frac{X_0 - X_{min}}{X_{max} - X_{min}}$$
where
Xn = new value for variable X
X0 = current value for variable X
Xmin = minimum value in the data set
Xmax = maximum value in the data set
Min-max normalization scales the values into the specified range. After normalization, the new values lie between 0 and 1, which is the range most often used for such tasks [9]. A block diagram of the system is used to show the input data needed, the process that occurs, the process results, and the interactions between users and the system; it is shown in Figure 1. In this study, the clustering is obtained using input data from the 45 health care centers from 2016 to 2018, shown in Table 1.
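As a hedged illustration of the min-max scaling described in this subsection, the sketch below normalizes one column of values into the range 0 to 1. The function name `min_max_normalize` and the sample counts are hypothetical; they are not the Banyuwangi data.

```python
def min_max_normalize(values):
    """Scale a list of numbers into [0, 1] with min-max normalization."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # All values are identical; map them to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]

# Hypothetical yearly patient counts for four health care centers.
counts = [120, 45, 300, 210]
normalized = min_max_normalize(counts)
```

When the data have several yearly columns, the same function can be applied to each column separately, so that every year is scaled by its own minimum and maximum.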

B. Performance Analysis
First, all of the data are normalized to simplify the Fuzzy C-Means calculations. The normalized input data are then processed with the clustering method to produce the data grouping. Details of the Fuzzy C-Means process are shown in Figure 2. Each step is described as follows:
• This part is the most important part of the FCM calculation process because it determines the final result of the data clustering.
• Calculate the cluster centers. Once the cluster centers have been calculated for all data, calculate the objective function.
• Next, calculate the change in the partition matrix and compare the objective function value with the value from the previous iteration.
• If the epsilon criterion is not met, check whether the maximum number of iterations has been reached. If it has not, run the next iteration.
• If either the epsilon criterion or the maximum number of iterations has been met, show the cluster results and stop the FCM calculation.
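The loop described by these steps can be sketched in plain Python as follows. This is a minimal sketch of the standard FCM update rules under stated assumptions (squared Euclidean distance, default power w = 2, a seeded random start); the function name `fcm`, its parameters, and the sample points are illustrative and do not reproduce the authors' implementation or data.

```python
import random

def fcm(data, c=3, w=2.0, max_iter=100, eps=1e-5, seed=0):
    """Cluster rows of `data` (lists of floats) with Fuzzy C-Means.

    Returns (centers, U, obj), where obj is the last objective value computed.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    # Random initial partition matrix; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])
    prev_obj = 0.0  # P0 = 0
    for _ in range(max_iter):
        # Cluster centers: membership-weighted means of the data.
        centers = []
        for k in range(c):
            weights = [U[i][k] ** w for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(weights[i] * data[i][j] for i in range(n)) / total
                for j in range(m)
            ])
        # Squared Euclidean distance from every point to every center.
        d2 = [[sum((data[i][j] - centers[k][j]) ** 2 for j in range(m))
               for k in range(c)] for i in range(n)]
        # Objective function for the current partition matrix and centers.
        obj = sum((U[i][k] ** w) * d2[i][k]
                  for i in range(n) for k in range(c))
        # Update the partition matrix from the distances.
        for i in range(n):
            zeros = [k for k in range(c) if d2[i][k] == 0.0]
            if zeros:
                # Point sits exactly on a center: share membership among them.
                U[i] = [1.0 / len(zeros) if k in zeros else 0.0
                        for k in range(c)]
            else:
                inv = [d2[i][k] ** (-1.0 / (w - 1.0)) for k in range(c)]
                total = sum(inv)
                U[i] = [v / total for v in inv]
        # Stop when the objective changes by less than eps.
        if abs(obj - prev_obj) < eps:
            break
        prev_obj = obj
    return centers, U, obj

# Hypothetical normalized values forming two obvious groups.
points = [[0.0], [0.1], [0.05], [0.9], [1.0], [0.95]]
centers, U, obj = fcm(points, c=2)
```

The loop ends either when the change in the objective function falls below `eps` or when `max_iter` iterations have run, which mirrors the two stopping conditions listed above.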

IV. Discussion and Result
A. Set the Initial Value
Before starting the clustering process with the Fuzzy C-Means method, the initial values used in the FCM calculation must be determined. This is important because they affect the results produced in this research. The initial values are shown in Table 2.
The predetermined initial values affect the clustering decision for the health care center data. From the clustering, we can identify the health care centers that have the smallest number of diarrhea patients, those that have succeeded in reducing the number of diarrhea sufferers, and those that are lacking in dealing with diarrhea.

B. Clustering of Health Care Center Data
Based on the initial values that have been set, the Fuzzy C-Means algorithm is run. The data shown in Table I are first normalized. Normalization is done with the min-max method: the smallest and largest values in the data are found, and the calculation is applied to each data value. After this process the data have values between 0 and 1, which makes the subsequent calculations easier.
Next, the initial partition matrix is generated. The data are divided into 3 clusters, and the cluster centers are calculated from the degrees of membership of the diarrhea data in each cluster. The squared degree of membership of each data point is then used to compute the overall objective function value. This value is compared with the objective function value of the next iteration to check whether the difference has fallen below the smallest expected error. If the difference between the two objective function values is not below the smallest expected error, the iteration is repeated, and the membership degrees are taken from the newly calculated partition matrix µ. The calculation stops once the difference between the objective function values of the current and previous iterations is below the smallest expected error, or once the maximum number of iterations is reached.
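For reference, the quantities mentioned in this procedure can be written out in the standard FCM form. The block below is a sketch under stated assumptions: it uses the symbols introduced earlier (µik, w, P, ɛ, t), while the data values X_ij, the centers V_kj, the use of squared Euclidean distance, and the name MaxIter are assumed notation rather than equations quoted from the original paper.

```latex
% Cluster center of cluster k for attribute j
V_{kj} = \frac{\sum_{i=1}^{n} (\mu_{ik})^{w} X_{ij}}{\sum_{i=1}^{n} (\mu_{ik})^{w}}

% Objective function at iteration t
P_t = \sum_{i=1}^{n} \sum_{k=1}^{c}
      \left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right] (\mu_{ik})^{w}

% Updated membership of data point i in cluster k
\mu_{ik} = \frac{\left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right]^{-1/(w-1)}}
                {\sum_{l=1}^{c} \left[ \sum_{j=1}^{m} (X_{ij} - V_{lj})^{2} \right]^{-1/(w-1)}}

% Stopping condition
|P_t - P_{t-1}| < \varepsilon \quad \text{or} \quad t > \text{MaxIter}
```

With w = 2, the membership weights in the first two formulas are the squared degrees of membership referred to above.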

C. Data Analysis
The calculations produce a clustering of the health care centers in Banyuwangi District based on the diarrhea patient data. Once the difference in objective function values falls below the smallest expected error, the 45 health care centers are assigned to 3 clusters. The cluster results are shown in Table 3.
Based on the clustering obtained with the Fuzzy C-Means algorithm, the 45 health care centers were divided into 3 clusters. The first cluster has 11 health care centers: Wongsorejo, Bajulmati, Paspan, Singotrunan, Gitik, Wonosobo, Kedungrejo, Tegaldlimo, Purwoharjo, Kembiritan, and Sepanjang. This first cluster has the lowest rate of diarrhea among children under 5 years of age. The second cluster contains 12 health care centers: Klatak, Sobo, Kabat, Badean, Singojuruh, Songgon, Tembokrejo, Benculuk, Tegalsari, Genteng Kulon, Kalibaru Kulon, and Siliragung. The health care centers in this second cluster have diarrhea rates for children under 5 years that are better than those in the third cluster. The last cluster includes 22 health care centers: Kelir, Mojopanggung, Licin, Kertosari, Gladag, Kebaman, Parijatah Kulon, Sumberberas, Tapanrejo, Kedungwungu, Grajagan, Tampo, Jajag, Yosomulyo, Sempu, Karangsari, Gendoh, Tulungrejo, Kebondalem, Sambirejo, Pesanggaran, and Sumberagung. The health care centers in this cluster have high rates of diarrhea among children under 5 years.
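One plausible way to turn the final partition matrix into named groups like these is to give each health care center the cluster in which its membership is highest, and to order the clusters by their center values. The sketch below assumes exactly that. The function `label_clusters`, the label names, and the small membership matrix are hypothetical and are not taken from Table 3.

```python
def label_clusters(U, centers, names=("low", "medium", "high")):
    """Assign each object the label of its highest-membership cluster.

    Clusters are ranked by the mean of their center coordinates, so the
    cluster with the smallest center receives the first name.
    """
    order = sorted(range(len(centers)),
                   key=lambda k: sum(centers[k]) / len(centers[k]))
    rank_of = {k: rank for rank, k in enumerate(order)}
    labels = []
    for row in U:
        best = max(range(len(row)), key=lambda k: row[k])
        labels.append(names[rank_of[best]])
    return labels

# Hypothetical memberships of three centers in three clusters.
U = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.05, 0.15, 0.8]]
centers = [[0.9], [0.1], [0.5]]
labels = label_clusters(U, centers)
```

Ranking by center value is needed because FCM numbers its clusters arbitrarily, so cluster 1 is not guaranteed to be the group with the fewest sufferers.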

V. Conclusion
Based on the results of this research, the grouping of health care centers in Banyuwangi Regency obtained with the Fuzzy C-Means method can be used as a reference by the Banyuwangi Health Office in reducing the number of diarrhea sufferers among children under 5 years old, especially at health care centers that have high rates of diarrhea.
The health office can give special assignments to the health care centers to provide counseling and education to the community on how to deal with the factors that can cause diarrhea, so that the number of diarrhea sufferers among children under 5 years of age, and with it the infant mortality rate due to diarrheal diseases, can be reduced.
These results can also serve as a reference for the health care centers in the third cluster in improving their services to reduce the number of diarrhea sufferers. More broadly, the results of this study can be used as a reference in improving national public health so that Indonesia's health level can increase. In future research, we will extend this method with a new approach to obtain better results for reducing the number of diarrhea patients.