Extractive Text Summarization of Student Essay Assignment Using Sentence Weight Features and Fuzzy C-Means



Introduction
The rapid development of information technology has led to a massive increase in the amount of digital text. According to MGI research [1], in 2010 people around the world stored more than 6 exabytes of digital text data on devices such as personal computers (PCs), notebooks, and mobile phones. Digital text is text stored in digital format as a representation over the binary alphabet [2]. It can be found in articles, news, books, scientific papers, or student assignments collected online. These documents often contain long text, so filtering the information in them takes time. For example, lecturers need a lot of time to assess student essay assignment documents properly, because they have to read every document in full to get the core information. Text summarization is a quick and appropriate way to find the core sentences of an essay document and so speed up the assessment process.
Summarizing a text means condensing it into a shorter version while maintaining its primary information and overall meaning. It is challenging to summarize large text documents manually [3]. Therefore, this study proposes an automatic digital text summarization system that summarizes student essay assignment documents and displays the essential sentences. This study uses an extractive technique with the Fuzzy C-Means method, in which a sentence weight is assigned to each sentence of the students' answers.
Previous studies on document summarization that form the basis of this research include [9], which implements Fuzzy C-Means to summarize English-language journal documents. The resulting application summarizes text supplied in PDF format. Sentences are weighted with the TF-IDF method.
Research [10] developed a text summarization system model consisting of four stages. The preprocessing stage converts unstructured text into structured text.
Other research [11] proposed Bayesian sentence-based topic models to summarize documents using terms and associations. An efficient variational Bayesian algorithm is derived for estimating the model parameters. Experimental results on benchmark data sets indicate the effectiveness of the proposed model for multi-document summarization tasks.
Another study [12] proposes automatic document summarization using word features and the K-Means method. Automatic document summaries provide a text summary quickly, making it easier for users to get the primary information from a document.
Previous studies generally use documents such as articles and news, which have no multi-topic content and are written in other languages. Materials such as essay assignment documents or student scientific papers, which consist of several questions and answers, have so far rarely been summarized. For these reasons, this research addresses the automatic summarization of student essay assignment documents using the Fuzzy C-Means method and sentence weighting (TF-ISF). The data used are assignments of students of the Department of Information Technology, Udayana University. All of the data are in Indonesian.

Method
The research on Extractive Text Summarization of Student Essay Assignment Using Sentence Weight Features and Fuzzy C-Means has the following initial stages: (1) a literature study on the concept of summarization and the Fuzzy C-Means method; (2) the collection of test data sourced from digital texts of student essay assignments collected through e-learning systems; (3) the preparation of the test data before processing by the system, which includes data acquisition, data cleaning, and conversion of the file format to .txt.
After these stages are carried out, the next step is to design the system flow and implement the design. Fig. 1 shows the flow of the system, starting from the text of the document to be summarized, which has already passed the test data preparation stage. In the next step, the test data enters the summarization system, which begins with indexing: the process of finding all the features in the document text, including word features, punctuation features, numeric features, and others.
Furthermore, this study has two main processes, namely the summarization process and the evaluation of the summarization results. Summarization with the extractive method has three general steps [13] [14]:
Step 1: Build a document representation. Preprocessing techniques are performed at this level, including tokenization, stop word removal, stemming, sentence separation, frequency calculation, and others.
Step 2: Score the sentences. Three approaches are followed:
• Weighting of words, to determine which words are important, by using the TF-IDF method.
• Weighting of sentences, such as verifying sentence features (their position in the document, similarity to titles, sentences containing important words), by using the TF-ISF method.
• Graph scoring, to analyze the relationships between sentences.
Step 3: Sort the sentences, then take those with the highest scores as the final summary of a single document.
The summarization process in this study, as presented in Fig. 2, has several stages: indexing, clustering with Fuzzy C-Means, and sentence weighting within clusters. The system evaluation process compares the summary produced by the system with the manual summary made by an expert.

A. Indexation
The indexing process starts by inputting the .txt file into the system as the text to be summarized. The text is first separated into sentences, using indicators such as periods (.), exclamation points (!), and question marks (?). One difficulty is that the period (.) is used not only to end a sentence but also to abbreviate names and other terms. Case folding is then carried out to change all letters into a uniform form; in this study, all words are converted to lower case. Next, the words in each sentence are extracted (tokenization), and words that appear very often but carry no significant meaning (stop words) are discarded. The list of stop words is taken from an Indonesian stop word database. Finally, stemming removes affixes to obtain the base words in Indonesian. This system uses the Sastrawi library [15] for stemming.
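The indexing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the stop word set and the abbreviation set are tiny hypothetical stand-ins for the Indonesian resources used in the study, and the stemmer is passed in as a function (identity by default) so that a Sastrawi stemmer's `stem` method could be supplied in its place.

```python
import re

# Hypothetical, tiny stop word list; the study uses an Indonesian stop word database.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "adalah", "itu", "ini"}

# Hypothetical abbreviations whose trailing period must not end a sentence.
ABBREVIATIONS = {"dr", "prof", "dll", "no"}


def split_sentences(text):
    """Split on ., ! and ? while skipping periods that follow a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+", text):
        words_before = text[start:match.start()].split()
        last_word = words_before[-1].lower() if words_before else ""
        if match.group().startswith(".") and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # text after the last terminator, if any
    return sentences


def preprocess(sentence, stem=lambda word: word):
    """Case folding, tokenization, stop word removal, then stemming."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


sentences = split_sentences("Dr. Budi mengajar di kelas. Apakah itu benar?")
tokens = [preprocess(sentence) for sentence in sentences]
```

Keeping the stemmer pluggable keeps the sketch runnable without external packages; the abbreviation check is only one simple way to handle the period ambiguity mentioned in the text.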

B. Clustering Sentences with Fuzzy C-Means Method
The sentence clustering process is needed to organize large data collections by partitioning the data set automatically, so that similar objects are placed in one group that differs from the other groups [5] [16].
The results of the preprocessing stage are used to form clusters that contain sentences whose features are close together. The Fuzzy C-Means (FCM) algorithm has a lot in common with K-Means. The output of FCM is not a fuzzy inference system, but a set of cluster centers and, for each data point, a degree of membership in every cluster [17]. The FCM grouping procedure is as follows:
2. Arrange the data to be clustered as a matrix X of size n x m (n = number of data samples, m = number of attributes), where $X_{ij}$ is the value of attribute j (j = 1,2,…,m) for data sample i (i = 1,2,…,n). The values are derived from the frequency of words in each sentence.
3. Generate random values from 0 to 1, $\mu_{ik}$, i = 1,2,…,n; k = 1,2,…,c, as the elements of the initial partition matrix U. Calculate the sum of the membership degrees of each data sample with the following formula:
$$Q_i = \sum_{k=1}^{c} \mu_{ik} \qquad (1)$$
with i = 1,2,…,n, then normalize so that the membership degrees of each data sample sum to 1:
$$\mu_{ik} = \frac{\mu_{ik}}{Q_i} \qquad (2)$$
4. Calculate the cluster centers $V_{kj}$, with k = 1,2,…,c and j = 1,2,…,m, using the values of $\mu_{ik}$ from formula (2) and the weighting exponent w:
$$V_{kj} = \frac{\sum_{i=1}^{n} (\mu_{ik})^{w} X_{ij}}{\sum_{i=1}^{n} (\mu_{ik})^{w}} \qquad (3)$$
5. Calculate the value of the objective function at iteration t, $P_t$. The objective function is used to find the right cluster centers; once it stops changing, the final clustering has been reached.
$$P_t = \sum_{i=1}^{n} \sum_{k=1}^{c} \left( \left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right] (\mu_{ik})^{w} \right) \qquad (4)$$
with i = 1,2,…,n; k = 1,2,…,c.
6. Calculate the change of the partition matrix by updating the membership degree of each data sample:
$$\mu_{ik} = \frac{\left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right]^{-1/(w-1)}}{\sum_{l=1}^{c} \left[ \sum_{j=1}^{m} (X_{ij} - V_{lj})^{2} \right]^{-1/(w-1)}} \qquad (5)$$
with i = 1,2,…,n; and k = 1,2,…,c.
7. The iteration ends if the change in the objective function value is less than the error threshold $\xi$ ($|P_t - P_{t-1}| < \xi$) or the number of iterations has passed the maximum iteration limit (t > MaxIter). Otherwise, set t = t + 1 and repeat from step 4.
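The procedure above can be sketched in plain Python as shown below. This is an illustrative sketch that assumes the standard FCM update rules (membership-weighted means for the centers, squared Euclidean distances, and a weighting exponent w); the function name `fuzzy_c_means`, its parameter names, and the small numerical floor against division by zero are assumptions and are not taken from the paper's implementation.

```python
import random


def fuzzy_c_means(X, c, w=2.0, max_iter=100, xi=1.6e-5, seed=None):
    """Return (centers V, memberships U, objective P) for data X (n rows, m columns)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])

    # Step 3: random partition matrix, each row normalised so that it sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        q = sum(row)
        U.append([value / q for value in row])

    V, P, P_prev = [], 0.0, 0.0
    for _ in range(max_iter):
        # Step 4: cluster centers as membership-weighted means of the data.
        V = []
        for k in range(c):
            weights = [U[i][k] ** w for i in range(n)]
            total = sum(weights)
            V.append([sum(weights[i] * X[i][j] for i in range(n)) / total
                      for j in range(m)])

        # Squared Euclidean distance of every sample to every center.
        D = [[sum((X[i][j] - V[k][j]) ** 2 for j in range(m)) for k in range(c)]
             for i in range(n)]

        # Step 5: objective function for the current centers and memberships.
        P = sum(D[i][k] * U[i][k] ** w for i in range(n) for k in range(c))

        # Step 6: update memberships (the floor avoids division by zero).
        for i in range(n):
            inv = [max(D[i][k], 1e-12) ** (-1.0 / (w - 1.0)) for k in range(c)]
            s = sum(inv)
            U[i] = [value / s for value in inv]

        # Step 7: stop once the objective changes by less than the threshold.
        if abs(P - P_prev) < xi:
            break
        P_prev = P
    return V, U, P


# Toy data: two well-separated groups of two points each.
centers, memberships, objective = fuzzy_c_means(
    [[0, 0], [0, 1], [10, 10], [10, 11]], c=2, seed=0)
```

Because the initial partition matrix is random, different seeds can give different clusterings on less clearly separated data.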

C. Sentence Weighting with TF-ISF
Term frequency-inverse sentence frequency (TF-ISF) weighting is done after all iterations of the clustering process are completed. Before the TF-ISF value is computed, the word weight must be found using the TF-IDF method [17]. This weight is a numerical value that represents how important a word is in the whole document. TF is the frequency of a word in a document. IDF is a measure that reduces the weight of words that appear often in the corpus and increases the weight of words that rarely occur.
At this stage, the TF-ISF values of the words in each sentence are added up and used as the weight of the sentence, which is then used at the sentence selection stage in each cluster. The ISF and TF-ISF values are given by the following equations [18]:
$$ISF_t = \log \frac{N}{SF_t}$$
$$TFISF_{t,s} = TF_{t,s} \times ISF_t$$
where $TF_{t,s}$ is the frequency of occurrence of the word t in the sentence s, N is the number of sentences in the document, and $SF_t$ is the number of sentences containing the word t. The value $TFISF_{t,s}$ is high if the word t appears several times in the sentence s and rarely appears in other sentences, and low if the word t appears in almost all sentences [19].
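As an illustration of the weighting described above, the sketch below sums TF-ISF over the words of each sentence. It assumes the common form ISF = log(N / SF) with the natural logarithm; the function name `tf_isf_weights` and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def tf_isf_weights(sentences):
    """sentences: list of token lists. Returns one summed TF-ISF weight per sentence."""
    n_sentences = len(sentences)
    # SF_t: number of sentences that contain term t at least once.
    sentence_freq = Counter(term for tokens in sentences for term in set(tokens))
    weights = []
    for tokens in sentences:
        tf = Counter(tokens)  # TF_{t,s}: occurrences of each term in this sentence
        weights.append(sum(count * math.log(n_sentences / sentence_freq[term])
                           for term, count in tf.items()))
    return weights


# Hypothetical, already preprocessed sentences.
weights = tf_isf_weights([
    ["fuzzy", "cluster", "fuzzy"],
    ["cluster", "data"],
    ["cluster", "summary", "text"],
])
```

In this toy input the word "cluster" occurs in every sentence, so its ISF is log(3/3) = 0 and it contributes nothing to any sentence weight, which matches the behaviour described in the text.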
The final stage of the summarization process is to take the sentence with the highest sentence weight from each cluster. The number of sentences taken as the summary depends on the summarization level selected by the user, which ranges from 10% to 50%. For example, if the tested text has 20 sentences and the summarization rate is 40%, the system creates 8 clusters, and from each cluster it takes one sentence: the one with the highest weight, which is regarded as the closest to the cluster center.
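The selection step can be sketched as follows. This is a hedged illustration: rounding the product of sentence count and rate, and assigning each sentence to the cluster in which it has its largest membership degree, are assumptions; the function names are hypothetical.

```python
def cluster_count(n_sentences, rate):
    """Number of clusters (one summary sentence each) for a rate such as 0.4."""
    return max(1, round(n_sentences * rate))


def pick_per_cluster(memberships, weights):
    """Return the index of the highest-weighted sentence of every non-empty cluster."""
    best = {}
    for i, row in enumerate(memberships):
        k = row.index(max(row))  # hard assignment: cluster with the largest membership
        if k not in best or weights[i] > weights[best[k]]:
            best[k] = i
    return sorted(best.values())  # keep the original sentence order


# 20 sentences at a 40% rate give 8 clusters, as in the example in the text.
n_clusters = cluster_count(20, 0.4)

# Hypothetical memberships of 4 sentences in 2 clusters, with their TF-ISF weights.
summary = pick_per_cluster(
    [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]],
    [1.0, 2.5, 0.7, 0.2],
)
```

Returning the indices in document order keeps the summary readable in the order the student wrote the sentences.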

D. System Evaluation Techniques
The system evaluation stage compares the summaries produced by the system with the manual version. There are several techniques for measuring the quality of a sentence grouping model, including information matrix, misclassification index, purity, and f-measure [20]. This study uses the F-Measure technique to obtain the accuracy, precision, recall, and f-measure of the summaries produced by the system; these values are used as the indicators of the results of this research. The flow of the system evaluation process in this study is shown in Fig. 3. The f-measure is based on the precision and recall values obtained. The concepts of recall and precision can be seen in Table 1. Recall is the proportion of expert-summary sentences that the system retrieves, and precision is the proportion of retrieved sentences that are relevant [21]. From Table 1, the formulas for calculating recall, precision, accuracy, and f-measure are as follows:
$$precision = \frac{tp}{tp + fp}$$
$$recall = \frac{tp}{tp + fn}$$
$$accuracy = \frac{tp + tn}{tp + fp + fn + tn}$$
$$f\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}$$
A true positive (tp) is a sentence that is in the expert summary and appears in the system summary. A false positive (fp) is a sentence that is not in the expert summary but appears in the system summary. A false negative (fn) is a sentence that is in the expert summary but does not appear in the system summary. A true negative (tn) is a sentence that is in neither the expert summary nor the system summary.
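The evaluation measures described here can be computed as in the sketch below, which assumes the standard definitions of precision, recall, accuracy, and f-measure and represents each summary as a set of sentence indices. The function name `evaluate` and the example indices are hypothetical.

```python
def evaluate(system, expert, n_sentences):
    """Compare two sets of sentence indices for a text with n_sentences sentences."""
    system, expert = set(system), set(expert)
    tp = len(system & expert)         # in both summaries
    fp = len(system - expert)         # only in the system summary
    fn = len(expert - system)         # only in the expert summary
    tn = n_sentences - tp - fp - fn   # in neither summary
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / n_sentences
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}


# Hypothetical 10-sentence text: the two summaries share sentences 0 and 2.
scores = evaluate(system={0, 2, 5, 7}, expert={0, 2, 3, 9}, n_sentences=10)
```

The guards against empty denominators return 0.0 when a summary is empty, which is a design choice of this sketch rather than something specified in the text.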

Result and Discussion
The automatic summarization system for student essay assignment documents uses the Fuzzy C-Means method and is built in the Python programming language, with the wx package providing the system interface. The overall process interface of this system can be seen in Fig. 4.

A. Test Data Collection
The document test data used in this study were obtained from student assignments collected in the e-learning system (ELSE U) at Udayana University in 2019. The documents were downloaded at random, without regard to any indicators. The student essay documents were obtained from 10 students in the e-learning system who worked on the same assignment topic.
The documents were selected based on the length of the students' answers, to make the tests more reliable. The selection process gives ten documents, which are then converted manually into .txt files by copy-pasting each sentence from the PDF document in accordance with the rules that have been made.
This collection process takes a long time because each sentence must be checked against the rules: a collection of sentences must be complete, a paragraph must have more than three sentences, and a sentence must not explain tables, pictures, or formula equations. As a result of the conversion, an average of 113 words were omitted from each document. The comparison of the number of words before and after the conversion can be seen in Fig. 5. The omitted words are those that explain tables, pictures, or formulas, because these words are not reliable. The remaining 93.13% of the document content is included in the converted documents, that is, the test data. The test data are also used as the material for the manual summarization conducted by experts, which serves as the comparison for the system summaries in the evaluation process. In this study, the results of the manual review are assumed to be good. Fig. 6 shows an example of test data after format conversion and word reduction as described above.
Fig. 6 Example of test data converted to .txt format
From the 10 converted documents, paragraphs that have a minimum of 4 sentences are selected, so that each of the n clusters produced by the Fuzzy C-Means clustering process can be filled with members, from which one sentence is selected per cluster [22]. Fuzzy C-Means clustering is also useful for researchers to create variations of ideal extractive summaries [23]. This paragraph selection produces 105 texts, which are the test data for testing the automatic text summarization system using the Fuzzy C-Means method.

B. Implementation of Text Summarization
System implementation is the stage that follows system design and makes the system ready for use. Functionality tests are carried out to make sure that the Fuzzy C-Means algorithm with TF-ISF sentence weighting can produce clusters containing representative sentences that form a summary at the chosen summary level. This study uses the following Fuzzy C-Means parameters for the experiment: error threshold (the smallest error expected) = 1.6 × 10^-5, maximum iteration = 100, and weighting exponent (w) = 2. Fig. 7 displays a system flow diagram that describes the program flow from the input document to the output. The first stage of the summarizing process is to input the file to be summarized by selecting a prepared test data file. In the second stage, the user chooses the summary rate, in a range from 10% to 50%. This level affects the results of the summarization: the smaller the level chosen, the fewer sentences are produced.
In the third stage, when the user presses the process button, the system returns a list of the frequencies of the words contained in the document and the stop words found in the test data. The next step is to calculate the sentence weights to determine the ranking within each cluster. Sometimes some clusters have no members, because the number of sentences tested is not enough to fill the number of clusters determined at the beginning. If this happens, the sentences with the second-highest closeness in the clusters that do have members are taken, until the number of selected sentences matches the chosen summary level. The final stage of this system is to display the summary of the tested input.
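One possible reading of the empty-cluster fallback described above is sketched below. It is an assumption-laden illustration: closeness is approximated by each sentence's largest membership degree, and the function name `fill_summary` and its parameters are hypothetical.

```python
def fill_summary(selected, memberships, target):
    """Extend `selected` with the next-closest sentences until `target` is reached."""
    chosen = list(selected)
    # Rank the sentences not yet chosen by their largest membership degree.
    remaining = sorted(
        (i for i in range(len(memberships)) if i not in chosen),
        key=lambda i: max(memberships[i]),
        reverse=True,
    )
    chosen.extend(remaining[:max(0, target - len(chosen))])
    return sorted(chosen)


# Hypothetical case: 3 clusters requested, but the third cluster stayed empty,
# so only sentences 0 and 2 were picked and one more sentence is needed.
memberships = [[0.9, 0.1, 0.0], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
summary = fill_summary([0, 2], memberships, target=3)
```

Other readings are possible, for example ranking the remaining sentences by their TF-ISF weight instead of by membership degree.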

C. System Evaluation Results
The system evaluation in this study aims to measure how well the summaries produced by the system match the manual summaries made by experts. The experts perform the manual review by summarizing the contents of the documents based on their knowledge, choosing which sentences they consider good summary sentences.
Testing is done with summarization levels of 10%, 20%, 30%, 40%, and 50% for each test data item. This range is considered sufficient because a summary is regarded as good enough if it is no longer than 50% of the entire content of the text document supplied through the system interface [3]. The number of test data is 10. The graph in Fig. 8 shows that a compression rate of 10% has a high accuracy value, but the other values are very low. This means that there is a bias: sentences that should have a high weight are not brought up by the system. The best results in this study were obtained at a compression rate of 50%: although the accuracy decreased, precision and recall increased, so the bias that occurs at a compression rate of 10% becomes smaller. Table 2 shows the average results of the system evaluation tests. They indicate that the Fuzzy C-Means method succeeds in summarizing the text in documents by using the sentence weight feature (TF-ISF) as an indicator of the closeness of a sentence to the center of its cluster. The evaluation values obtained are a precision of 0.52, a recall of 0.54, an accuracy of 0.70, and an f-measure of 0.52. The comparison of the average values of precision, recall, accuracy, and f-measure is shown in Fig. 8.
The evaluation carried out in this study depends heavily on the manual summary. A manual summary reflects each expert's own choice of which sentences are suitable for the summary, so different experts lead to different results.
A higher summarization level increases the number of sentences in the system summary. The number of summary sentences produced by the system and the number in the manual summary also greatly influence the values of tp, fn, fp, and tn. The highest accuracy in this study was obtained at the lowest summarization level, because the values of tn and tp tend to decrease when the summary contains a large number of sentences (a high summarization rate). Conversely, if the summarization level is low, fewer summary sentences are produced, so the accuracy tends to be higher.
From the evaluation that has been done, Fuzzy C-Means with sentence weighting has fairly good accuracy in forming clusters of the data; the evaluation also shows that the Fuzzy C-Means method depends on the type of data. If the data are linear in character with many dimensions, Fuzzy C-Means will achieve faster performance and better accuracy [24] [25].
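The effect described above can be illustrated with a hypothetical calculation. The counts below are invented for illustration and are not taken from the test data; they assume a text of 20 sentences and an expert summary of the same length as the system summary.

$$\text{rate } 10\%:\quad tp = 0,\ fp = 2,\ fn = 2,\ tn = 16 \;\Rightarrow\; accuracy = \frac{0 + 16}{20} = 0.80,\quad precision = \frac{0}{2} = 0,\quad recall = \frac{0}{2} = 0$$
$$\text{rate } 50\%:\quad tp = 6,\ fp = 4,\ fn = 4,\ tn = 6 \;\Rightarrow\; accuracy = \frac{6 + 6}{20} = 0.60,\quad precision = \frac{6}{10} = 0.60,\quad recall = \frac{6}{10} = 0.60$$

At the low rate most sentences are true negatives, so accuracy stays high even when the two summaries share no sentence, whereas precision and recall reveal the mismatch.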

Conclusion
Based on the results of the trials conducted, it can be concluded that clustering text with Fuzzy C-Means makes the summary results change with every run, so the evaluation results of the summary are not static, because the initial cluster center points are chosen randomly. In addition, the results of the system evaluation depend on the quality of the manual summaries produced by the experts. The system evaluation on 10 test data, averaged over all compression rates, gives a precision of 0.52, a recall of 0.54, an accuracy of 0.70, and an f-measure of 0.52. These results indicate that the performance of Fuzzy C-Means with TF-ISF sentence weighting for producing extractive text summaries cannot yet be called optimal. Further studies can develop it by applying an optimization method that works together with the Fuzzy C-Means method.