Comparative Study of Five Summarization Approaches for Arabic Documents Using Text Classification

Khaled Alwesabi, Abdullah Ayedh, Yahya Al-Ashmoery

  • Khaled Alwesabi, Al-Razi University
  • Abdullah Ayedh, YemenSoft Company, Yiwu, China
  • Yahya Al-Ashmori, Al-Razi University
  • Hisham Haider Yusef Sa’ad, Al-Razi University
Keywords: Text summarization, Text classification, Lakhas method, Support vector machine (SVM), Centroid Based method




Abstract: The volume of text documents grows every day, and dealing with all of them takes a long time. Text summarization reduces the time and effort needed to explore a body of text and identify its most relevant and salient parts, while text classification makes it faster to reach the documents in a required field. We apply five methods for summarizing Arabic documents in order to identify the method with the highest efficiency and accuracy. The summarization methods used in this paper are LexRank, Degree Centrality, Continuous LexRank, Centroid Based, and Lakhas; the classification method is the support vector machine (SVM). We examine whether document classification can be used to determine the best method for Arabic document summarization. In other words, we identify the best summarization approach through classification. Summarizer performance is evaluated for efficiency and accuracy using precision, recall, and execution time. Finally, the summarization methods are compared on the basis of the classification results. Experimental results show that summarization by the Centroid Based method followed by classification achieves an accuracy of more than 96.96% in a time of 03:42 minutes, compared with the other summarization methods. Classification efficiency is also significantly improved when classification is based on summaries rather than full-length documents, especially when the Centroid Based method is used. In addition, classifying summarized documents requires less memory and run time than classifying full documents.
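
The evaluation idea described in the abstract (summarize, then classify, then score with precision and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified centroid-style summarizer that ranks sentences by cosine similarity to a raw term-frequency centroid, uses plain whitespace tokenization instead of Arabic preprocessing, and leaves out the SVM classifier entirely. The function names `centroid_summary` and `precision_recall` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace tokenization only; a real Arabic pipeline
    # would add normalization, stop-word removal, and stemming.
    return sentence.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency bags.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_summary(sentences, k=2):
    """Return the k sentences most similar to the document centroid,
    kept in their original order (simplified centroid-style scoring)."""
    bags = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for bag in bags:
        centroid.update(bag)  # centroid = summed term frequencies
    scores = [cosine(bag, centroid) for bag in bags]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, given classifier predictions."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    doc = [
        "the economy grew this year",
        "the economy and the market grew",
        "cats sleep",
    ]
    print(centroid_summary(doc, k=2))
    # Labels a classifier might assign to four summarized documents (made-up toy data).
    print(precision_recall(
        ["sport", "sport", "econ", "econ"],
        ["sport", "econ", "econ", "econ"],
        positive="sport",
    ))
```

In a full pipeline along the lines the abstract describes, each summarizer's output would be fed to the same classifier, and the resulting precision, recall, and run time would be compared across summarizers.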