Centroid-Based and Bayesian Algorithms Performance

Al-Gaphari, Ghaleb and Ba-Alwi, Fadl M. and Al Dobai, Saeed Abdullah M. (2014) Centroid-Based and Bayesian Algorithms Performance. British Journal of Mathematics & Computer Science, 4 (12). pp. 1642-1664. ISSN 22310851

Abstract

Because the amount of textual information available on the web is estimated in terabytes, efficient algorithms are needed to summarize it. Such algorithms would speed up reading, information access, and decision making. This paper investigates the performance of a Bayesian classifier (BC) and a Centroid-Based algorithm (CBA) on the Arabic text summarization (ATS) problem. Both algorithms are implemented as software programs. The Centroid-Based algorithm (CBA) extracts the most important sentences from a document or a set of documents (a cluster). It first computes the similarity between pairs of sentences and evaluates the centrality of each sentence in the cluster using a centrality graph; it then extracts the most important sentences in the cluster for inclusion in the summary. The Bayesian algorithm, in contrast, classifies each sentence into the "in summary" or "out of summary" class based on its feature vector. Both algorithms are evaluated by human participants and by automatic metrics, using the Arabic NEWSWIRE-a corpus as the data set. The F-measure is computed for the results of both algorithms: the Centroid-Based algorithm scores 0.7199 and the Bayesian algorithm scores 0.623. Therefore, the Centroid-Based algorithm (CBA) outperforms the Bayesian algorithm. The results also show that CBA is more robust than BC: it has a lower average deviation, meaning that it gives similar results whether or not the input contains errors. It can compress a text to 25% of its original size without losing the main idea of the original. This property distinguishes the algorithm from others used for the same purpose. In addition, it outperforms all the other techniques included in this paper when used for Arabic text summarization.
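The abstract describes CBA as computing pairwise sentence similarity, scoring each sentence's centrality in a graph, and extracting the most central sentences. The following is a minimal sketch of that idea, not the authors' implementation. It assumes bag-of-words cosine similarity, degree centrality with a similarity threshold (in the spirit of LexRank), and a 25% summary ratio taken from the abstract. The abstract specifies none of these choices, and the function names `centrality_summary`, `cosine`, and `tokenize` and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: simple Unicode word tokenization (\w also matches Arabic letters);
    # no Arabic stemming or stop-word removal is performed.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centrality_summary(sentences, ratio=0.25, threshold=0.1):
    # Hypothetical parameters: ratio = fraction of sentences kept,
    # threshold = minimum similarity for two sentences to be linked in the graph.
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    degree = [0] * n  # degree centrality: number of linked sentences
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    k = max(1, round(n * ratio))
    # Keep the k most central sentences (ties broken by position),
    # then restore their original order in the text.
    top = sorted(range(n), key=lambda i: (-degree[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in their original order keeps the extract readable. Degree centrality is the simplest graph-centrality score; an eigenvector-style score could be substituted without changing the interface.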
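The Bayesian approach treats summarization as a two-class problem: each sentence's feature vector is assigned to "in summary" or "out of summary". The abstract does not list the features or the exact classifier. This sketch therefore assumes a categorical naive Bayes model with Laplace smoothing over discrete features. The class name `SentenceNaiveBayes` and the example features (position bucket, length bucket, presence of a title word) are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SentenceNaiveBayes:
    """Categorical naive Bayes over discrete sentence features (label True = in summary)."""

    def fit(self, feature_vectors, labels, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature_index][value] = number of training sentences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for each feature
        for vec, label in zip(feature_vectors, labels):
            for idx, value in enumerate(vec):
                self.counts[label][idx][value] += 1
                self.values[idx].add(value)
        return self

    def log_posterior(self, vec, label):
        # log P(label) + sum_i log P(feature_i = value | label), up to a shared constant
        score = math.log(self.class_counts[label] / self.total)
        for idx, value in enumerate(vec):
            num = self.counts[label][idx][value] + self.alpha
            den = self.class_counts[label] + self.alpha * len(self.values[idx])
            score += math.log(num / den)
        return score

    def predict(self, vec):
        # Choose the class with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda label: self.log_posterior(vec, label))
```

Working in log space avoids numeric underflow when many features are multiplied together. Smoothing keeps a feature value that was never seen with a class from forcing that class's probability to zero.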
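Both algorithms are compared by F-measure, the harmonic mean of precision and recall. The abstract does not say what units are compared. This sketch assumes the system summary and the reference summary are sets of sentence identifiers; the paper may compute the measure differently, for example over words. The function name `f_measure` is hypothetical.

```python
def f_measure(system, reference, beta=1.0):
    # Precision = overlap / |system|, recall = overlap / |reference|;
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which is F1 when beta = 1.
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a system that extracts sentences {1, 2, 3, 4} against a reference of {2, 3, 5} has precision 0.5 and recall 2/3, giving F1 = 4/7.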

Item Type: Article
Subjects: European Scholar > Mathematical Science
Depositing User: Managing Editor
Date Deposited: 05 Jul 2023 03:59
Last Modified: 01 Dec 2023 12:47
URI: http://article.publish4promo.com/id/eprint/1998