A New Model for Automatic Text Classification
Source: By: Hekmatullah Mumivand, Rasool Seidi Piri, Fatemeh Kheiraei
DOI: https://doi.org/10.30564/ese.v3i1.3170
Abstract: This paper presents a new method for automatic text classification. The system has two phases: text processing and text categorization. In the first phase, several indexing criteria (bigram, trigram and quad-gram) are used to extract features. In the second phase, the W-SMO machine learning algorithm is used to train the system. To evaluate and compare the results in terms of precision and recall, Macro-F1 and Micro-F1 were calculated for each indexing method. Experiments on 7676 standard Reuters text documents showed that the best performance is obtained by W-SMO with bigram indexing, with a Micro-F1 of 95.17 and a Macro-F1 of 79.86. The results also indicated that the proposed method outperforms the W-J48, Naïve Bayes, K-NN and Decision Tree algorithms.
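The indexing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes word-level n-grams (the abstract does not say whether the n-grams are over words or characters) and single-label predictions, it does not reproduce the W-SMO classifier, and the function names and toy data are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f1(tp, fp, fn):
    """F1 from raw counts; defined here as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Micro-F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 computes F1 per class and averages with equal class weight.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data, not taken from the Reuters collection.
bigram_counts = Counter(word_ngrams("oil prices rise as oil prices recover", 2))
y_true = ["earn", "earn", "earn", "acq", "crude"]
y_pred = ["earn", "earn", "acq", "acq", "earn"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(bigram_counts[("oil", "prices")], round(micro, 4), round(macro, 4))
```

In this toy example the rare class is never predicted correctly, so Macro-F1 falls below Micro-F1. A gap in the same direction appears in the reported scores (95.17 micro versus 79.86 macro), which would be consistent with weaker performance on infrequent categories, although the abstract does not state the cause.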