Churn Prediction Task in MOOC
DOI: https://doi.org/10.30564/jcsr.v1i1.537
Abstract: Churn prediction is a common task for machine learning applications in business. In this paper, the task is adapted to address the low completion rate of massive open online courses (only 5% of all students finish their course). The approach is demonstrated on the course "Methods and algorithms of the graph theory", held on the National Platform of Open Education in Russia. The paper covers all the steps needed to build an intelligent system that predicts which students are active during the course but unlikely to finish it. The first part consists of constructing the right sample for prediction, exploratory data analysis (EDA), and choosing the most appropriate week of the course at which to make predictions. The second part concerns choosing the right metric and building models. An approach using ensembles such as stacking is also proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is presented. This approach can be used to make the process of online education adaptive and intelligent for each individual student.
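The modeling pipeline described in the abstract (building base models, combining them via stacking, and evaluating with an appropriate metric for an imbalanced target) can be sketched as follows. This is an illustrative sketch only, not the authors' exact pipeline: the synthetic data stands in for weekly student-activity features, and the specific estimators and the ROC AUC metric are assumptions consistent with the general approach described.

```python
# Illustrative sketch of churn prediction with a stacking ensemble.
# Not the paper's exact pipeline; features and models are stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for weekly activity features; ~5% positives
# mirrors the low completion rate mentioned in the abstract.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Stacking: base learners' out-of-fold predictions feed a meta-model.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# ROC AUC is a common metric choice for imbalanced churn targets.
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"ROC AUC: {auc:.3f}")
```

With a heavily imbalanced target like course completion, a ranking metric such as ROC AUC is usually preferred over plain accuracy, since a model predicting "churn" for everyone would already score about 95% accuracy.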