Prediction of Users’ Professional Profile in MOOCs Only by Utilising Learners’ Written Texts

  • Tahani Aljohani
  • Filipe Dwan Pereira
  • Alexandra I. Cristea
  • Elaine Oliveira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12149)


Identifying users’ demographic characteristics is known as the Author Profiling (AP) task. AP supports robust automatic prediction of various social aspects of users and, in turn, decision making in massive information systems. In MOOCs, for example, it is used to provide personalised recommendation systems for learners. In this paper, we explore intelligent techniques and strategies for solving the task, focusing on predicting the employment status of users on a MOOC platform. To this end, we compare sequential and parallel ensemble deep learning (DL) architectures. Importantly, we show that our prediction model can achieve high accuracy even though few of the stylistic text features usually used for the AP task are employed (only word tokens are used). To address our highly imbalanced data, we compare a widely used oversampling method with a generative paraphrasing method. Our best method, sequential DL with paraphrasing, obtained an average accuracy of 96.4%, both overall and per individual class (the employment statuses of users).
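As a rough illustration of two ideas mentioned in the abstract (balancing classes by oversampling, and reporting accuracy both overall and per class), the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the oversampling method, so plain random oversampling is used as a stand-in, and the per-class figure is assumed to be the share of each class's samples predicted correctly. The function names, the whitespace tokeniser, and the toy comments and labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenisation: word tokens are the only feature."""
    return text.lower().split()


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    is as large as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, {class: accuracy on that class's samples})."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    overall = sum(correct.values()) / len(y_true)
    per_class = {c: correct[c] / total[c] for c in total}
    return overall, per_class


# Hypothetical toy data, invented purely for illustration.
comments = [
    "I work full time and study in the evenings",
    "my job leaves little time for the course",
    "our team uses this at the office",
    "I am looking for a new job at the moment",
]
statuses = ["employed", "employed", "employed", "unemployed"]

tokens = [tokenize(c) for c in comments]
balanced_tokens, balanced_statuses = random_oversample(tokens, statuses)
print(Counter(balanced_statuses))
```

Random oversampling only repeats existing minority texts, whereas a paraphrasing approach would generate new wordings of them; the sketch is meant only to show the balancing step and why per-class accuracy is worth reporting alongside the overall figure on imbalanced data.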


Imbalanced data, MOOCs, Deep Learning, Author Profiling



This work was funded by the Ministry of Education of Saudi Arabia.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Tahani Aljohani (1)
  • Filipe Dwan Pereira (2)
  • Alexandra I. Cristea (1)
  • Elaine Oliveira (2)
  1. Department of Computer Science, Durham University, Durham, UK
  2. Institute of Computing, Federal University of Roraima, Boa Vista, Brazil
