MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

Recent years have witnessed the rapid development of Massive Open Online Courses (MOOCs). MOOC platforms not only offer a one-stop learning setting, but also aggregate a large number of courses with various kinds of textual content, e.g. video subtitles, quizzes and forum posts. MOOCs can therefore be regarded as a large-scale ‘knowledge base’ covering various domains. However, all the content generated by instructors and learners is unstructured. To structure this data for further knowledge management and mining, a natural first step is concept extraction. In this paper, we utilize human knowledge in the form of labeled data and propose a framework for concept extraction based on machine learning methods. The framework supports semi-supervised learning, which reduces the human effort of labeling training data, and uses course-agnostic features to model cross-domain data. Experimental results demonstrate that labeling only 10% of the data leads to acceptable performance, and that the semi-supervised method is comparable to the supervised version within the same framework. We also find that textual content of different forms, i.e. subtitles, PPTs and questions, should be processed separately because of their differences in form. Finally, we evaluate a new task: identifying needs of concept comprehension. Our framework performs well at this identification on forum content when the model is learned from subtitles.
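The abstract does not specify the learning algorithm, so the following is only a generic sketch of one common semi-supervised scheme, self-training, and not the authors' implementation. It assumes each candidate term has already been mapped to a small numeric feature vector; the two features, the nearest-centroid classifier, the `min_margin` threshold, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Model = Dict[str, Vector]  # label -> class centroid


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled: List[Tuple[Vector, str]]) -> Model:
    """Nearest-centroid 'training': one centroid per label."""
    by_label: Dict[str, List[Vector]] = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def predict(model: Model, x: Vector) -> Tuple[str, float]:
    """Return the nearest label and a confidence margin.

    The margin is the gap between the two smallest squared distances.
    Assumes the model contains at least two labels.
    """
    dists = {label: sq_dist(c, x) for label, c in model.items()}
    ranked = sorted(dists, key=dists.get)
    return ranked[0], dists[ranked[1]] - dists[ranked[0]]


def self_train(labeled, unlabeled, min_margin=0.1, max_rounds=10) -> Model:
    """Repeatedly pseudo-label confident unlabeled examples and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        confident, rest = [], []
        for x in pool:
            y, margin = predict(model, x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit(labeled)


# Hypothetical toy data: (feature_1, feature_2) per candidate term.
seed = [((0.9, 0.8), "concept"), ((0.1, 0.2), "other")]
pool = [(0.8, 0.9), (0.2, 0.1), (0.55, 0.5), (0.7, 0.6)]
model = self_train(seed, pool)
label, _ = predict(model, (0.85, 0.85))
print(label)
```

In this toy run the borderline point `(0.55, 0.5)` is not confident enough in the first round and is only pseudo-labeled after the centroids have shifted, which is the basic mechanism by which a small labeled seed set can be extended with unlabeled data.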

Notes

  1. Stanford Log-linear Part-Of-Speech Tagger: http://nlp.stanford.edu/software/tagger.shtml.

  2. Word2Vec: https://code.google.com/p/word2vec/.

  3. Stanford Chinese Word Segmenter: http://nlp.stanford.edu/software/segmenter.shtml.

  4. Stanford Chinese Named Entity Recognizer (NER): http://nlp.stanford.edu/software/CRF-NER.shtml.

  5. Terminology Extraction by Translated Labs: http://labs.translated.net/terminology-extraction/.

Acknowledgments

This research is supported by NSFC with Grant No. 61532001 and No. 61472013, and MOE-RCOE with Grant No. 2016ZD201.

Author information

Corresponding author

Correspondence to Zhuoxuan Jiang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jiang, Z., Zhang, Y., Li, X. (2017). MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_24

  • DOI: https://doi.org/10.1007/978-3-319-55705-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55704-5

  • Online ISBN: 978-3-319-55705-2

  • eBook Packages: Computer Science, Computer Science (R0)
