MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content

Jiang, Zhuoxuan; Zhang, Yan; Li, Xiaoming

doi:10.1007/978-3-319-55705-2_24

Conference paper
First Online: 22 March 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

Massive Open Online Courses (MOOCs) have developed rapidly in recent years. MOOC platforms not only offer a one-stop learning setting but also aggregate a large number of courses with various kinds of textual content, e.g. video subtitles, quizzes and forum posts. MOOCs can therefore be regarded as a large-scale ‘knowledge base’ covering many domains. However, all the content generated by instructors and learners is unstructured. Concept extraction is a natural first step toward structuring this data for further knowledge management and mining. In this paper, we exploit human knowledge in the form of labeled data and propose a machine-learning framework for concept extraction. The framework supports semi-supervised learning, which reduces the human effort needed to label training data, and uses course-agnostic features to model cross-domain data. Experimental results show that labeling only 10% of the data yields acceptable performance, and that the semi-supervised method is comparable to the supervised version within the same framework. We also find that textual content of different forms, i.e. subtitles, PPTs and questions, should be processed separately because of their differences in form. Finally, we evaluate a new task: identifying needs of concept comprehension. Our framework performs well at this identification on forum content when the model is learned from subtitles.
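The abstract does not say which semi-supervised algorithm the framework uses, so the following is only a minimal sketch of one common option, self-training: a model trained on a small labeled seed set labels the unlabeled pool, and only confident predictions are added to the training data. Everything in it is an assumption made for illustration, including the three surface features, the nearest-centroid classifier (swapped in for whatever learner the paper actually uses), the `threshold` value and the example tokens.

```python
import math


def features(token):
    """Hypothetical course-agnostic surface features of a token."""
    return (
        1.0 if token[:1].isupper() else 0.0,  # starts with a capital letter
        1.0 if "-" in token else 0.0,         # contains a hyphen
        min(len(token), 12) / 12.0,           # length, capped and scaled to [0, 1]
    )


def train(labeled):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    groups = {}
    for token, label in labeled:
        groups.setdefault(label, []).append(features(token))
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in groups.items()
    }


def predict(model, token):
    """Return (label, margin); margin is the distance gap to the runner-up."""
    x = features(token)
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin


def self_train(labeled, unlabeled, threshold=0.2, max_rounds=5):
    """Self-training loop: pseudo-label confident tokens, then retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for token in pool:
            label, margin = predict(model, token)
            if margin >= threshold:
                confident.append((token, label))
            else:
                remaining.append(token)
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


if __name__ == "__main__":
    # Made-up seed labels standing in for the small labeled fraction.
    seed = [("Recursion", "concept"), ("Hash-table", "concept"),
            ("the", "other"), ("is", "other")]
    model, labeled = self_train(seed, ["Polymorphism", "of", "Red-black"])
    for token, label in labeled:
        print(token, label)
```

The confidence threshold is the main design choice in a loop like this: a low value grows the training set quickly but risks reinforcing early mistakes, while a high value keeps the pseudo-labels clean at the cost of using less of the unlabeled data.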

Notes

1. Stanford Log-linear Part-Of-Speech Tagger: http://nlp.stanford.edu/software/tagger.shtml
2. Word2Vec: https://code.google.com/p/word2vec/
3. Stanford Chinese Word Segmenter: http://nlp.stanford.edu/software/segmenter.shtml
4. Stanford Chinese Named Entity Recognizer (NER): http://nlp.stanford.edu/software/CRF-NER.shtml
5. Terminology Extraction by Translated Labs: http://labs.translated.net/terminology-extraction/

Acknowledgments

This research is supported by NSFC with Grant No. 61532001 and No. 61472013, and MOE-RCOE with Grant No. 2016ZD201.

Author information

Authors and Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Zhuoxuan Jiang, Yan Zhang & Xiaoming Li

Corresponding author

Correspondence to Zhuoxuan Jiang.

Editor information

Editors and Affiliations

Royal Melbourne Institute of Technology, Melbourne, Australia
Zhifeng Bao
Northwestern University, Evanston, Illinois, USA
Goce Trajcevski
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

Cite this paper

Jiang, Z., Zhang, Y., Li, X. (2017). MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_24

DOI: https://doi.org/10.1007/978-3-319-55705-2_24
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science, Computer Science (R0)
