
Stratified Learning for Reducing Training Set Size

  • Conference paper
Intelligent Tutoring Systems (ITS 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9684)


Abstract

Educational standards have put a renewed focus on strengthening students’ abilities to construct scientific explanations and engage in scientific argument. Evaluating students’ explanatory writing is extremely time-intensive, so we are developing techniques to automatically analyze the causal structure of student essays so that effective feedback can be provided. These techniques rely on a substantial training corpus of annotated essays. Because one of our long-term goals is to make it easier to apply this approach in new subject domains, we are keenly interested in how much training data is enough to support it. This paper describes our analysis of that question and examines one mechanism for reducing the data requirement: stratifying the training set by students’ scores on a related multiple-choice test.

P. Hastings: The assessment project described in this article is funded, in part, by the Institute of Education Sciences, U.S. Department of Education (Grant R305F100007). The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.


Notes

  1.

    The choice of group size is significant. As mentioned above, the distribution of multiple-choice scores was approximately normal, and the least frequent score, 0, was assigned to 31 students. To maintain balanced representation of groups in the training set, some aggregation of scores is necessary; otherwise we could draw at most 31 items from each group. If the aggregation were too broad, however, it would reduce the benefit of balancing the training set.
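    The balancing constraint in this note can be made concrete with a short sketch. It assumes each essay is paired with its student's multiple-choice score; the functions `max_per_group` and `stratified_sample`, the low/mid/high bin boundaries, and the toy score counts are hypothetical illustrations, not the paper's actual grouping or data. It shows that the smallest group caps how many items every group can contribute, and that merging adjacent scores raises that cap.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def max_per_group(scores: Sequence[int], to_group: Callable[[int], Hashable]) -> int:
    """Largest number of items every group can contribute to a balanced sample.

    The smallest group caps all the others, so this is the minimum group size.
    """
    return min(Counter(to_group(s) for s in scores).values())


def stratified_sample(items: Sequence[T], scores: Sequence[int],
                      to_group: Callable[[int], Hashable],
                      per_group: int, seed: int = 0) -> List[T]:
    """Draw `per_group` items, without replacement, from every score group."""
    by_group: Dict[Hashable, List[T]] = defaultdict(list)
    for item, score in zip(items, scores):
        by_group[to_group(score)].append(item)
    smallest = min(len(members) for members in by_group.values())
    if per_group > smallest:
        raise ValueError(f"per_group={per_group} exceeds smallest group size {smallest}")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample: List[T] = []
    for key in sorted(by_group, key=str):  # deterministic group order
        sample.extend(rng.sample(by_group[key], per_group))
    return sample


# Hypothetical groupings: one group per raw score vs. an aggregated low/mid/high scheme.
def raw(score: int) -> int:
    return score


def coarse(score: int) -> str:
    if score <= 1:
        return "low"
    return "mid" if score == 2 else "high"


# Toy, synthetic score counts (not data from the paper): scores 0..4.
scores = [0] * 3 + [1] * 8 + [2] * 12 + [3] * 9 + [4] * 5
items = [f"essay_{i}" for i in range(len(scores))]

print(max_per_group(scores, raw))     # 3: the rare score 0 caps every group
print(max_per_group(scores, coarse))  # 11: merging 0 and 1 raises the cap
train = stratified_sample(items, scores, coarse, per_group=11)
print(len(train))                     # 33 = 3 groups x 11 items
```

    Grouping everything into a single bin (for example `lambda s: "all"`) would remove the cap entirely, but it would also remove the balance that stratification is meant to provide, which is the trade-off the note describes.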


Author information

Correspondence to Peter Hastings.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Hastings, P., Hughes, S., Blaum, D., Wallace, P., Britt, M.A. (2016). Stratified Learning for Reducing Training Set Size. In: Micarelli, A., Stamper, J., Panourgia, K. (eds) Intelligent Tutoring Systems. ITS 2016. Lecture Notes in Computer Science, vol 9684. Springer, Cham. https://doi.org/10.1007/978-3-319-39583-8_39


  • DOI: https://doi.org/10.1007/978-3-319-39583-8_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39582-1

  • Online ISBN: 978-3-319-39583-8

  • eBook Packages: Computer Science, Computer Science (R0)
