Abstract
Eigenspace-based models have been shown to be more effective than their simple vector space counterparts in settings that benefit from fuzziness (such as information retrieval or recommender systems). In settings that require precision in representation structure (such as essay scoring or conceptual relationship mining), however, better means of predicting model behaviour from parameter settings could ease application and increase efficiency by reducing tuning times.
This chapter reports experiences and experimental results from a systematic investigation of tuning parameters, their potential settings, and the interdependencies between them. This includes studying the influence of sanitising operations, sampling, dimensionality changes, and degrees of specialisation. Trends indicate that the smaller the corpus, the more domain-specific documents are required. Moreover, recommendations for vocabulary filtering can be derived, depending on the size of the corpus.
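To make the kind of tuning parameters named here more concrete, the following is a minimal sketch, not the chapter's own implementation. It shows how two sanitising options (lowercasing and stopword removal) and a vocabulary filter (a minimum document frequency) change the term-document matrix on which an eigenspace model would be built. The function names, the `min_doc_freq` parameter, the stopword list, and the three toy documents are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to"}


def tokenize(text, lowercase=True, remove_stopwords=True):
    """Split a text into alphabetic tokens, applying optional sanitising steps."""
    if lowercase:
        text = text.lower()
    # Replace every non-letter with a space, then split on whitespace.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    tokens = cleaned.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return tokens


def term_document_matrix(docs, min_doc_freq=1, **sanitise):
    """Return (vocabulary, matrix) with one row per term, one column per document.

    Terms that occur in fewer than min_doc_freq documents are filtered out.
    """
    tokenised = [tokenize(d, **sanitise) for d in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    counts = [Counter(tokens) for tokens in tokenised]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Toy corpus (assumed, not taken from the chapter's experiments).
docs = [
    "The matrix of terms and documents",
    "Documents in a vector space",
    "Latent semantic analysis of documents",
]
vocab_all, _ = term_document_matrix(docs, min_doc_freq=1)
vocab_filtered, matrix_filtered = term_document_matrix(docs, min_doc_freq=2)
```

On a corpus this small, raising the document-frequency threshold removes almost the whole vocabulary, which illustrates why a filtering recommendation would need to depend on the size of the corpus.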
Notes
- 1. It turned out that the collected data was too big to analyse on a machine with 32 GB memory.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wild, F. (2016). Calibrating for Specific Domains. In: Learning Analytics in R with SNA, LSA, and MPIA. Springer, Cham. https://doi.org/10.1007/978-3-319-28791-1_7
DOI: https://doi.org/10.1007/978-3-319-28791-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28789-8
Online ISBN: 978-3-319-28791-1
eBook Packages: Computer Science, Computer Science (R0)