Calibrating for Specific Domains

Wild, Fridolin

doi:10.1007/978-3-319-28791-1_7

Abstract

Eigenspace-based models have been shown to be more effective than their simple vector space counterparts in settings that benefit from fuzziness (such as information retrieval or recommender systems). In settings that require a precise representation structure (such as essay scoring or conceptual relationship mining), however, better means of predicting model behaviour from parameter settings could ease applicability and increase efficiency by reducing tuning times.

This chapter reports experiences and experimental results from a systematic investigation of tuning parameters, their potential settings, and the interdependencies between them. This includes studying the influence of sanitising operations, sampling, dimensionality changes, and degrees of specialisation. Trends indicate that the smaller the corpus, the more domain-specific documents are required. Moreover, recommendations for vocabulary filtering can be derived, depending on the size of the corpus.

Notes

1. It turned out that the collected data was too big to analyse on a machine with 32 GB of memory.

Author information

Authors and Affiliations

Performance Augmentation Lab, Department of Computing and Communication Technologies, Oxford Brookes University, Oxford, UK
Fridolin Wild

About this chapter

Cite this chapter

Wild, F. (2016). Calibrating for Specific Domains. In: Learning Analytics in R with SNA, LSA, and MPIA. Springer, Cham. https://doi.org/10.1007/978-3-319-28791-1_7

DOI: https://doi.org/10.1007/978-3-319-28791-1_7
Published: 05 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28789-8
Online ISBN: 978-3-319-28791-1
eBook Packages: Computer Science, Computer Science (R0)
