Abstract
When we process information to identify relations or extract entities, to type or classify them, or to fill out their attributes, we need to gauge how well our algorithms work. Knowledge management (KM) differs in a few respects from traditional scientific hypothesis testing. The problems we deal with in information retrieval (IR), natural language understanding or processing (NLP), and machine learning (ML) are all statistical classification problems, specifically binary classification. The most common scoring methods for gauging the ‘accuracy’ of these classifiers rest on two dimensions: positive or negative, and true or false. We discuss a variety of statistical tests built on the four possible results from these two dimensions (e.g., false positive). Testing scripts range from standard unit tests applied against platform tools to ones that check coherency and consistency across the knowledge structure, create reference standards for machine learning, or inform improvements. We offer best practices learned from client deployments in areas such as data treatment and dataset management; creating and using knowledge structures; and testing, analysis, and documentation. Modularity in knowledge graphs, consistent attention to UTF-8 encoding in data structures, emphasis on ‘semi-automatic’ approaches, and use of literate programming and notebooks to record tests and procedures are just a few of the examples where lines blur between standard and best practices. Finding ways to identify and agree upon shared vocabularies and understandings is a central task of modeling the domain, and it involves practices in collaboration, naming, and use of these knowledge structures.
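As an illustrative sketch only (not code from the chapter), the four possible results named in the abstract — true positive, false positive, true negative, false negative — yield the standard classification scores; the function name, counts, and guard clauses here are my own assumptions:

```python
def classification_scores(tp, fp, tn, fn):
    """Derive common scores from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Accuracy: share of all decisions that were correct.
    accuracy = (tp + tn) / total
    # Precision: share of positive calls that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives the classifier actually found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a binary entity-extraction test run.
scores = classification_scores(tp=80, fp=20, tn=890, fn=10)
```

Note that with heavily imbalanced data (many true negatives), accuracy can look high while precision and recall remain informative, which is why F-measure-style scores are the usual choice for extraction tasks.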
Notes
- 1.
Some material in this chapter was drawn from the author’s prior articles at the AI3:::Adaptive Information blog: “Listening to the Enterprise: Total Open Solutions, Part 1” (May 2010); “Using Wikis as Pre-Packaged Knowledge Bases” (Jul 2010); “A Reference Guide to Ontology Best Practices” (Sep 2010); “The Conditional Costs of Free” (Feb 2012); “Why Clojure?” (Dec 2014); “A Primer on Knowledge Statistics” (May 2015); “Literate Programming for an Open World” (Jun 2016); “Gold Standards in Enterprise Knowledge Projects” (Jul 2016).
- 2.
The Open Semantic Framework wiki is a contributor to content in this chapter, particularly “NLP and Knowledge Statistics” (http://wiki.opensemanticframework.org/index.php/NLP_and_Knowledge_Statistics) and “Ontology Best Practices” (http://wiki.opensemanticframework.org/index.php/Ontology_Best_Practices).
- 3.
I refer here to statistical classification; clearly, language meanings are not binary but nuanced.
- 4.
- 5.
- 6.
A vocabulary of linking predicates would capture the variety and degrees to which individuals, instances, classes, and concepts are similar or related to objects in other datasets. This purpose differs from that of, say, voiD (Vocabulary of Interlinked Datasets), whose purpose is to provide descriptive metadata about the nature of particular datasets.
- 7.
As another commentary on the importance of definitions, see http://ontologyblog.blogspot.com/2010/09/physician-decries-lack-of-definitions.html.
- 8.
The Protégé manual [7] is also a source of good tips, especially with regard to naming conventions and the use of the editor.
- 9.
References
G. Hripcsak, A.S. Rothschild, Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 12, 296–298 (2005)
E. Miltsakaki, R. Prasad, A.K. Joshi, B.L. Webber, The Penn Discourse Treebank (2004)
P.V. Ogren, G.K. Savova, C.G. Chute, Constructing evaluation Corpora for automated clinical named entity recognition, in Medinfo 2007: Proceedings of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems (IOS Press, Amsterdam, 2007), pp. 2325
V. Stoyanov, C. Cardie, Topic identification for fine-grained opinion analysis, in Proceedings of the 22nd International Conference on Computational Linguistics-Volume (2008), pp. 817–824
K. Dellschaft, S. Staab, On how to perform a gold standard based evaluation of ontology learning, in The Semantic Web-ISWC (Springer, Berlin, Heidelberg, 2006), pp. 228–241
KBART Phase II Working Group, KBART: Knowledge Bases and Related Tools Recommended Practice (NISO, Baltimore, MD, 2014)
M. Horridge, S. Jupp, G. Moulton, A. Rector, R. Stevens, C. Wroe, A Practical Guide to Building OWL Ontologies Using Protégé and CO-ODE Tools (University of Manchester, Manchester, 2007)
E.P.B. Simperl, C. Tempich, Ontology engineering: a reality check, in On the Move to Meaningful Internet Systems (Springer, New York, 2006), pp. 836–854
F. Giasson, Exploding the Domain (Frederick Giasson, 2008)
D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, D. Dennison, Hidden technical debt in machine learning systems, in Advances in Neural Information Processing Systems (2015), pp. 2503–2511
K. Jalan, How to Improve Machine Learning Performance? Lessons from Andrew Ng. https://www.kdnuggets.com/2017/12/improve-machine-learning-performance-lessons-andrew-ng.html
D.E. Knuth, Literate programming. Comput. J. 27, 97–111 (1984)
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bergman, M.K. (2018). Testing and Best Practices. In: A Knowledge Representation Practionary. Springer, Cham. https://doi.org/10.1007/978-3-319-98092-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98091-1
Online ISBN: 978-3-319-98092-8
eBook Packages: Computer Science; Computer Science (R0)