Testing and Best Practices


Abstract

When we process information to identify relations or extract entities, to type or classify them, or to fill out their attributes, we need to gauge how well our algorithms work. Knowledge management (KM) differs in a couple of ways from traditional scientific hypothesis testing. The problems we deal with in information retrieval (IR), natural language understanding or processing (NLP), and machine learning (ML) are all statistical classification problems, specifically binary classification. The most common scoring methods for gauging the ‘accuracy’ of these classifiers use statistical tests based on two dimensions: whether the prediction is positive or negative, and whether that prediction is true or false. We discuss a variety of statistical tests built on the four possible results from these two dimensions (e.g., false positive). Testing scripts range from standard unit tests applied against platform tools to ones that perform coherency and consistency checks across the knowledge structure, create reference standards for machine learning, or inform improvements. We offer best practices learned from client deployments in areas such as data treatment and dataset management, creating and using knowledge structures, and testing, analysis, and documentation. Modularity in knowledge graphs, consistent attention to UTF-8 encoding in data structures, emphasis on ‘semi-automatic’ approaches, and the use of literate programming and notebooks to record tests and procedures are just a few examples of where the lines blur between standard and best practices. Finding ways to identify and agree upon shared vocabularies and understandings is a central task of modeling the domain, and it involves practices in collaboration, naming, and use of these knowledge structures.
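
As a concrete illustration of the four outcomes and the scores commonly built from them (precision, recall, F1, accuracy), here is a minimal sketch in Python; the function names and sample labels are illustrative assumptions, not drawn from the chapter:

    # Tally the four possible outcomes of a binary classifier against a
    # gold standard; labels are parallel lists of booleans (illustrative).
    def confusion_counts(gold, predicted):
        tp = sum(1 for g, p in zip(gold, predicted) if g and p)
        fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
        fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
        tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
        return tp, fp, fn, tn

    # Standard scores derived from the four counts.
    def scores(tp, fp, fn, tn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        return {"precision": precision, "recall": recall,
                "f1": f1, "accuracy": accuracy}

    # Example: entity-extraction decisions scored against a reference standard.
    gold      = [True, True, False, True, False, False, True, False]
    predicted = [True, False, False, True, True, False, True, False]
    print(scores(*confusion_counts(gold, predicted)))

The same four counts underlie the statistical tests discussed in the chapter; reference (‘gold’) standards such as the one sketched here are what the testing scripts both produce and consume.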

Notes

  1. Some material in this chapter was drawn from the author’s prior articles at the AI3:::Adaptive Information blog: “Listening to the Enterprise: Total Open Solutions, Part 1” (May 2010); “Using Wikis as Pre-Packaged Knowledge Bases” (Jul 2010); “A Reference Guide to Ontology Best Practices” (Sep 2010); “The Conditional Costs of Free” (Feb 2012); “Why Clojure?” (Dec 2014); “A Primer on Knowledge Statistics” (May 2015); “Literate Programming for an Open World” (Jun 2016); “Gold Standards in Enterprise Knowledge Projects” (Jul 2016).

  2. The Open Semantic Framework wiki is a contributor to content in this chapter, particularly “NLP and Knowledge Statistics” (http://wiki.opensemanticframework.org/index.php/NLP_and_Knowledge_Statistics) and “Ontology Best Practices” (http://wiki.opensemanticframework.org/index.php/Ontology_Best_Practices).

  3. I refer here to statistical classification; clearly, language meanings are not binary but nuanced.

  4. See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors.

  5. See http://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram.

  6. A vocabulary of linking predicates would capture the variety and degrees to which individuals, instances, classes, and concepts are similar or related to objects in other datasets (see the sketch after these notes). This purpose differs from that of, say, voiD (Vocabulary of Interlinked Datasets), whose purpose is to provide descriptive metadata about the nature of particular datasets.

  7. As another commentary on the importance of definitions, see http://ontologyblog.blogspot.com/2010/09/physician-decries-lack-of-definitions.html.

  8. The Protégé manual [7] is also a source of good tips, especially with regard to naming conventions and the use of the editor.

  9. See http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles.

References

  1. G. Hripcsak, A.S. Rothschild, Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 12, 296–298 (2005)

  2. E. Miltsakaki, R. Prasad, A.K. Joshi, B.L. Webber, The Penn Discourse Treebank (2004)

  3. P.V. Ogren, G.K. Savova, C.G. Chute, Constructing evaluation corpora for automated clinical named entity recognition, in Medinfo 2007: Proceedings of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems (IOS Press, Amsterdam, 2007), p. 2325

  4. V. Stoyanov, C. Cardie, Topic identification for fine-grained opinion analysis, in Proceedings of the 22nd International Conference on Computational Linguistics (2008), pp. 817–824

  5. K. Dellschaft, S. Staab, On how to perform a gold standard based evaluation of ontology learning, in The Semantic Web – ISWC 2006 (Springer, Berlin, Heidelberg, 2006), pp. 228–241

  6. KBART Phase II Working Group, KBART: Knowledge Bases and Related Tools Recommended Practice (NISO, Baltimore, MD, 2014)

  7. M. Horridge, S. Jupp, G. Moulton, A. Rector, R. Stevens, C. Wroe, A Practical Guide to Building OWL Ontologies Using Protégé and CO-ODE Tools (University of Manchester, Manchester, 2007)

  8. E.P.B. Simperl, C. Tempich, Ontology engineering: a reality check, in On the Move to Meaningful Internet Systems (Springer, New York, 2006), pp. 836–854

  9. F. Giasson, Exploding the Domain (Frederick Giasson, 2008)

  10. D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, D. Dennison, Hidden technical debt in machine learning systems, in Advances in Neural Information Processing Systems (2015), pp. 2503–2511

  11. K. Jalan, How to Improve Machine Learning Performance? Lessons from Andrew Ng. https://www.kdnuggets.com/2017/12/improve-machine-learning-performance-lessons-andrew-ng.html

  12. D.E. Knuth, Literate programming. Comput. J. 27, 97–111 (1984)

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bergman, M.K. (2018). Testing and Best Practices. In: A Knowledge Representation Practionary. Springer, Cham. https://doi.org/10.1007/978-3-319-98092-8_14

  • DOI: https://doi.org/10.1007/978-3-319-98092-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98091-1

  • Online ISBN: 978-3-319-98092-8

  • eBook Packages: Computer Science, Computer Science (R0)
