Abstract
Automatically extracted data is rarely “clean” with respect to pragmatic (real-world) constraints, which hinders applications that depend on quality data. We propose a solution for detecting pragmatic constraint violations based on a declarative, semantically enabled constraint-violation checker. Working in conjunction with an ensemble of automated information extractors, the implemented prototype checks both hard constraints, which are either satisfied or not, and soft constraints, which are satisfied probabilistically with respect to a threshold. An experimental evaluation shows that the constraint checker identifies semantic errors with high precision and recall and that pragmatic error identification can improve results.
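The distinction between hard and soft constraints can be illustrated with a minimal Python sketch. This is not the authors' prototype: the `Person` record, the two example constraints, the plausibility values, and the 0.05 threshold are all hypothetical and chosen only to show the idea of a yes/no check next to a probabilistic check against a threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical record for facts extracted about one person."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    mother_birth_year: Optional[int] = None


def death_not_before_birth(p: Person) -> bool:
    """Hard constraint: either satisfied or not."""
    if p.birth_year is None or p.death_year is None:
        return True  # nothing to check when a date is missing
    return p.death_year >= p.birth_year


def mother_age_plausibility(p: Person) -> Optional[float]:
    """Soft constraint: plausibility of the mother's age at the child's birth.

    The values below are illustrative placeholders, not measured data.
    """
    if p.birth_year is None or p.mother_birth_year is None:
        return None
    age = p.birth_year - p.mother_birth_year
    if age < 12 or age > 55:
        return 0.0
    if age < 15 or age > 49:
        return 0.01
    return 0.98


def find_violations(p: Person, threshold: float = 0.05) -> List[str]:
    """Return descriptions of the hard and soft violations found for p."""
    violations = []
    if not death_not_before_birth(p):
        violations.append("hard: death precedes birth")
    plausibility = mother_age_plausibility(p)
    # A soft constraint is violated when its plausibility falls below the threshold.
    if plausibility is not None and plausibility < threshold:
        violations.append("soft: implausible mother age at birth")
    return violations
```

Calling `find_violations` on a record whose death year precedes its birth year reports a hard violation, while a record whose mother would have been ten years old at the birth reports a soft one under the assumed threshold.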
Notes
1. Fe6: Form-based ensemble with 6 pipeline phases that accepts an OCRed document as input and generates a conceptualization of document-asserted facts as output.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Woodfield, S.N., Lonsdale, D.W., Liddle, S.W., Kim, T.W., Embley, D.W., Almquist, C. (2016). Pragmatic Quality Assessment for Automatically Extracted Data. In: Comyn-Wattiau, I., Tanaka, K., Song, IY., Yamamoto, S., Saeki, M. (eds) Conceptual Modeling. ER 2016. Lecture Notes in Computer Science, vol 9974. Springer, Cham. https://doi.org/10.1007/978-3-319-46397-1_16
DOI: https://doi.org/10.1007/978-3-319-46397-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46396-4
Online ISBN: 978-3-319-46397-1
eBook Packages: Computer Science, Computer Science (R0)