Abstract
Family Tree is a wiki-like shared repository of interconnected family genealogies. Because information ingested into the tree requires human authorization as verified in source documents, ingest is tedious and time-consuming. To significantly increase ingest efficiency while maintaining human oversight, we propose a pipeline of tools and techniques to transform source document genealogical assertions into verified information in the Family Tree data repository. The automation pipeline transforms pages of printed, scanned and OCRed family history books into a GEDCOM X conceptualization that can be ingested into Family Tree. All steps of the pipeline are fundamentally grounded in ontological conceptualizations. We report on the pipeline implementation status and give results of initial case studies in semi-automatically ingesting information obtained from family history books into Family Tree.
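As a rough illustration of the final pipeline step described above, the following minimal sketch maps an extracted assertion onto a GEDCOM X-style person record and passes along only assertions a human has verified. It is not the authors' implementation. The `Assertion` class, its fields, and the functions `to_gedcomx_person` and `ingest_batch` are hypothetical names introduced here. The record layout follows the general shape of the GEDCOM X JSON serialization (`names`, `nameForms`, `fullText`, `facts`) but is heavily simplified.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """One genealogical assertion extracted from an OCRed page (hypothetical shape)."""
    name: str
    birth_date: str
    source_page: int
    verified: bool = False  # assumed flag: set only after a human checks the source page


def to_gedcomx_person(a: Assertion, person_id: str) -> dict:
    """Map an assertion onto a simplified GEDCOM X-style person record."""
    return {
        "id": person_id,
        "names": [{"nameForms": [{"fullText": a.name}]}],
        "facts": [
            {
                "type": "http://gedcomx.org/Birth",
                "date": {"original": a.birth_date},
            }
        ],
    }


def ingest_batch(assertions: list) -> dict:
    """Build a GEDCOM X-style document from human-verified assertions only."""
    persons = [
        to_gedcomx_person(a, f"p{i}")
        for i, a in enumerate(assertions, start=1)
        if a.verified
    ]
    return {"persons": persons}


# Illustrative data only; these are not records from the case studies.
batch = [
    Assertion("Jane Doe", "12 Mar 1801", source_page=17, verified=True),
    Assertion("John Doe", "about 1799", source_page=17),
]
document = ingest_batch(batch)
```

In this sketch the unverified assertion is simply left out of the batch, which mirrors the requirement that information enter the tree only after human authorization.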
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Embley, D.W., Liddle, S.W., Eastmond, T.S., Lonsdale, D.W., Price, J.P., Woodfield, S.N. (2017). Conceptual Modeling in Accelerating Information Ingest into Family Tree. In: Cabot, J., Gómez, C., Pastor, O., Sancho, M., Teniente, E. (eds) Conceptual Modeling Perspectives. Springer, Cham. https://doi.org/10.1007/978-3-319-67271-7_6
DOI: https://doi.org/10.1007/978-3-319-67271-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67270-0
Online ISBN: 978-3-319-67271-7
eBook Packages: Computer Science (R0)