Abstract
Entity resolution plays an important role in many fields and, because of this importance, has been widely studied. In the big data era, however, entity resolution faces new challenges: high scalability requirements, the coexistence of tautonymy and synonymy, complex similarity metrics, and the need to evaluate data quality based on entity resolution. We briefly introduce our solutions to these challenges and discuss possible future work on entity resolution in the big data era.
Acknowledgements
This work was supported by NSFC grants 61602159 and 61370222.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Li, L. (2018). Entity Resolution in Big Data Era: Challenges and Applications. In: Liu, C., Zou, L., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10829. Springer, Cham. https://doi.org/10.1007/978-3-319-91455-8_11
DOI: https://doi.org/10.1007/978-3-319-91455-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91454-1
Online ISBN: 978-3-319-91455-8
eBook Packages: Computer Science, Computer Science (R0)