Chinese Address Similarity Calculation Based on Auto Geological Level Tagging

  • Jing Liu
  • Jianbin Wang (Email author)
  • Changqing Zhang
  • Xiubo Yang
  • Jianbo Deng
  • Ruihe Zhu
  • Xiaojie Nan
  • Qinghua Chen (Email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11555)

Abstract

Quickly measuring the similarity of addresses has become an urgent need in many fields, including financial anti-fraud. Traditional string-based similarity calculation methods fall short on this task. Taking into account the hierarchical nature of addresses, we construct a framework for calculating the similarity of Chinese addresses. First, each address string is split into segments, and each segment is annotated with its proper level by an LM-LSTM-CRF model. Next, similarities are calculated at the sub-string level. Finally, the similarity scores are combined by BP neural networks. This framework has achieved good results in practice for financial anti-fraud tasks.
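
The last two stages of the framework described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the addresses have already been split and tagged (the step the paper assigns to the LM-LSTM-CRF model), uses a hypothetical set of level names, takes normalized Levenshtein similarity as one possible sub-string measure, and substitutes a fixed weighted average for the trained BP neural network.

```python
from typing import Dict, List

# Hypothetical level names; the tag set used in the paper is not given here.
LEVELS = ["province", "city", "district", "road", "number"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0  # assumption: two empty segments count as a match
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def level_similarities(addr_a: Dict[str, str], addr_b: Dict[str, str]) -> List[float]:
    """One similarity score per level; a missing level is treated as ''."""
    return [string_similarity(addr_a.get(lv, ""), addr_b.get(lv, ""))
            for lv in LEVELS]


def combine(scores: List[float], weights: List[float]) -> float:
    """Stand-in for the BP network: a weighted average of level scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Two pre-tagged addresses that differ only in the house number.
addr_a = {"province": "北京市", "city": "北京市", "district": "海淀区",
          "road": "中关村大街", "number": "1号"}
addr_b = {"province": "北京市", "city": "北京市", "district": "海淀区",
          "road": "中关村大街", "number": "2号"}

scores = level_similarities(addr_a, addr_b)
overall = combine(scores, [1.0] * len(LEVELS))
```

Comparing level by level keeps a mismatch in one field, such as the house number, from being diluted or amplified by the length of the other fields, which is the motivation for exploiting the hierarchical structure. In a trained system the fixed weights would be replaced by a learned model.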

Keywords

Address similarity, Auto geological level tagging, LM-LSTM-CRF model

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jing Liu (1)
  • Jianbin Wang (2), Email author
  • Changqing Zhang (2)
  • Xiubo Yang (2)
  • Jianbo Deng (3)
  • Ruihe Zhu (3)
  • Xiaojie Nan (2)
  • Qinghua Chen (1), Email author

  1. School of Systems Science, Beijing Normal University, Beijing, China
  2. Credit Harmony Research, Building 3 District 3 Hanwei International, Beijing, China
  3. Swarma Club, Beijing, China
