Attention-based multi-modal fusion for improved real estate appraisal: a case study in Los Angeles

  • Junchi Bin
  • Bryan Gardiner
  • Zheng Liu
  • Eric Li


The geographical presentation of a house, meaning the scenery and topography around it, is a critical factor for a house buyer. Street maps are a common type of data in daily life and naturally capture this geographical presentation. This paper sources real estate data and the corresponding street maps of houses in the city of Los Angeles. In this case study, we propose an attention-based multi-modal fusion method that uses a deep neural network to incorporate the geographical presentation from street maps into the real estate appraisal model. We first combine the house attribute features and the street map imagery features with an attention-based neural network. We then apply boosted regression trees to estimate the house price from the fused features. This work explores the potential of the attention mechanism and data fusion in real estate appraisal. The experimental results indicate that the proposed method is competitive with state-of-the-art methods.
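The fusion step described above can be illustrated with a minimal sketch. It is not the authors' architecture: the projection matrices, the query vector, the shared dimension, and the input values are all hypothetical, and the weights are random rather than learned. It only shows the general idea of projecting two modalities into a shared space and combining them with softmax attention weights.

```python
import math
import random


def project(weights, x):
    """Linear projection followed by tanh: maps a feature vector to the shared space."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(attr_feats, map_feats, w_attr, w_map, query):
    """Fuse two modalities using one attention weight per modality.

    Assumed formulation (not taken from the paper): each modality is projected
    to a shared space, scored against a query vector, and the softmax of the
    two scores weights the sum of the projections.
    """
    h_attr = project(w_attr, attr_feats)
    h_map = project(w_map, map_feats)
    scores = [
        sum(q * h for q, h in zip(query, h_attr)),
        sum(q * h for q, h in zip(query, h_map)),
    ]
    alpha = softmax(scores)
    fused = [alpha[0] * a + alpha[1] * m for a, m in zip(h_attr, h_map)]
    return fused, alpha


def random_matrix(rng, rows, cols):
    """Random stand-in for weights that a real model would learn."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SHARED_DIM = 4  # hypothetical size of the shared feature space

attr_feats = [3.0, 2.0, 0.15]                 # made-up, pre-scaled house attributes
map_feats = [rng.random() for _ in range(8)]  # stand-in for CNN features of a street map tile

w_attr = random_matrix(rng, SHARED_DIM, len(attr_feats))
w_map = random_matrix(rng, SHARED_DIM, len(map_feats))
query = [rng.uniform(-1.0, 1.0) for _ in range(SHARED_DIM)]

fused, alpha = attention_fuse(attr_feats, map_feats, w_attr, w_map, query)
print("attention weights:", alpha)
print("fused features:", fused)
```

In a pipeline like the one the abstract outlines, the `fused` vector for each house would then serve as the input row to a boosted regression tree model that predicts the price.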


Real estate appraisal; Convolutional neural network; Multi-modal fusion; Boosted regression trees



This study was supported by the Mitacs Accelerate Program (IT10011) through a collaboration between Data Nerds and the University of British Columbia (Okanagan). The authors thank Fang Shi and Shuo Liu (University of British Columbia), Dr. Huan Liu (China University of Geosciences), and Kaiqi Zhang (AECOM New York) for their valuable discussions while the work was carried out.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Applied Science, University of British Columbia, Kelowna, Canada
  2. Data Nerds, Kelowna, Canada
  3. Faculty of Management, University of British Columbia, Kelowna, Canada
