Real estate price estimation in French cities using geocoding and machine learning

Abstract

This paper studies real estate price estimation in France, a market that has received little attention. We compare seven popular machine learning techniques and propose an approach that quantifies the relevance of location features in real estate price estimation at both high and low levels of granularity. We use a newly available open dataset provided by the French government that contains five years of historical real estate transactions. At a high level of granularity, we find large differences in the models’ predictive power between cities with medium and high standards of living (precision differences beyond 70% in some cases). At a low level of granularity, we use geocoding to add precise geographical location features to the inputs of the machine learning algorithms. These features substantially improve the models’ forecasting power relative to models trained without them (improvements beyond 50% for some forecasting error measures). Our results also show that neural networks and random forests outperform the other methods when geocoding features are not included, while random forest, AdaBoost, and gradient boosting perform well when they are. These results can be of particular interest for identifying opportunities in the real estate market through price prediction. They can also serve as a basis for price assessment in revenue management for durable and non-replenishable products such as real estate.
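
As a rough illustration of how an improvement "for some forecasting error measures" can be quantified, the sketch below computes three common regression error measures and the relative improvement between two sets of predictions. The choice of MAE, RMSE, and MAPE is an assumption (the abstract does not name the measures used), the function names are hypothetical, and the prices are made-up toy values, not results from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy sale prices in euros (hypothetical, for illustration only).
actual = [200_000, 350_000, 150_000, 500_000]
# Hypothetical predictions from a model trained without geocoding features.
without_geo = [240_000, 300_000, 190_000, 420_000]
# Hypothetical predictions from the same model with geocoding features added.
with_geo = [210_000, 340_000, 160_000, 480_000]

if __name__ == "__main__":
    for name, measure in (("MAE", mae), ("RMSE", rmse), ("MAPE", mape)):
        base = measure(actual, without_geo)
        new = measure(actual, with_geo)
        gain = relative_improvement(base, new)
        print(f"{name}: {base:.2f} -> {new:.2f} ({gain:.1f}% lower)")
```

Reporting the improvement as a percentage of the baseline error makes models comparable across cities with very different price levels, which an absolute difference in euros would not.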

Notes

  1. www.pinel-loi-gouv.fr/.

  2. The link to the dataset “Demands of land values” is: https://www.data.gouv.fr/fr/datasets/5c4ae55a634f4117716d5656/.

  3. https://geo.api.gouv.fr/adresse.

Author information

Corresponding author

Correspondence to Dieudonné Tchuente.

About this article

Cite this article

Tchuente, D., Nyawa, S. Real estate price estimation in French cities using geocoding and machine learning. Ann Oper Res (2021). https://doi.org/10.1007/s10479-021-03932-5

Keywords

  • Real estate market
  • Automated valuation models
  • Investment
  • Geocoding
  • French cities
  • Machine learning
  • Artificial intelligence