Estimation of missing prices in real-estate market agent-based simulations with machine learning and dimensionality reduction methods

  • Iván García-Magariño
  • Carlos Medrano
  • Jorge Delgado
Original Article


The opacity of real-estate markets poses challenges for their agent-based simulation. While some real-estate Web sites publish the prices of a large number of houses, the prices of the remaining houses are not available. Estimating these missing prices is necessary for simulating the evolution of prices from a complete initial set of houses. This estimation could also be useful for other purposes, such as appraising houses, letting buyers know which offered prices are the best (i.e., the lowest compared with the appraisals) and recommending an initial price to buyers. This work proposes combining dimensionality reduction methods with machine learning techniques to estimate these prices. In particular, it analyzes nonnegative matrix factorization, recursive feature elimination and feature selection with a variance threshold as dimensionality reduction methods, and it compares linear regression, support vector regression, k-nearest neighbors and a multilayer perceptron neural network as machine learning techniques. A tenfold cross-validation was applied to compare the estimations and errors and to assess the improvement over a basic estimator commonly used at the beginning of simulations. The developed software and the dataset are freely available from a research data repository for the sake of reproducibility and to support other researchers.
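As a rough illustration of the evaluation scheme summarized above, the following sketch combines one of the named dimensionality reduction methods (feature selection with a variance threshold) with one of the named learners (k-nearest neighbors) and scores it by tenfold cross-validation against a baseline. It is not the authors' code: the synthetic data, the feature names, the choice of the mean training price as the "basic estimator" and all function names are assumptions made only for this example. It uses only the Python standard library; a library such as scikit-learn offers ready-made equivalents of each step.

```python
import random
import statistics


def make_synthetic_houses(n=200, seed=0):
    """Hypothetical data: [area_m2, rooms, constant_flag] -> price."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        area = rng.uniform(40, 200)
        rooms = rng.randint(1, 5)
        X.append([area, float(rooms), 1.0])  # third feature never varies
        y.append(1000.0 * area + 5000.0 * rooms + rng.gauss(0, 5000))
    return X, y


def variance_threshold(X, threshold=0.0):
    """Return the indices of columns whose variance exceeds the threshold."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns)
            if statistics.pvariance(col) > threshold]


def knn_predict(X_train, y_train, x, k=5):
    """Average the prices of the k nearest training houses (Euclidean)."""
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), price)
        for row, price in zip(X_train, y_train)
    )
    nearest = [price for _, price in neighbours[:k]]
    return sum(nearest) / len(nearest)


def cross_validate(X, y, n_folds=10, k=5, seed=0):
    """Return (baseline MAE, kNN MAE) over all held-out houses."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    base_err, knn_err = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Select features on the training fold only, to avoid leakage.
        keep = variance_threshold(X_tr)
        X_tr_sel = [[row[j] for j in keep] for row in X_tr]
        # Assumed baseline: every missing price is the mean training price.
        mean_price = sum(y_tr) / len(y_tr)
        for i in fold:
            x = [X[i][j] for j in keep]
            base_err.append(abs(y[i] - mean_price))
            knn_err.append(abs(y[i] - knn_predict(X_tr_sel, y_tr, x, k)))
    return sum(base_err) / len(base_err), sum(knn_err) / len(knn_err)


X, y = make_synthetic_houses()
baseline_mae, knn_mae = cross_validate(X, y)
print(f"baseline MAE: {baseline_mae:.0f}  kNN MAE: {knn_mae:.0f}")
```

The feature selection is refitted inside each fold so that the held-out houses never influence which features are kept. The other methods listed in the abstract could be swapped in at the same two points (the reduction step and the prediction step) without changing the cross-validation loop.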


Agent-based simulation; Machine learning; Real-estate market; Simulation setup



This work has been supported by the program “Estancias de movilidad en el extranjero José Castillejo para jóvenes doctores” funded by the Spanish Ministry of Education, Culture and Sport with reference CAS17/00005. This work also acknowledges the research project “Diseño de actividades de aprendizaje colaborativas con Big Data” with reference PIIDUZ_16_120 funded by University of Zaragoza. We acknowledge the research project “Construcción de un framework para agilizar el desarrollo de aplicaciones móviles en el ámbito de la salud” funded by University of Zaragoza and Foundation Ibercaja with grant reference JIUZ-2017-TEC-03. We also acknowledge support from “Universidad de Zaragoza,” “Fundación Bancaria Ibercaja” and “Fundación CAI” in the “Programa Ibercaja-CAI de Estancias de Investigación” with reference IT1/18. This work was partially supported by the Spanish Research grant MTM2015-65433-P (MINECO/FEDER), Gobierno de Aragón and Fondo Social Europeo. Furthermore, we acknowledge the “Fondo Social Europeo” and the “Departamento de Tecnología y Universidad del Gobierno de Aragón” for their joint support with grant number Ref-T81.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest regarding this work.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering of Systems, EduQTech, University of Zaragoza, Escuela Universitaria Politécnica de Teruel, Teruel, Spain
  2. Instituto de Investigación Sanitaria Aragón, University of Zaragoza, Zaragoza, Spain
  3. Department of Electronics Engineering and Communications, EduQTech, University of Zaragoza, Escuela Universitaria Politécnica de Teruel, Teruel, Spain
  4. Department of Applied Mathematics, University of Zaragoza, Escuela Universitaria Politécnica de Teruel, Teruel, Spain
