Prediction of Socioeconomic Levels Using Cell Phone Records

  • Victor Soto
  • Vanessa Frias-Martinez
  • Jesus Virseda
  • Enrique Frias-Martinez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6787)


The socioeconomic status of a population or an individual indicates its access to housing, education, health, and basic services such as water and electricity. It is also an indirect indicator of purchasing power, and as such a key element when personalizing interactions with a customer, especially for marketing campaigns or new product offers. In this paper we study whether information derived from aggregated cell phone records can be used to identify the socioeconomic levels of a population. We present predictive models, built with support vector machines (SVMs) and Random Forests, that use the aggregated behavioral variables of communication antennas to predict socioeconomic levels. Our results show correct prediction rates of over 80% for an urban population of around 500,000 citizens.
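As a rough illustration of what "aggregated behavioral variables of the communication antennas" and a "correct prediction rate" could look like in practice, the following minimal Python sketch groups call records by antenna and scores a set of predicted levels. The record layout, the three variables, and the function names are assumptions made for this example only; the abstract does not specify the paper's actual variables, and the SVM and Random Forest models themselves are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical call records: (antenna_id, caller_id, duration_seconds).
# Field names and values are illustrative, not taken from the paper.
RECORDS = [
    ("A1", "u1", 60),
    ("A1", "u2", 120),
    ("A1", "u1", 30),
    ("A2", "u3", 300),
    ("A2", "u4", 240),
]


def aggregate_by_antenna(records):
    """Group call records by antenna and compute simple behavioral variables."""
    grouped = defaultdict(list)
    for antenna_id, caller_id, duration in records:
        grouped[antenna_id].append((caller_id, duration))

    features = {}
    for antenna_id, calls in grouped.items():
        features[antenna_id] = {
            "n_calls": len(calls),
            "n_callers": len({caller for caller, _ in calls}),
            "mean_duration": mean(d for _, d in calls),
        }
    return features


def correct_prediction_rate(predicted, actual):
    """Fraction of antennas whose predicted level matches the reference level."""
    hits = sum(1 for antenna in actual if predicted.get(antenna) == actual[antenna])
    return hits / len(actual)


FEATURES = aggregate_by_antenna(RECORDS)
```

In a setup like this, each antenna's feature dictionary would become one input row for a classifier, and `correct_prediction_rate` would compare the classifier's output with reference socioeconomic levels for the area each antenna covers.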


Feature Selection, Random Forest, Cell Phone, Support Vector Regression, Target Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Victor Soto (1)
  • Vanessa Frias-Martinez (1)
  • Jesus Virseda (1)
  • Enrique Frias-Martinez (1)
  1. Telefonica Research, Madrid, Spain
