Modelling socioeconomic attributes of public transit passengers


Emerging public transit big data, such as smart card data, generally lack personal and economic attributes, an issue that needs to be addressed. Public transit passengers come from different socioeconomic classes, and their trip attributes usually depend on their personal and economic attributes. For instance, age, a demographic attribute, plays an important role in trip attributes: adolescent passengers travel to school, young professionals travel to work, and older passengers travel to medical facilities more often. Relations between the socioeconomic and trip attributes of passengers can be examined by developing a Bayesian network, which represents the relations between attributes as a directed acyclic graph, and by calculating the joint and conditional probabilities in the graph. This study infers the socioeconomic attributes of public transit passengers from their trip attributes by developing a Bayesian network. The socioeconomic attributes considered are age, gender, and income; the trip attributes considered are the start time and duration of the trip, stay duration, and the available origin and destination land use types. First, potential structures of the Bayesian network are examined by comparing network scores and arc strength tests. After the network’s parameters are learned, reasoning is performed through both prediction and diagnosis in the network. The most likely combinations of socioeconomic and trip attributes are also discovered. The case study for developing the Bayesian network is a Household Travel Survey dataset from Queensland, Australia, which contains both socioeconomic and trip attributes. The results show how the socioeconomic attributes can be inferred from the trip attributes. The discovered probability distributions can be used to enrich smart card datasets with socioeconomic attributes. Moreover, a Bayesian classifier is applied to the dataset to validate the capability of the model in predicting the socioeconomic attributes. Finally, the developed network is applied to a set of smart card records to discuss potential applications.
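
The kind of reasoning described above can be sketched with a toy discrete Bayesian network. The sketch below is illustrative only: the structure (age as the single parent of trip start time and destination land use), the category names, and every probability value are hypothetical assumptions, not the network or the estimates from the study. It shows how a joint probability factorizes over the graph, how diagnosis recovers a distribution over age from observed trip attributes, how prediction reads trip attributes from a known age, and how the most likely combination of attributes can be found by enumeration.

```python
from itertools import product

# Hypothetical prior over age groups (illustrative numbers, not from the study).
P_AGE = {"adolescent": 0.2, "adult": 0.6, "senior": 0.2}

# Hypothetical conditional probability tables; each row is indexed by the
# parent value (age) and sums to 1 over the child values.
P_START_GIVEN_AGE = {
    "adolescent": {"am_peak": 0.7, "off_peak": 0.3},
    "adult": {"am_peak": 0.6, "off_peak": 0.4},
    "senior": {"am_peak": 0.2, "off_peak": 0.8},
}
P_DEST_GIVEN_AGE = {
    "adolescent": {"education": 0.7, "commercial": 0.2, "medical": 0.1},
    "adult": {"education": 0.1, "commercial": 0.8, "medical": 0.1},
    "senior": {"education": 0.05, "commercial": 0.45, "medical": 0.5},
}

START_VALUES = ["am_peak", "off_peak"]
DEST_VALUES = ["education", "commercial", "medical"]


def joint(age, start, dest):
    """Joint probability from the assumed graph: P(age) P(start|age) P(dest|age)."""
    return P_AGE[age] * P_START_GIVEN_AGE[age][start] * P_DEST_GIVEN_AGE[age][dest]


def diagnose_age(start, dest):
    """Diagnosis: P(age | start, dest), obtained by normalizing the joint over age."""
    weights = {age: joint(age, start, dest) for age in P_AGE}
    total = sum(weights.values())
    return {age: w / total for age, w in weights.items()}


def predict_dest(age):
    """Prediction: P(dest | age), read directly from the conditional table."""
    return dict(P_DEST_GIVEN_AGE[age])


def most_likely_combination():
    """Return the (age, start, dest) triple with the highest joint probability."""
    combos = product(P_AGE, START_VALUES, DEST_VALUES)
    return max(combos, key=lambda combo: joint(*combo))


if __name__ == "__main__":
    print(diagnose_age("off_peak", "medical"))
    print(predict_dest("adolescent"))
    print(most_likely_combination())
```

With real data, the tables would be learned from a survey that records both kinds of attributes, and the resulting conditional distributions could then be applied to smart card records that contain only the trip attributes.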






This study is not funded by any organization.

Author information



Corresponding author

Correspondence to Hamed Faroqi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Faroqi, H., Mesbah, M. & Kim, J. Modelling socioeconomic attributes of public transit passengers. J Geogr Syst (2020).



Keywords

  • Probabilistic models
  • Decision graphs
  • Data mining
  • Spatial analyses
  • Travel surveys
  • Smart card data

JEL Classification

  • R00