MobilityMirror: Bias-Adjusted Transportation Datasets

Rodriguez, Luke; Salimi, Babak; Ping, Haoyue; Stoyanovich, Julia; Howe, Bill

doi:10.1007/978-3-030-11238-7_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 926)

Included in the following conference series:

Workshop on Big Social Data and Urban Computing

Abstract

We describe customized synthetic datasets for publishing mobility data. Companies are providing new transportation modalities, and their data is of high value for integrative transportation research, policy enforcement, and public accountability. However, these companies are disincentivized from sharing data not only to protect the privacy of individuals (drivers and/or passengers), but also to protect their own competitive advantage. Moreover, demographic biases arising from how the services are delivered may be amplified if released data is used in other contexts.

We describe a model and algorithm for releasing origin-destination histograms that remove selected biases in the data using causality-based methods. We compute the origin-destination histogram of the original dataset, then adjust the counts to remove undesirable causal relationships that can lead to discrimination or violate contractual obligations with data owners. We evaluate the utility of the algorithm on real data from a dockless bike share program in Seattle and taxi data in New York, and show that these adjusted transportation datasets can retain utility while removing bias in the underlying data.
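As a rough illustration of the two steps described above (build the origin-destination histogram, then adjust its counts), the following minimal sketch may help. It is not the chapter's algorithm: the trip records, the `rider_group` attribute, and the function names are hypothetical, and plain statistical independence is used as a simplified stand-in for the causality-based adjustment, whose details are not given in the abstract.

```python
from collections import Counter

# Hypothetical trip records: (origin_zone, destination_zone, rider_group).
trips = [
    ("A", "B", "g1"), ("A", "B", "g1"), ("A", "B", "g1"),
    ("A", "B", "g2"),
    ("B", "A", "g1"),
    ("B", "A", "g2"), ("B", "A", "g2"), ("B", "A", "g2"),
]


def od_histogram(records):
    """Count trips per (origin, destination, group) bucket."""
    return Counter(records)


def adjust_independent(hist):
    """Replace each count with total * P(od) * P(group).

    Assumption: independence between OD pair and group stands in for the
    causal adjustment. The OD and group marginals and the total are kept.
    """
    total = sum(hist.values())
    od_marginal = Counter()
    group_marginal = Counter()
    for (o, d, g), n in hist.items():
        od_marginal[(o, d)] += n
        group_marginal[g] += n
    return {
        (o, d, g): od_marginal[(o, d)] * group_marginal[g] / total
        for (o, d) in od_marginal
        for g in group_marginal
    }


original = od_histogram(trips)
adjusted = adjust_independent(original)
```

In this toy input the original counts tie group `g1` to the `A -> B` direction; after the adjustment every bucket holds the same expected count, so the released histogram no longer reveals that association while the total number of trips is unchanged.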

J. Stoyanovich—This work was supported in part by NSF Grant No. 1741047.

Notes

1. Many methods for comparing ranked lists have been proposed. We opt for a measure in which the identity of the items being ranked (histogram buckets) is deemed important. This is in contrast to typical IR measures such as NDCG or MAP, where item identity is disregarded, and only item quality or relevance scores are retained.
2. Two datasets are neighbors if they differ in the presence or absence of a single record, following the differential privacy definition.
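Both notes can be made concrete with a small sketch. The notes do not name the ranking measure, so the Kendall tau distance below is only an assumed example of an identity-aware comparison; the bucket labels, counts, and function names are hypothetical, and both rankings are assumed to cover the same set of buckets.

```python
from collections import Counter
from itertools import combinations


def rank_buckets(hist):
    """Order bucket labels by descending count, breaking ties by label."""
    return sorted(hist, key=lambda b: (-hist[b], b))


def kendall_tau_distance(rank_a, rank_b):
    """Count the bucket pairs that the two rankings order differently."""
    pos_a = {b: i for i, b in enumerate(rank_a)}
    pos_b = {b: i for i, b in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Note 1: compare bucket rankings before and after an adjustment.
original = Counter({"A->B": 5, "B->A": 3, "A->C": 1})
adjusted = Counter({"A->B": 3, "B->A": 4, "A->C": 2})
distance = kendall_tau_distance(rank_buckets(original), rank_buckets(adjusted))

# Note 2: neighboring datasets differ by a single record, so their
# histograms differ by exactly one count in exactly one bucket.
records = ["A->B", "A->B", "B->A"]
neighbor = records[:-1]  # the same dataset with one record removed
hist_full, hist_neighbor = Counter(records), Counter(neighbor)
l1_difference = sum(abs(hist_full[b] - hist_neighbor[b]) for b in hist_full)
```

Here only the pair (`A->B`, `B->A`) swaps order, so the distance counts that one disagreement between specific buckets, which a relevance-only measure would not attribute to particular items.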

Author information

Authors and Affiliations

University of Washington, Seattle, USA
Luke Rodriguez, Babak Salimi & Bill Howe
Drexel University, Philadelphia, USA
Haoyue Ping
New York University, New York City, USA
Julia Stoyanovich

Corresponding author

Correspondence to Luke Rodriguez.

Editor information

Editors and Affiliations

Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
Jonice Oliveira
Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
Claudio M. Farias
Inria/CNRS, University of Montpellier, Montpellier, France
Esther Pacitti
University of Calabria (Unical), Rende (CS), Italy
Giancarlo Fortino

About this paper

Cite this paper

Rodriguez, L., Salimi, B., Ping, H., Stoyanovich, J., Howe, B. (2019). MobilityMirror: Bias-Adjusted Transportation Datasets. In: Oliveira, J., Farias, C., Pacitti, E., Fortino, G. (eds) Big Social Data and Urban Computing. BiDU 2018. Communications in Computer and Information Science, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-030-11238-7_2

DOI: https://doi.org/10.1007/978-3-030-11238-7_2
Published: 23 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11237-0
Online ISBN: 978-3-030-11238-7
eBook Packages: Computer Science, Computer Science (R0)
