Towards Collaborative Data Analysis with Diverse Crowds – A Design Science Approach

  • Conference paper
Designing for a Digital and Globalized World (DESRIST 2018)

Abstract

Recent years have witnessed a growing shortage of data experts capable of analyzing the ever-present data and producing meaningful insights. Furthermore, some data scientists report that data preprocessing takes up to 80% of total project time. This paper proposes a method for collaborative data analysis that involves a crowd without data analysis expertise. Orchestrated by an expert, a team of novices conducts data analysis by iteratively refining results until the analysis is successfully completed. To evaluate the proposed method, we implemented a tool that supports collaborative data analysis for teams with mixed levels of expertise. Our evaluation demonstrates that, with proper guidance, data analysis tasks, especially preprocessing, can be distributed to and successfully accomplished by non-experts. Following the design science approach, iterative development also revealed important features for the collaboration tool, such as support for dynamic development, code deliberation, and a project journal. As such, we pave the way for building tools that can leverage the crowd to address the shortage of data analysts.

Acknowledgments

This work was supported by the Swiss National Science Foundation under contract number 14341.

Author information

Corresponding author

Correspondence to Michael Feldman.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Feldman, M., Anastasiu, C., Bernstein, A. (2018). Towards Collaborative Data Analysis with Diverse Crowds – A Design Science Approach. In: Chatterjee, S., Dutta, K., Sundarraj, R. (eds) Designing for a Digital and Globalized World. DESRIST 2018. Lecture Notes in Computer Science, vol 10844. Springer, Cham. https://doi.org/10.1007/978-3-319-91800-6_15

  • DOI: https://doi.org/10.1007/978-3-319-91800-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91799-3

  • Online ISBN: 978-3-319-91800-6

  • eBook Packages: Computer Science, Computer Science (R0)
