Towards Collaborative Data Analysis with Diverse Crowds – A Design Science Approach

  • Michael Feldman
  • Cristian Anastasiu
  • Abraham Bernstein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10844)

Abstract

Recent years have seen a growing shortage of data experts capable of analyzing the ever-increasing volume of data and producing meaningful insights. Moreover, data scientists report that data preprocessing can take up to 80% of a project's time. This paper proposes a method for collaborative data analysis that involves a crowd without data analysis expertise. Orchestrated by an expert, a team of novices carries out the analysis through iterative refinement of intermediate results until the task is successfully completed. To evaluate the proposed method, we implemented a tool that supports collaborative data analysis for teams with mixed levels of expertise. Our evaluation demonstrates that, with proper guidance, data analysis tasks, and preprocessing in particular, can be distributed to and successfully accomplished by non-experts. Following the design science approach, the iterative development also revealed important features for the collaboration tool, such as support for dynamic development, code deliberation, and a project journal. We thereby pave the way for tools that leverage the crowd to address the shortage of data analysts.
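
The abstract describes the workflow only at a high level. As a rough illustration of the kind of expert-orchestrated, iterative refinement loop it outlines, the Python sketch below models preprocessing subtasks, novice submissions, expert acceptance, and a project journal. All names (Task, Submission, ProjectJournal, run_iteration) and the acceptance check are hypothetical and are not taken from the paper or its tool.

    # Illustrative sketch only: every class and function here is a hypothetical
    # stand-in for the workflow the abstract describes, not the paper's code.
    from dataclasses import dataclass, field
    from typing import Callable, List


    @dataclass
    class Submission:
        worker: str        # crowd worker (novice) who produced the result
        result: object     # e.g. a cleaned list of records
        note: str = ""     # short explanation, supporting code deliberation


    @dataclass
    class Task:
        description: str                   # expert-written subtask
        accept: Callable[[object], bool]   # expert-defined acceptance check
        submissions: List[Submission] = field(default_factory=list)
        done: bool = False


    @dataclass
    class ProjectJournal:
        entries: List[str] = field(default_factory=list)

        def log(self, message: str) -> None:
            self.entries.append(message)


    def run_iteration(task: Task, submission: Submission, journal: ProjectJournal) -> bool:
        """One refinement round: record the submission, check it, log the outcome."""
        task.submissions.append(submission)
        journal.log(f"{submission.worker} submitted '{task.description}': {submission.note}")
        if task.accept(submission.result):
            task.done = True
            journal.log(f"Accepted '{task.description}' after {len(task.submissions)} round(s).")
        else:
            journal.log(f"Another iteration requested on '{task.description}'.")
        return task.done


    if __name__ == "__main__":
        journal = ProjectJournal()
        # Expert defines a preprocessing subtask with a simple acceptance criterion.
        dedup = Task("remove duplicate records",
                     accept=lambda rows: len(rows) == len(set(rows)))
        run_iteration(dedup, Submission("novice_1", [1, 1, 2, 3], "first pass"), journal)
        run_iteration(dedup, Submission("novice_1", [1, 2, 3], "removed duplicates"), journal)
        print("\n".join(journal.entries))

Under these assumptions, the expert's role reduces to defining subtasks and acceptance checks, while the journal keeps a shared record of each refinement round for the whole team.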

Keywords

Collaborative data analysis · Crowdsourcing · Design science

Notes

Acknowledgments

This work was supported by the Swiss National Science Foundation under contract number 14341.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Michael Feldman (1)
  • Cristian Anastasiu (1)
  • Abraham Bernstein (1)
  1. Department of Informatics, University of Zurich, Zurich, Switzerland
