Abstract
Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether to run these operations sequentially or in parallel, and on bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. We employed both automated methodologies, including interprocedural static analysis, and manual ones. The results indicate that streams are pervasive, parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams and benefits practitioners, programming language and API designers, tool developers, and educators alike.
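To make the paradigm concrete, the following is a minimal sketch of the kinds of stream code the abstract refers to. It is not taken from the studied projects; the class name StreamSketch and its method names are hypothetical and chosen only for illustration. It shows a MapReduce-like pipeline that can run sequentially or in parallel, a pipeline that silently does nothing because it lacks a terminal operation (one plausible form of a non-execution bug), and a parallel pipeline that gathers results with a collector instead of mutating shared state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the study's artifacts.
public class StreamSketch {

    // MapReduce-like pipeline over an in-memory list: filter, map, then reduce (sum).
    // The caller chooses sequential or parallel execution.
    static int sumOfEvenSquares(List<Integer> data, boolean parallel) {
        return (parallel ? data.parallelStream() : data.stream())
                .filter(x -> x % 2 == 0)
                .mapToInt(x -> x * x)
                .sum();
    }

    // Non-execution: intermediate operations are lazy, so without a terminal
    // operation the lambda below is never invoked and 'seen' stays empty.
    static int sideEffectsWithoutTerminalOp(List<Integer> data) {
        List<Integer> seen = new ArrayList<>();
        data.stream().map(x -> { seen.add(x); return x; }); // built, never run
        return seen.size();
    }

    // Parallel pipeline that avoids shared mutable state: results are gathered
    // by a collector rather than by adding to an external list from forEach.
    static List<Integer> doubled(List<Integer> data) {
        return data.parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(data, false));
        System.out.println(sumOfEvenSquares(data, true));
        System.out.println(sideEffectsWithoutTerminalOp(data));
        System.out.println(doubled(data));
    }
}
```

In this sketch the sequential and parallel sums agree because the reduction is associative and free of side effects. Whether parallel execution is actually faster depends on data size and per-element cost, which is the kind of decision the study examines.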
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Khatchadourian, R., Tang, Y., Bagherzadeh, M., Ray, B. (2020). An Empirical Study on the Use and Misuse of Java 8 Streams. In: Wehrheim, H., Cabot, J. (eds) Fundamental Approaches to Software Engineering. FASE 2020. Lecture Notes in Computer Science, vol 12076. Springer, Cham. https://doi.org/10.1007/978-3-030-45234-6_5
DOI: https://doi.org/10.1007/978-3-030-45234-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45233-9
Online ISBN: 978-3-030-45234-6
eBook Packages: Computer Science, Computer Science (R0)