Does BERT Understand Code? – An Exploratory Study on the Detection of Architectural Tactics in Code

  • Conference paper
Software Architecture (ECSA 2020)

Abstract

Quality-driven design decisions are often addressed with architectural tactics, which are reusable solution options for specific quality concerns. Creating traceability links for these tactics is useful but costly. Automating the creation of these links can reduce costs, but it is challenging because simple structural analyses yield only limited results. Transfer-learning approaches using language models like BERT are a recent trend in natural language processing and yield state-of-the-art results for tasks like text classification. In this paper, we experiment with treating the detection of architectural tactics in code as a text classification problem. We present an approach that detects architectural tactics in code by fine-tuning BERT. A 10-fold cross-validation shows promising results with an average \(F_1\)-score of 90%, which is on par with state-of-the-art approaches. We additionally apply our approach to a case study, where its results show promising potential but fall behind the state of the art. Therefore, we discuss our approach and examine potential reasons as well as downsides and future work.
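
To make the reported metric concrete, the following minimal sketch shows how an average \(F_1\)-score over a k-fold cross-validation can be computed. It is an illustration only and not the authors' evaluation code: the helper names (`f1_score`, `k_fold_indices`, `average_f1`), the contiguous fold split, and the label strings are assumptions, and the abstract does not state how the paper averages scores across folds or classes.

```python
from typing import List, Sequence, Tuple


def f1_score(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous (train, test) folds.

    Assumption: a plain contiguous split; a real study would typically
    shuffle and possibly stratify the data first.
    """
    folds = []
    for i in range(k):
        start = i * n // k
        stop = (i + 1) * n // k
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        folds.append((train, test))
    return folds


def average_f1(
    per_fold: Sequence[Tuple[Sequence[str], Sequence[str]]], positive: str
) -> float:
    """Unweighted mean of the per-fold F1 scores for one class."""
    scores = [f1_score(gold, pred, positive) for gold, pred in per_fold]
    return sum(scores) / len(scores)
```

In use, each of the k test folds would be classified by a model trained on the corresponding training folds (here, a fine-tuned BERT classifier), and the resulting (gold, predicted) label pairs would be passed to `average_f1`.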

Author information

Corresponding author

Correspondence to Jan Keim.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Keim, J., Kaplan, A., Koziolek, A., Mirakhorli, M. (2020). Does BERT Understand Code? – An Exploratory Study on the Detection of Architectural Tactics in Code. In: Jansen, A., Malavolta, I., Muccini, H., Ozkaya, I., Zimmermann, O. (eds) Software Architecture. ECSA 2020. Lecture Notes in Computer Science, vol 12292. Springer, Cham. https://doi.org/10.1007/978-3-030-58923-3_15

  • DOI: https://doi.org/10.1007/978-3-030-58923-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58922-6

  • Online ISBN: 978-3-030-58923-3

  • eBook Packages: Computer Science, Computer Science (R0)
