Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements

Mishra, Siba; Sharma, Arpit

doi:10.1007/978-3-030-44429-7_15

Siba Mishra¹² &
Arpit Sharma¹²

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 12045))

Included in the following conference series:

International Working Conference on Requirements Engineering: Foundation for Software Quality

1963 Accesses
2 Citations
1 Altmetric

Abstract

[Context and Motivation] Requirements glossary defines specialized and technical terms used in a requirements document. A requirements glossary helps in improving the quality and understandability of requirements documents. [Question/Problem] Manual extraction of glossary terms from a large body of requirements is an expensive and time-consuming task. This paper proposes a fundamentally new approach for automated extraction of glossary terms from large-sized requirements documents. [Principal Ideas/Result] Firstly, our technique extracts the candidate glossary terms by applying text chunking. Next, we apply a novel word embeddings based semantic filter for reducing the number of candidate glossary terms. Since word embeddings are very effective in identifying terms that are semantically very similar, this filter ensures that only domain-specific terms are present in the final set of glossary terms. We create a domain-specific reference corpus for home automation by Wikipedia crawling and use it for computing the semantic similarity scores of candidate glossary terms. We apply our technique to a large-sized requirements document, i.e., a CrowdRE dataset with around 3000 crowd-generated requirements for smart home applications. Semantic filtering reduces the number of glossary terms by 92.7%. To evaluate the quality of our extracted glossary terms we manually create the ground truth data from CrowdRE dataset and use it for computing precision and recall. Additionally, we also compute the requirements coverage of these extracted glossary terms. [Contributions] Our detailed experiments show that word embeddings based semantic filtering can be very useful for extracting glossary terms from a large body of requirements.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Aguilera, C., Berry, D.M.: The use of a repeated phrase finder in requirements extraction. J. Syst. Softw. 13(3), 209–230 (1990)
Article Google Scholar
Arora, C., Sabetzadeh, M., Briand, L.C., Zimmer, F.: Automated extraction and clustering of requirements glossary terms. IEEE Trans. Softw. Eng. 43(10), 918–945 (2017)
Article Google Scholar
Dwarakanath, A., Ramnani, R.R., Sengupta, S.: Automatic extraction of glossary terms from natural language requirements. In: 21st IEEE International Requirements Engineering Conference (RE), pp. 314–319, July 2013
Google Scholar
Ferrari, A., Donati, B., Gnesi, S.: Detecting domain-specific ambiguities: an NLP approach based on Wikipedia crawling and word embeddings. In: 25th IEEE International Requirements Engineering Conference Workshops (REW), pp. 393–399, September 2017
Google Scholar
Ferrari, A., Esuli, A., Gnesi, S.: Identification of cross-domain ambiguity with language models. In: 5th International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), pp. 31–38, August 2018
Google Scholar
Gemkow, T., Conzelmann, M., Hartig, K., Vogelsang, A.: Automatic glossary term extraction from large-scale requirements specifications. In: 26th IEEE International Requirements Engineering Conference, pp. 412–417. IEEE Computer Society (2018)
Google Scholar
Goldin, L., Berry, D.M.: AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom. Softw. Eng. 4(4), 375–412 (1997). https://doi.org/10.1023/A:1008617922496
Article Google Scholar
Hull, M.E.C., Jackson, K., Dick, J.: Requirements Engineering, 2nd edn. Springer, Heidelberg (2005). https://doi.org/10.1007/b138335
Book MATH Google Scholar
Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)
Article Google Scholar
Kof, L.: Natural language processing for requirements engineering: applicability to large requirements documents. In: Workshop on Automated Software Engineering (2004)
Google Scholar
Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2014, pp. 2177–2185 (2014)
Google Scholar
Ménard, P.A., Ratté, S.: Concept extraction from business documents for software engineering projects. Autom. Softw. Eng. 23(4), 649–686 (2015). https://doi.org/10.1007/s10515-015-0184-4
Article Google Scholar
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2013, pp. 3111–3119 (2013)
Google Scholar
Mishra, S., Sharma, A.: On the use of word embeddings for identifying domain specific ambiguities in requirements. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp. 234–240 (2019)
Google Scholar
Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Toward automating crowd RE. In: 25th IEEE International Requirements Engineering Conference (RE), pp. 512–515, September 2017
Google Scholar
Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Acquiring creative requirements from the crowd: understanding the influences of individual personality and creative potential in crowd RE. In: 24th IEEE International Requirements Engineering Conference (RE), pp. 176–185, September 2016
Google Scholar
Park, Y., Byrd, R.J., Boguraev, B.K.: Automatic glossary extraction: beyond terminology identification. In: COLING 2002: The 19th International Conference on Computational Linguistics (2002). https://www.aclweb.org/anthology/C02-1142
Pohl, K.: Requirements Engineering - Fundamentals, Principles, and Techniques, 1st edn. Springer, Heidelberg (2010)
Google Scholar
Pohl, K., Böckle, G., van der Linden, F.J.: Software Product Line Engineering: Foundations. Principles and Techniques. Springer, Heidelberg (2005). https://doi.org/10.1007/3-540-28901-1
Book MATH Google Scholar
Popescu, D., Rugaber, S., Medvidovic, N., Berry, D.M.: Reducing ambiguities in requirements specifications via automatically created object-oriented models. In: Paech, B., Martell, C. (eds.) Monterey Workshop 2007. LNCS, vol. 5320, pp. 103–124. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89778-1_10
Chapter Google Scholar
Romero, F.P., Olivas, J.A., Genero, M., Piattini, M.: Automatic extraction of the main terminology used in empirical software engineering through text mining techniques. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. ESEM 2008, pp. 357–358 (2008)
Google Scholar
Zou, X., Settimi, R., Cleland-Huang, J.: Improving automated requirements trace retrieval: a study of term-based enhancement methods. Empir. Softw. Eng. 15(2), 119–146 (2010). https://doi.org/10.1007/s10664-009-9114-z
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
Siba Mishra & Arpit Sharma

Authors

Siba Mishra
View author publications
You can also search for this author in PubMed Google Scholar
Arpit Sharma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arpit Sharma .

Editor information

Editors and Affiliations

Computer Science, Middlesex College, Western University, London, ON, Canada
Nazim Madhavji
School of Computer Science, University College Dublin, Dublin, Ireland
Liliana Pasquale
Area della Ricerca CNR di Pisa, Istituto di Scienza e Tecnologie dell'Informazione “Alessandro Faedo”, Pisa, Italy
Alessio Ferrari
Area della Ricerca CNR di Pisa, Istituto di Scienza e Tecnologie dell'Informazione “Alessandro Faedo”, Pisa, Italy
Stefania Gnesi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mishra, S., Sharma, A. (2020). Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements. In: Madhavji, N., Pasquale, L., Ferrari, A., Gnesi, S. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2020. Lecture Notes in Computer Science(), vol 12045. Springer, Cham. https://doi.org/10.1007/978-3-030-44429-7_15

Download citation

DOI: https://doi.org/10.1007/978-3-030-44429-7_15
Published: 18 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44428-0
Online ISBN: 978-3-030-44429-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics