Requirement Text Detection from Contract Packages to Support Project Definition Determination

  • Tuyen Le
  • Chau Le
  • H. David Jeong
  • Stephen B. Gilbert
  • Evgeny Chukharev-Hudilainen
Conference paper


Project requirements are the client's wishes and expectations regarding design, construction, and other project management processes. The project definition is typically specified in a contract package that includes a contract document and many related documents such as drawings, specifications, and government codes. Determining the project definition is critical to the success of a project. Because efficient tools for requirement processing are lacking, current project scoping practice still relies heavily on manual work, which is tedious, time-consuming, and error-prone. This study aims to fill that gap by developing an automated method for identifying requirement texts in contractual documents. The study employed Naïve Bayes to train a classification model that separates requirement statements from non-requirement statements. An experiment was conducted on a manually labeled dataset of 1191 statements. The results showed that the developed requirement detection model achieved a promising accuracy of over 90%.
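To make the classification step concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier that labels statements as requirement or non-requirement. It is not the authors' implementation: the abstract does not describe the features, tokenization, or smoothing used, so the regex tokenizer, the add-one smoothing, the class name `NaiveBayesTextClassifier`, and the six training statements are all assumptions invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing. Hypothetical helper."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # number of statements per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy statements made up for this sketch; they are NOT from the paper's dataset.
TRAIN = [
    ("The contractor shall submit shop drawings for approval.", "requirement"),
    ("All concrete must reach the specified strength before loading.", "requirement"),
    ("The engineer shall be notified prior to excavation.", "requirement"),
    ("This section describes the general scope of the work.", "non-requirement"),
    ("Definitions of terms used in this contract are listed here.", "non-requirement"),
    ("The project is located in the northern district.", "non-requirement"),
]

model = NaiveBayesTextClassifier().fit(*zip(*TRAIN))
prediction = model.predict("The contractor shall notify the engineer.")
print(prediction)
```

With so few training statements the classifier mostly keys on obligation words such as "shall". A realistic evaluation would hold out part of a labeled dataset and measure accuracy on it, as the experiment described above does with its 1191 labeled statements.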


Project definition; Requirement management; Requirement extraction; Machine learning; Natural language processing; Text classification; Naïve Bayes



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tuyen Le (2, corresponding author)
  • Chau Le (3)
  • H. David Jeong (3)
  • Stephen B. Gilbert (1)
  • Evgeny Chukharev-Hudilainen (1)

  1. Iowa State University, Ames, USA
  2. Clemson University, Clemson, USA
  3. Texas A&M, College Station, USA
