Investigating the Construct Validity of Performance Comments: Creation of the Great Eight Narrative Dictionary

Abstract

Performance narratives are qualitative text descriptions of an employee’s work performance. Despite containing rich information that can be leveraged by practitioners and researchers, few efforts have systematically examined performance narratives. This study investigated whether performance narratives can be automatically and reliably scored into meaningful performance dimensions. A custom dictionary was developed using the Great Eight as a conceptual framework, and comments were scored via automated text mining. This dictionary, labeled the Great Eight Narrative Dictionary, was then validated against a set of convergent measures to establish construct validity evidence for the derived narrative scores. Inter-rater agreement in linking word phrases to performance dimensions was high, and the derived performance dimensions had acceptable internal consistency. Narrative scores also displayed evidence of construct validity, showing the expected pattern of correlations with text scores from an alternative text mining dictionary and with developmental performance ratings made using traditional numerical formats. Collectively, the findings support the use of the Great Eight Narrative Dictionary to score performance narratives, and the dictionary is provided openly to facilitate future use.
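The following sketch illustrates dictionary-based scoring in general. It counts dictionary phrase matches in a narrative and expresses them per 100 words for each dimension. It is a minimal sketch under stated assumptions, not the authors' scoring procedure. The two dimension labels are Great Eight competencies from Bartram (2005), but the phrases under them are hypothetical placeholders rather than entries from Table 9. The per-100-words normalization is also an assumption.

```python
import re

# Hypothetical placeholder entries for illustration only; these are NOT
# entries from the Great Eight Narrative Dictionary (Table 9).
EXAMPLE_DICTIONARY = {
    "Leading and Deciding": ["takes charge", "decisive"],
    "Supporting and Cooperating": ["team player", "helps others"],
}


def score_narrative(text, dictionary=EXAMPLE_DICTIONARY):
    """Score one narrative on each dimension as phrase matches per 100 words.

    Assumptions (not from the source): text is lowercased and tokenized with
    a simple regex, phrases are matched on word boundaries, and counts are
    scaled per 100 words.
    """
    words = re.findall(r"[a-z_']+", text.lower())
    joined = " ".join(words)
    scores = {}
    for dimension, phrases in dictionary.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", joined))
            for phrase in phrases
        )
        # Guard against empty narratives to avoid division by zero.
        scores[dimension] = 100 * hits / max(len(words), 1)
    return scores


print(score_narrative("She is decisive and a team player who helps others."))
# {'Leading and Deciding': 10.0, 'Supporting and Cooperating': 20.0}
```

Matching on word boundaries keeps a phrase such as "decisive" from matching inside "indecisive." A fuller implementation would also need the negation handling described in Note 4.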

Change history

  • 05 December 2018

    A correction has been made to the original online version of this manuscript pertaining to Table 8, which has been updated to include asterisks indicating statistical significance for each numerical value within the table, per the original table footnote.

Notes

  1.

    Within the field of text mining and larger field of natural language processing, a large number of dictionaries exist to assess themes ranging from the many listed in the Harvard General Inquirer (HGI; Stone, Dunphy, & Smith, 1966) to the ingestion and religious themes, among many others, found in LIWC (Pennebaker, Boyd, Jordan, & Blackburn, 2015). However, there are no dictionaries to assess job performance.

  2.

    Although conceptualization is typically done at the broad 8-factor level, each of the competencies can also be broken down into narrower behavioral components (Bartram, 2005).

  3.

    About 7.6% of cases were removed per this decision rule. After removal, the average number of words was 938 with a standard deviation of 713 words. Total words had a slight positive skew (1.61).

  4.

    Whenever “not,” “n’t,” “cannot,” “never,” or “no” occurred, we replaced that term with “not_” and concatenated it with the word immediately following it.

  5.

    Supervised learning refers to analyses in which a set of predictors is weighted based on its relationships with some outcome (i.e., a dependent variable) within the training dataset. A common example is regression, in which predictors are weighted based on their relationships with one another and with an outcome of interest. Once established, those weights can be applied to the predictors in new samples to predict the outcome. In contrast, unsupervised learning identifies structure in the training data when no target outcome is present (e.g., factor analysis).

  6.

    In more developed research domains, such a conceptual mapping might be accompanied by expected magnitudes for the convergent correlations. The challenge with specifying benchmark cutoffs (i.e., anything beyond statistical significance showing that relationships are greater than zero) is that there is little literature on effect sizes for such text analyses, which makes it difficult to indicate a priori what a “minimal” or “expected” effect might be. Moreover, even if the literature were sufficient to establish expected convergent correlations, these would likely differ across the many convergent measures included in this study, resulting in an unwieldy set of predicted relationships. For these reasons, the mappings represent relationships that are expected to be significant (i.e., greater than zero); this applies to correlations with both LIWC and the 360 ratings, as discussed in the next section.

  7.

    Additional peer and subordinate results can be made available from the first author upon request.

  8.

    Theoretically, internal consistency estimates based on either the individual phrases or the parcels should be identical, because parcels are simply indicators with a larger ratio of true score variance, but there are fewer of them. The individual word phrases each carry less true score variance, but there are more of them. Regardless of operationalization, the total amount of true score variance and the total variance remain the same.
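The negation rule in Note 4 can be sketched in Python as follows. This is a minimal illustration that rests on assumptions not stated in the note. Text is lowercased and split with a simple regular expression. A contraction ending in "n't" (e.g., "doesn't") is treated as a single negating token and replaced whole.

```python
import re

# Negators listed in Note 4; "n't" is handled as a suffix check below.
NEGATORS = {"not", "cannot", "never", "no"}


def mark_negations(text):
    """Replace each negator with 'not_' fused to the word that follows it.

    Assumptions (not from the source): lowercased regex tokenization, and any
    token ending in "n't" (straight or curly apostrophe) counts as a negator
    and is replaced as a whole token.
    """
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        is_negator = tok in NEGATORS or tok.endswith("n't") or tok.endswith("n’t")
        if is_negator and i + 1 < len(tokens):
            out.append("not_" + tokens[i + 1])  # fuse marker with next word
            i += 2
        else:
            out.append(tok)
            i += 1
    return " ".join(out)


print(mark_negations("She does not meet deadlines and never misses meetings."))
# she does not_meet deadlines and not_misses meetings
```

Fusing the marker with the next word lets a dictionary phrase such as "not_meet" be scored separately from "meet."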
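The regression example in Note 5 can be sketched with NumPy. Weights are estimated by ordinary least squares in a training sample and then applied, unchanged, to predictors from a new sample. The data are hypothetical. OLS appears here only because it is the example the note names; it is not necessarily the model used in the study.

```python
import numpy as np


def fit_weights(X_train, y_train):
    """Estimate OLS weights (intercept first) from a training sample."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    weights, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return weights


def predict(weights, X_new):
    """Apply previously estimated weights to predictors from a new sample."""
    design = np.column_stack([np.ones(len(X_new)), X_new])
    return design @ weights


# Hypothetical training data generated from y = 1 + 2*x1 + 0.5*x2.
X_train = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_train = 1 + 2 * X_train[:, 0] + 0.5 * X_train[:, 1]

weights = fit_weights(X_train, y_train)              # approximately [1.0, 2.0, 0.5]
new_pred = predict(weights, np.array([[6.0, 1.0]]))  # approximately [13.5]
```

The weights are estimated once on the training data. Prediction in the new sample only reuses them and involves no outcome values.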
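Note 6 frames each mapping as a prediction that a correlation is greater than zero. The note does not say which test was used. A common choice is the one-tailed t test for a Pearson correlation, with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom, and the sketch below assumes that test. The r and n values are hypothetical.

```python
import math

from scipy import stats


def one_tailed_r_test(r, n):
    """One-tailed test of H0: rho <= 0 versus H1: rho > 0 for a Pearson r.

    Returns the t statistic and its upper-tail p value (df = n - 2).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    p = stats.t.sf(t, df)  # survival function = upper-tail probability
    return t, p


# Hypothetical values: r = .50 observed in a sample of 27.
t_stat, p_value = one_tailed_r_test(0.50, 27)
```

Because the hypothesis is directional, a negative r produces an upper-tail p value above .50 and never supports the mapping.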
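The argument in Note 8 can be checked numerically if the indicators are assumed to be parallel, meaning they have equal variances and equal covariances. The sketch computes Cronbach's alpha from a covariance matrix and forms parcels by summing pairs of items. It shows that alpha is the same for six items and for the three parcels built from them. The covariance values are hypothetical. With real phrases that are not parallel, the two estimates need not match exactly.

```python
import numpy as np


def alpha_from_cov(cov):
    """Cronbach's alpha: k/(k-1) * (1 - sum of variances / variance of the total)."""
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())


def parcel_cov(cov, parcels):
    """Covariance matrix of parcels, each the sum of the listed item indices."""
    A = np.zeros((len(parcels), cov.shape[0]))
    for row, item_indices in enumerate(parcels):
        A[row, item_indices] = 1.0
    return A @ cov @ A.T


# Hypothetical parallel items: 6 items, unit variance, inter-item covariance .30.
item_cov = np.full((6, 6), 0.30)
np.fill_diagonal(item_cov, 1.0)

alpha_items = alpha_from_cov(item_cov)
alpha_parcels = alpha_from_cov(parcel_cov(item_cov, [[0, 1], [2, 3], [4, 5]]))
print(round(alpha_items, 3), round(alpha_parcels, 3))  # 0.72 0.72
```

Both versions use the same total variance (15.0 here) and the same true score variance. Only the number of indicators and their individual variances differ.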

References

  1. Ammons-Stephens, S., Cole, H. J., Riehle, C. F., & Weare, W. H. (2009). Developing core leadership competencies for the library profession. Library Leadership & Management, 23(2), 63–74.

  2. Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90(6), 1185–1203.

  3. Brutus, S. (2010). Words versus numbers: A theoretical exploration of giving and receiving narrative comments in performance appraisal. Human Resource Management Review, 20, 144–157.

  4. Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 687–732). Palo Alto: Consulting Psychologists Press.

  5. Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

  6. Costigan, R. D., & Donahue, L. (2009). Developing the great eight competencies with leaderless group discussion. Journal of Management Education, 33(5), 596–616.

  7. Davenport, E., & El-Sanhury, N. (1991). Phi/phimax: Review and synthesis. Educational and Psychological Measurement, 51, 821–828.

  8. Ferstl, K. L., & Bruskiewicz, K. T. (2000). Self-other agreement and cognitive reactions to multi-rater feedback. Paper presented at the 15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

  9. Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96.

  10. Gorman, C. A., Meriac, J. P., Roch, S. G., Ray, J. L., & Gamble, J. S. (2017). An exploratory study of current performance management practices: human resource executives’ perspectives. International Journal of Selection and Assessment, 25, 193–202.

  11. Hayes, P. A., & Omodei, M. M. (2011). Managing emergencies: Key competencies for incident management teams. The Australasian Journal of Organisational Psychology, 4, 1–10.

  12. Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88(1), 100–112.

  13. Ignatow, G., & Mihalcea, R. (2017). Text mining: A guidebook for the social sciences. Thousand Oaks: Sage Publications.

  14. Klendauer, R., Berkovich, M., Gelvin, R., Leimeister, J. M., & Krcmar, H. (2012). Towards a competency model for requirements analysts. Information Systems Journal, 22(6), 475–503.

  15. Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihók, G., & Den Hartog, D. N. (2017a). Text classification for organizational researchers: A tutorial. Organizational Research Methods, 21(3), 766–799.

  16. Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihók, G., & Den Hartog, D. N. (2017b). Text mining in organizational research. Organizational Research Methods, 21(3), 733–765.

  17. Kurz, R., & Bartram, D. (2002). Competency and individual performance: Modeling the world of work. In I. T. Robertson, M. Callinan, & D. Bartram (Eds.), Organizational effectiveness: The role of psychology (pp. 227–255). Chichester: Wiley.

  18. Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107.

  19. Liu, B. (2012). Sentiment analysis and opinion mining. San Rafael: Morgan & Claypool Publishers.

  20. McDowall, A., & Kurz, R. (2007). Making the most of psychometric profiles-effective integration into the coaching process. International Coaching Psychology Review, 2(3), 299–309.

  21. O’Neill, T. A., Goffin, R. D., & Tett, R. P. (2009). Content validation is fundamental for optimizing the criterion validity of personality tests. Industrial and Organizational Psychology, 2(4), 509–513.

  22. Pandey, S., & Pandey, S. K. (2017). Applying natural language processing capabilities in computerized textual analysis to measure organizational culture. Organizational Research Methods. Advance online publication. http://journals.sagepub.com/doi/abs/10.1177/1094428117745648.

  23. Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2, 1–135.

  24. Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin: University of Texas at Austin.

  25. Pulakos, E. D., Hanson, R. M., Arad, S., & Moye, N. (2015). Performance management can be fixed: An on-the-job experiential learning approach for complex behavior change. Industrial and Organizational Psychology, 8(1), 51–76.

  26. Pulakos, E. D., & O’Leary, R. S. (2011). Why is performance management broken? Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 146–164.

  27. R Core Development Team. (2007). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  28. Rojon, C., McDowall, A., & Saunders, M. N. (2015). The relationships between traditional selection assessments and workplace performance criteria specificity: A comparative meta-analysis. Human Performance, 28(1), 1–25.

  29. Scullen, S. E., Mount, M. K., & Judge, T. A. (2003). Evidence of the construct validity of developmental ratings of managerial performance. Journal of Applied Psychology, 88, 50–66.

  30. Short, J. C., Broberg, J. C., Cogliser, C. C., & Brigham, K. C. (2010). Construct validation using computer-aided text analysis (CATA): An illustration using entrepreneurial orientation. Organizational Research Methods, 13, 320–347.

  31. Sliter, K. A. (2015). Assessing 21st century skills: Competency modeling to the rescue. Industrial and Organizational Psychology, 8(2), 284–289.

  32. Speer, A. B. (2018). Quantifying with words: An investigation of the validity of narrative-derived performance scores. Personnel Psychology, 71, 299–333. https://doi.org/10.1111/peps.12263.

  33. Spendlove, M. (2007). Competencies for effective leadership in higher education. International Journal of Educational Management, 21(5), 407–417.

  34. Stemler, S. E. (2015). Content analysis. In R. Scott & S. S. Kosslyn (Eds.), Emerging trends in the social and behavioral sciences: an interdisciplinary, searchable, and linkable resource (pp. 1–14). New York: Wiley.

  35. Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The general inquirer: A computer approach to content analysis. Oxford, England: M.I.T. Press.

  36. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.

  37. Tett, R. P., Guterman, H. A., Bleier, A., & Murphy, P. J. (2000). Development and content validation of a “hyperdimensional” taxonomy of managerial competence. Human Performance, 13, 205–251.

  38. Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90(1), 108–131.

  39. Wu, C.-H., & Wang, Y. (2011). Understanding proactive leadership. In W. H. Mobley, M. Li, & Y. Wang (Eds.), Advances in global leadership (Vol. 6, pp. 299–314). Bingley: Emerald Group.

  40. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67, 301–320.

Author information

Correspondence to Andrew B. Speer or Andrew P. Tenbrink or Sydney R. Siver.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1 (DOCX 15 kb)

Appendix

Table 9 Great Eight Narrative Dictionary v1.0

About this article

Cite this article

Speer, A.B., Schwendeman, M.G., Reich, C.C. et al. Investigating the Construct Validity of Performance Comments: Creation of the Great Eight Narrative Dictionary. J Bus Psychol 34, 747–767 (2019). https://doi.org/10.1007/s10869-018-9599-9

Keywords

  • Job performance
  • Performance management
  • Text mining
  • Narrative comments