Performance narratives are qualitative text descriptions of an employee’s work performance. Despite containing rich information that can be leveraged by practitioners and researchers, few efforts have systematically examined performance narratives. This study investigated whether performance narratives can automatically and reliably be scored into meaningful performance dimensions. Using the Great Eight as a conceptual framework, a custom dictionary was developed and comments were scored via automated text mining. This dictionary, labeled the Great Eight Narrative Dictionary, was then validated against a set of convergent measures to establish construct validity evidence for the derived narrative scores. Inter-rater agreement in linking word phrases to performance dimensions was high, and the derived performance dimensions had acceptable internal consistency. Narrative scores also displayed evidence of construct validity, with an expected pattern of correlations with text scores from an alternative text mining dictionary and with developmental performance ratings made using traditional numerical formats. Collectively, findings support the use of the Great Eight Narrative Dictionary to score performance narratives, and the dictionary is provided openly to facilitate future use.
Within the field of text mining and the larger field of natural language processing, a large number of dictionaries exist to assess themes, ranging from the many listed in the Harvard General Inquirer (HGI; Stone, Dunphy, & Smith, 1966) to the ingestion and religious themes, among many others, found in LIWC (Pennebaker, Boyd, Jordan, & Blackburn, 2015). However, no dictionary exists to assess job performance.
Although conceptualization is typically done at the broad eight-factor level, each of the competencies can also be broken down into narrower behavioral components (Bartram, 2005).
About 7.6% of cases were removed per this decision rule. After removal, the average number of words was 938 with a standard deviation of 713 words. Total words had a slight positive skew (1.61).
Whenever “not,” “n’t,” “cannot,” “never,” or “no” occurred, we replaced the negation token with “not_” and concatenated it with the word immediately following it.
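A minimal sketch of this negation-handling step, assuming a simple regex-based preprocessor (the original implementation is not specified in the text):

```python
import re

# Negation tokens listed above; "\w+n't" covers contractions such as "doesn't".
NEGATIONS = r"(?:not|cannot|never|no|\w+n't)"

def mark_negations(text: str) -> str:
    # Fuse each negation with the word that follows it, e.g.
    # "not effective" -> "not_effective", so that negated phrases
    # are matched as distinct dictionary entries.
    return re.sub(rf"\b{NEGATIONS}\s+(\w+)", r"not_\1", text, flags=re.IGNORECASE)

print(mark_negations("She does not delegate and doesn't listen"))
# -> "She does not_delegate and not_listen"
```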
Supervised learning refers to analyses in which a set of predictors is weighted based on relationships with some outcome (i.e., a dependent variable) within the training dataset. A common example is regression, where predictors are weighted based on their relationships among themselves and with an outcome of interest. Once established, those weights can be applied to the predictors in new samples to predict the given outcome. Unsupervised learning, in contrast, involves creating algorithms when target outcomes are not present in the training data (e.g., factor analysis).
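To make the supervised case concrete, the weight-then-apply logic can be sketched with a toy least-squares example (hypothetical data; any supervised method would illustrate the same point):

```python
def fit_simple_regression(x, y):
    """Closed-form least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

def predict(params, x_new):
    intercept, slope = params
    return [intercept + slope * xi for xi in x_new]

# Weights are estimated on the training data...
params = fit_simple_regression([1, 2, 3, 4], [2, 4, 6, 8])
# ...and then applied, fixed, to score new cases.
predictions = predict(params, [5, 6])  # -> [10.0, 12.0]
```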
In more developed research domains, such a conceptual mapping might be accompanied by expected convergent correlations. The challenge with specifying benchmark cutoffs (i.e., thresholds beyond the requirement that relationships be statistically greater than zero) is that there is little literature on effect sizes for such text analyses, making it difficult to indicate a priori what a “minimal” or “expected” effect might be. A further challenge is that, even if better literature existed to establish expected convergent correlations, those expectations would likely differ across the many convergent measures included in this study, resulting in an unwieldy set of predicted relationships. For these reasons, mappings represent relationships that are expected to be significant (i.e., greater than zero); this applies to correlations with LIWC and with the 360 ratings, as discussed in the next section.
Additional peer and subordinate results can be made available from the first author upon request.
Theoretically, internal consistency estimates using either the individual phrases or the parcels should be identical, given the parcels are simply indicators with a larger ratio of true score variance, but with fewer indicators. The individual word phrases have less true score variance but more indicators. Regardless of operationalization, the total amount of true score variance and total variance remains the same.
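For reference, an internal consistency estimate such as Cronbach's alpha can be computed over either set of indicators (individual word phrases or parcels) with the standard formula. This sketch is illustrative only and is not the authors' scoring code:

```python
from statistics import pvariance

def cronbach_alpha(indicators):
    """indicators: list of score lists, one per indicator (word phrase or
    parcel), each containing one score per person."""
    k = len(indicators)
    sum_item_var = sum(pvariance(ind) for ind in indicators)
    total_scores = [sum(scores) for scores in zip(*indicators)]
    return k / (k - 1) * (1 - sum_item_var / pvariance(total_scores))

# Two perfectly parallel indicators yield alpha = 1.0.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # -> 1.0
```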
Ammons-Stephens, S., Cole, H. J., Riehle, C. F., & Weare, W. H. (2009). Developing core leadership competencies for the library profession. Library Leadership & Management, 23(2), 63–74.
Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90(6), 1185–1203.
Brutus, S. (2010). Words versus numbers: A theoretical exploration of giving and receiving narrative comments in performance appraisal. Human Resource Management Review, 20, 144–157.
Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 687–732). Palo Alto: Consulting Psychologists Press.
Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.
Costigan, R. D., & Donahue, L. (2009). Developing the great eight competencies with leaderless group discussion. Journal of Management Education, 33(5), 596–616.
Davenport, E. C., & El-Sanhurry, N. A. (1991). Phi/phimax: Review and synthesis. Educational and Psychological Measurement, 51, 821–828.
Ferstl, K. L., & Bruskiewicz, K. T. (2000). Self-other agreement and cognitive reactions to multi-rater feedback. Paper presented at the 15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96.
Gorman, C. A., Meriac, J. P., Roch, S. G., Ray, J. L., & Gamble, J. S. (2017). An exploratory study of current performance management practices: Human resource executives’ perspectives. International Journal of Selection and Assessment, 25, 193–202.
Hayes, P. A., & Omodei, M. M. (2011). Managing emergencies: Key competencies for incident management teams. The Australasian Journal of Organisational Psychology, 4, 1–10.
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88(1), 100–112.
Ignatow, G., & Mihalcea, R. (2017). Text mining: A guidebook for the social sciences. Thousand Oaks: Sage Publications.
Klendauer, R., Berkovich, M., Gelvin, R., Leimeister, J. M., & Krcmar, H. (2012). Towards a competency model for requirements analysts. Information Systems Journal, 22(6), 475–503.
Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihók, G., & Den Hartog, D. N. (2017a). Text classification for organizational researchers: A tutorial. Organizational Research Methods, 21(3), 766–799.
Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihók, G., & Den Hartog, D. N. (2017b). Text mining in organizational research. Organizational Research Methods, 21(3), 733–765.
Kurz, R., & Bartram, D. (2002). Competency and individual performance: Modeling the world of work. In I. T. Robertson, M. Callinan, & D. Bartram (Eds.), Organizational effectiveness: The role of psychology (pp. 227–255). Chichester: Wiley.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107.
Liu, B. (2012). Sentiment analysis and opinion mining. San Rafael: Morgan & Claypool Publishers.
McDowall, A., & Kurtz, R. (2007). Making the most of psychometric profiles-effective integration into the coaching process. International Coaching Psychology Review, 2(3), 299–309.
O’Neill, T. A., Goffin, R. D., & Tett, R. P. (2009). Content validation is fundamental for optimizing the criterion validity of personality tests. Industrial and Organizational Psychology, 2(4), 509–513.
Pandey, S., & Pandey, S. K. (2017). Applying natural language processing capabilities in computerized textual analysis to measure organizational culture. Organizational Research Methods. Advance online publication. http://journals.sagepub.com/doi/abs/10.1177/1094428117745648.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2, 1–135.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin: University of Texas at Austin.
Pulakos, E. D., Hanson, R. M., Arad, S., & Moye, N. (2015). Performance management can be fixed: An on-the-job experiential learning approach for complex behavior change. Industrial and Organizational Psychology, 8(1), 51–76.
Pulakos, E. D., & O’Leary, R. S. (2011). Why is performance management broken? Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 146–164.
R Core Team. (2007). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rojon, C., McDowall, A., & Saunders, M. N. (2015). The relationships between traditional selection assessments and workplace performance criteria specificity: A comparative meta-analysis. Human Performance, 28(1), 1–25.
Scullen, S. E., Mount, M. K., & Judge, T. A. (2003). Evidence of the construct validity of developmental ratings of managerial performance. Journal of Applied Psychology, 88, 50–66.
Short, J. C., Broberg, J. C., Cogliser, C. C., & Brigham, K. C. (2010). Construct validation using computer-aided text analysis (CATA): An illustration using entrepreneurial orientation. Organizational Research Methods, 13, 320–347.
Sliter, K. A. (2015). Assessing 21st century skills: Competency modeling to the rescue. Industrial and Organizational Psychology, 8(2), 284–289.
Speer, A. B. (2018). Quantifying with words: An investigation of the validity of narrative-derived performance scores. Personnel Psychology, 71, 299–333. https://doi.org/10.1111/peps.12263.
Spendlove, M. (2007). Competencies for effective leadership in higher education. International Journal of Educational Management, 21(5), 407–417.
Stemler, S. E. (2015). Content analysis. In R. Scott & S. S. Kosslyn (Eds.), Emerging trends in the social and behavioral sciences: an interdisciplinary, searchable, and linkable resource (pp. 1–14). New York: Wiley.
Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The general inquirer: A computer approach to content analysis. Oxford, England: M.I.T. Press.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
Tett, R. P., Guterman, H. A., Bleier, A., & Murphy, P. J. (2000). Development and content validation of a “hyperdimensional” taxonomy of managerial competence. Human Performance, 13, 205–251.
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90(1), 108–131.
Wu, C.-H., & Wang, Y. (2011). Understanding proactive leadership. In W. H. Mobley, M. Li, & Y. Wang (Eds.), Advances in global leadership (Vol. 6, pp. 299–314). Bingley: Emerald Group.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67, 301–320.
Speer, A.B., Schwendeman, M.G., Reich, C.C. et al. Investigating the Construct Validity of Performance Comments: Creation of the Great Eight Narrative Dictionary. J Bus Psychol 34, 747–767 (2019). https://doi.org/10.1007/s10869-018-9599-9
Keywords:
- Job performance
- Performance management
- Text mining
- Narrative comments