Abstract
Scaling is the application of numbers, or of judgments that are converted to numerical values, to describe the perceived intensity of a sensory experience or the degree of liking or disliking for some experience or product. Scaling forms the basis for the sensory method of descriptive analysis. A variety of methods have been used for this purpose, and, applied with some caution, all work well in differentiating products. This chapter discusses theoretical issues as well as practical considerations in scaling.
The vital importance of knowing the properties and limitations of a measuring instrument can hardly be denied by most natural scientists. However, the use of many different scales for sensory measurement is common within food science; but very few of these have ever been validated… .
—(Land and Shepard, 1984, pp. 144–145)
References
AACC (American Association of Cereal Chemists). 1986. Approved Methods of the AACC, Eighth Edition. Method 90–10. Baking quality of cake flour, rev. Oct. 1982. The American Association of Cereal Chemists, St. Paul, MN, pp. 1–4.
Anderson, N. H. 1974. Algebraic models in perception. In: E. C. Carterette and M. P. Friedman (Eds.), Handbook of Perception. Psychophysical Judgment and Measurement, Vol. 2. Academic, New York, pp. 215–298.
Anderson, N. H. 1977. Note on functional measurement and data analysis. Perception and Psychophysics, 21, 201–215.
ASTM. 2008a. Standard test method for unipolar magnitude estimation of sensory attributes. Designation E 1697-05. In: Annual Book of ASTM Standards, Vol. 15.08, End Use Products. American Society for Testing and Materials, Conshohocken, PA, pp. 122–131.
ASTM. 2008b. Standard test method for sensory evaluation of red pepper heat. Designation E 1083-00. In: Annual Book of ASTM Standards, Vol. 15.08, End Use Products. American Society for Testing and Materials, Conshohocken, PA, pp. 49–53.
Aust, L. B., Gacula, M. C., Beard, S. A. and Washam, R. W., II. 1985. Degree of difference test method in sensory evaluation of heterogeneous product types. Journal of Food Science, 50, 511–513.
Baird, J. C. and Noma, E. 1978. Fundamentals of Scaling and Psychophysics. Wiley, New York.
Banks, W. P. and Coleman, M. J. 1981. Two subjective scales of number. Perception and Psychophysics, 29, 95–105.
Bartoshuk, L. M., Snyder, D. J. and Duffy, V. B. 2006. Hedonic gLMS: Valid comparisons for food liking/disliking across obesity, age, sex and PROP status. Paper presented at the 2006 Annual Meeting, Association for Chemoreception Sciences.
Bartoshuk, L. M., Duffy, V. B., Fast, K., Green, B. G., Prutkin, J. and Snyder, D. J. 2003. Labeled scales (e.g. category, Likert, VAS) and invalid across-group comparisons: What we have learned from genetic variation in taste. Food Quality and Preference, 14, 125–138.
Bartoshuk, L. M., Duffy, V. B., Green, B. G., Hoffman, H. J., Ko, C.-W., Lucchina, L. A., Marks, L. E., Snyder, D. J. and Weiffenbach, J. M. 2004a. Valid across-group comparisons with labeled scales: the gLMS versus magnitude matching. Physiology and Behavior, 82, 109–114.
Bartoshuk, L. M., Duffy, V. B., Chapo, A. K., Fast, K., Yiee, J. H., Hoffman, H. J., Ko, C.-W. and Snyder, D. J. 2004b. From psychophysics to the clinic: Missteps and advances. Food Quality and Preference, 15, 617–632.
Bartoshuk, L. M., Duffy, V. B., Fast, K., Green, B. G., Kveton, J., Lucchina, L. A., Prutkin, J. M., Snyder, D. J. and Tie, K. 1999. Sensory variability, food preferences and BMI in non-, medium and supertasters of PROP. Appetite, 33, 228–229.
Basker, D. 1988. Critical values of differences among rank sums for multiple comparisons. Food Technology, 42(2), 79, 80–84.
Baten, W. D. 1946. Organoleptic tests pertaining to apples and pears. Food Research, 11, 84–94.
Bendig, A. W. and Hughes, J. B. 1953. Effect of number of verbal anchoring and number of rating scale categories upon transmitted information. Journal of Experimental Psychology, 46(2), 87–90.
Bi, J. 2006. Sensory Discrimination Tests and Measurement. Blackwell, Ames, IA.
Birch, L. L., Zimmerman, S. I. and Hind, H. 1980. The influence of social-affective context on the formation of children’s food preferences. Child Development, 51, 856–861.
Birch, L. L., Birch, D., Marlin, D. W. and Kramer, L. 1982. Effects of instrumental consumption on children’s food preferences. Appetite, 3, 125–143.
Birnbaum, M. H. 1982. Problems with so-called “direct” scaling. In: J. T. Kuznicki, R. A. Johnson and A. F. Rutkiewic (Eds.), Selected Sensory Methods: Problems and Approaches to Hedonics. American Society for Testing and Materials, Philadelphia, pp. 34–48.
Borg, G. 1982. A category scale with ratio properties for intermodal and interindividual comparisons. In: H.-G. Geissler and P. Petzold (Eds.), Psychophysical Judgment and the Process of Perception. VEB Deutscher Verlag der Wissenschaften, Berlin, pp. 25–34.
Borg, G. 1990. Psychophysical scaling with applications in physical work and the perception of exertion. Scandinavian Journal of Work and Environmental Health, 16, 55–58.
Boring, E. G. 1942. Sensation and Perception in the History of Experimental Psychology. Appleton-Century-Crofts, New York.
Brandt, M. A., Skinner, E. Z. and Coleman, J. A. 1963. The texture profile method. Journal of Food Science, 28, 404–409.
Butler, G., Poste, L. M., Wolynetz, M. S., Agar, V. E. and Larmond, E. 1987. Alternative analyses of magnitude estimation data. Journal of Sensory Studies, 2, 243–257.
Cardello, A. V. and Schutz, H. G. 2004. Research note. Numerical scale-point locations for constructing the LAM (Labeled affective magnitude) scale. Journal of Sensory Studies, 19, 341–346.
Cardello, A. V., Lawless, H. T. and Schutz, H. G. 2008. Effects of extreme anchors and interior label spacing on labeled magnitude scales. Food Quality and Preference, 21, 323–334.
Cardello, A. V., Winterhalter, C. and Schutz, H. G. 2003. Predicting the handle and comfort of military clothing fabrics from sensory and instrumental data: Development and application of new psychophysical methods. Textile Research Journal, 73, 221–237.
Cardello, A. V., Schutz, H. G., Lesher, L. L. and Merrill, E. 2005. Development and testing of a labeled magnitude scale of perceived satiety. Appetite, 44, 1–13.
Caul, J. F. 1957. The profile method of flavor analysis. Advances in Food Research, 7, 1–40.
Chambers, E. C. and Wolf, M. B. 1996. Sensory Testing Methods. ASTM Manual Series, MNL 26. ASTM International, West Conshohocken, PA.
Chen, A. W., Resurreccion, A. V. A. and Paguio, L. P. 1996. Age appropriate hedonic scales to measure the food preferences of young children. Journal of Sensory Studies, 11, 141–163.
Chung, S.-J. and Vickers, Z. 2007a. Long-term acceptability and choice of teas differing in sweetness. Food Quality and Preference, 18, 963–974.
Chung, S.-J. and Vickers, Z. 2007b. Influence of sweetness on the sensory-specific satiety and long-term acceptability of tea. Food Quality and Preference, 18, 256–267.
Coetzee, H. and Taylor, J. R. N. 1996. The use and adaptation of the paired comparison method in the sensory evaluation of hamburger-type patties by illiterate/semi-literate consumers. Food Quality and Preference, 7, 81–85.
Collins, A. A. and Gescheider, G. A. 1989. The measurement of loudness in individual children and adults by absolute magnitude estimation and cross modality matching. Journal of the Acoustical Society of America, 85, 2012–2021.
Conner, M. T. and Booth, D. A. 1988. Preferred sweetness of a lime drink and preference for sweet over non-sweet foods, related to sex and reported age and body weight. Appetite, 10, 25–35.
Cordonnier, S. M. and Delwiche, J. F. 2008. An alternative method for assessing liking: Positional relative rating versus the 9-point hedonic scale. Journal of Sensory Studies, 23, 284–292.
Cox, E. P. 1980. The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 17, 407–422.
Curtis, D. W., Attneave, F. and Harrington, T. L. 1968. A test of a two-stage model of magnitude estimation. Perception and Psychophysics, 3, 25–31.
Edwards, A. L. 1952. The scaling of stimuli by the method of successive intervals. Journal of Applied Psychology, 36, 118–122.
Einstein, M. A. 1976. Use of linear rating scales for the evaluation of beer flavor by consumers. Journal of Food Science, 41, 383–385.
Ekman, G. 1964. Is the power law a special case of Fechner’s law? Perceptual and Motor Skills, 19, 730.
El Dine, A. N. and Olabi, A. 2009. Effect of reference foods in repeated acceptability tests: Testing familiar and novel foods using 2 acceptability scales. Journal of Food Science, 74, S97–S105.
Engen, T. 1974. Method and theory in the study of odor preferences. In: A. Turk, J. W. Johnson and D. G. Moulton (Eds.), Human Responses to Environmental Odors. Academic, New York.
Finn, A. and Louviere, J. J. 1992. Determining the appropriate response to evidence of public concern: The case of food safety. Journal of Public Policy and Marketing, 11, 12–25.
Forde, C. G. and Delahunty, C. M. 2004. Understanding the role cross-modal sensory interactions play in food acceptability in younger and older consumers. Food Quality and Preference, 15, 715–727.
Frijters, J. E. R., Kooistra, A. and Vereijken, P. F. G. 1980. Tables of d’ for the triangular method and the 3-AFC signal detection procedure. Perception and Psychophysics, 27, 176–178.
Gaito, J. 1980. Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564–587.
Gay, C. and Mead, R. 1992. A statistical appraisal of the problem of sensory measurement. Journal of Sensory Studies, 7, 205–228.
Gent, J. F. and Bartoshuk, L. M. 1983. Sweetness of sucrose, neohesperidin dihydrochalcone and saccharin is related to genetic ability to taste the bitter substance 6-n-propylthiouracil. Chemical Senses, 7, 265–272.
Gescheider, G. A. 1988. Psychophysical scaling. Annual Review of Psychology, 39, 169–200.
Giovanni, M. E. and Pangborn, R. M. 1983. Measurement of taste intensity and degree of liking of beverages by graphic scaling and magnitude estimation. Journal of Food Science, 48, 1175–1182.
Gracely, R. H., McGrath, P. and Dubner, R. 1978a. Ratio scales of sensory and affective verbal-pain descriptors. Pain, 5, 5–18.
Gracely, R. H., McGrath, P. and Dubner, R. 1978b. Validity and sensitivity of ratio scales of sensory and affective verbal-pain descriptors: Manipulation of affect by Diazepam. Pain, 5, 19–29.
Green, B. G., Shaffer, G. S. and Gilmore, M. M. 1993. Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses, 18, 683–702.
Green, B. G., Dalton, P., Cowart, B., Shaffer, G., Rankin, K. and Higgins, J. 1996. Evaluating the “Labeled Magnitude Scale” for measuring sensations of taste and smell. Chemical Senses, 21, 323–334.
Greene, J. L., Bratka, K. J., Drake, M. A. and Sanders, T. H. 2006. Effectiveness of category and line scales to characterize consumer perception of fruity fermented flavors in peanuts. Journal of Sensory Studies, 21, 146–154.
Guest, S., Essick, G., Patel, A., Prajpati, R. and McGlone, F. 2007. Labeled magnitude scales for oral sensations of wetness, dryness, pleasantness and unpleasantness. Food Quality and Preference, 18, 342–352.
Hein, K. A., Jaeger, S. R., Carr, B. T. and Delahunty, C. M. 2008. Comparison of five common acceptance and preference methods. Food Quality and Preference, 19, 651–661.
Huskisson, E. C. 1983. Visual analogue scales. In: R. Melzack (Ed.), Pain Measurement and Assessment. Raven, New York, pp. 34–37.
Jaeger, S. R., Jørgensen, A. S., Aaslyng, M. D. and Bredie, W. L. P. 2008. Best-worst scaling: An introduction and initial comparison with monadic rating for preference elicitation with food products. Food Quality and Preference, 19, 579–588.
Jaeger, S. R. and Cardello, A. V. 2009. Direct and indirect hedonic scaling methods: A comparison of the labeled affective magnitude (LAM) scale and best-worst scaling. Food Quality and Preference, 20, 249–258.
Jones, F. N. 1974. History of psychophysics and judgment. In: E. C. Carterette and M. P. Friedman (Eds.), Handbook of Perception. Psychophysical Judgment and Measurement, Vol. 2. Academic, New York, pp. 11–22.
Jones, L. V. and Thurstone, L. L. 1955. The psychophysics of semantics: An experimental investigation. Journal of Applied Psychology, 39, 31–36.
Jones, L. V., Peryam, D. R. and Thurstone, L. L. 1955. Development of a scale for measuring soldiers’ food preferences. Food Research, 20, 512–520.
Keskitalo, K., Knaapila, A., Kallela, M., Palotie, A., Wessman, M., Sammalisto, S., Peltonen, L., Tuorila, H. and Perola, M. 2007. Sweet taste preferences are partly genetically determined: Identification of a trait locus on chromosome 16. American Journal of Clinical Nutrition, 86, 55–63.
Kim, K.-O. and O’Mahony, M. 1998. A new approach to category scales of intensity I: Traditional versus rank-rating. Journal of Sensory Studies, 13, 241–249.
King, B. M. 1986. Odor intensity measured by an audio method. Journal of Food Science, 51, 1340–1344.
Koo, T.-Y., Kim, K.-O., and O’Mahony, M. 2002. Effects of forgetting on performance on various intensity scaling protocols: Magnitude estimation and labeled magnitude scale (Green scale). Journal of Sensory Studies, 17, 177–192.
Kroll, B. J. 1990. Evaluating rating scales for sensory testing with children. Food Technology, 44(11), 78–80, 82, 84, 86.
Kurtz, D. B., White, T. L. and Hayes, M. 2000. The labeled dissimilarity scale: A metric of perceptual dissimilarity. Perception and Psychophysics, 62, 152–161.
Land, D. G. and Shepard, R. 1984. Scaling and ranking methods. In: J. R. Piggott (Ed.), Sensory Analysis of Foods. Elsevier Applied Science, London, pp. 141–177.
Lane, H. L., Catania, A. C. and Stevens, S. S. 1961. Voice level: Autophonic scale, perceived loudness and effect of side tone. Journal of the Acoustical Society of America, 33, 160–167.
Larson-Powers, N. and Pangborn, R. M. 1978. Descriptive analysis of the sensory properties of beverages and gelatins containing sucrose or synthetic sweeteners. Journal of Food Science, 43, 47–51.
Lawless, H. T. 1977. The pleasantness of mixtures in taste and olfaction. Sensory Processes, 1, 227–237.
Lawless, H. T. 1989. Logarithmic transformation of magnitude estimation data and comparisons of scaling methods. Journal of Sensory Studies, 4, 75–86.
Lawless, H. T. and Clark, C. C. 1992. Psychological biases in time intensity scaling. Food Technology, 46, 81, 84–86, 90.
Lawless, H. T. and Malone, G. J. 1986a. The discriminative efficiency of common scaling methods. Journal of Sensory Studies, 1, 85–96.
Lawless, H. T. and Malone, G. J. 1986b. A comparison of scaling methods: Sensitivity, replicates and relative measurement. Journal of Sensory Studies, 1, 155–174.
Lawless, H. T. and Skinner, E. Z. 1979. The duration and perceived intensity of sucrose taste. Perception and Psychophysics, 25, 249–258.
Lawless, H. T., Popper, R. and Kroll, B. J. 2010a. Comparison of the labeled affective magnitude (LAM) scale, an 11-point category scale and the traditional nine-point hedonic scale. Food Quality and Preference, 21, 4–12.
Lawless, H. T., Sinopoli, D. and Chapman, K. W. 2010b. A comparison of the labeled affective magnitude scale and the nine point hedonic scale and examination of categorical behavior. Journal of Sensory Studies, 25, S1, 54–66.
Lawless, H. T., Cardello, A. V., Chapman, K. W., Lesher, L. L., Given, Z. and Schutz, H. G. 2010c. A comparison of the effectiveness of hedonic scales and end-anchor compression effects. Journal of Sensory Studies, 25, S1, 18–34.
Lee, H.-J., Kim, K.-O., and O’Mahony, M. 2001. Effects of forgetting on various protocols for category and line scales of intensity. Journal of Sensory Studies, 16, 327–342.
Likert, R. 1932. Technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.
Lindvall, T. and Svensson, L. T. 1974. Equal unpleasantness matching of malodourous substances in the community. Journal of Applied Psychology, 59, 264–269.
Mahoney, C. H., Stier, H. L. and Crosby, E. A. 1957. Evaluating flavor differences in canned foods. II. Fundamentals of the simplified procedure. Food Technology, 11, Supplemental Symposium Proceedings, 37–42.
Marks, L. E. 1978. Binaural summation of the loudness of pure tones. Journal of the Acoustical Society of America, 64, 107–113.
Marks, L. E., Borg, G. and Ljunggren, G. 1983. Individual differences in perceived exertion assessed by two new methods. Perception and Psychophysics, 34, 280–288.
Marks, L. E., Borg, G. and Westerlund, J. 1992. Differences in taste perception assessed by magnitude matching and by category-ratio scaling. Chemical Senses, 17, 493–506.
Mattes, R. D. and Lawless, H. T. 1985. An adjustment error in optimization of taste intensity. Appetite, 6, 103–114.
McBride, R. L. 1983a. A JND-scale/category scale convergence in taste. Perception and Psychophysics, 34, 77–83.
McBride, R. L. 1983b. Taste intensity and the case of exponents greater than 1. Australian Journal of Psychology, 35, 175–184.
McBurney, D. H. and Shick, T. R. 1971. Taste and water taste for 26 compounds in man. Perception and Psychophysics, 10, 249–252.
McBurney, D. H. and Bartoshuk, L. M. 1973. Interactions between stimuli with different taste qualities. Physiology and Behavior, 10, 1101–1106.
McBurney, D. H., Smith, D. V. and Shick, T. R. 1972. Gustatory cross-adaptation: Sourness and bitterness. Perception and Psychophysics, 11, 228–232.
Mead, R. and Gay, C. 1995. Sequential design of sensory trials. Food Quality and Preference, 6, 271–280.
Mecredy, J. M., Sonnemann, J. C. and Lehmann, S. J. 1974. Sensory profiling of beer by a modified QDA method. Food Technology, 28, 36–41.
Meilgaard, M., Civille, G. V. and Carr, B. T. 2006. Sensory Evaluation Techniques, Fourth Edition. CRC, Boca Raton, FL.
Moore, L. J. and Shoemaker, C. F. 1981. Sensory textural properties of stabilized ice cream. Journal of Food Science, 46, 399–402.
Moskowitz, H. R. 1971. The sweetness and pleasantness of sugars. American Journal of Psychology, 84, 387–405.
Moskowitz, H. R. and Sidel, J. L. 1971. Magnitude and hedonic scales of food acceptability. Journal of Food Science, 36, 677–680.
Muñoz, A. M. and Civille, G. V. 1998. Universal, product and attribute specific scaling and the development of common lexicons in descriptive analysis. Journal of Sensory Studies, 13, 57–75.
Newell, G. J. and MacFarlane, J. D. 1987. Expanded tables for multiple comparison procedures in the analysis of ranked data. Journal of Food Science, 52, 1721–1725.
Olabi, A. and Lawless, H. T. 2008. Persistence of context effects with training and reference standards. Journal of Food Science, 73, S185–S189.
O’Mahony, M., Park, H., Park, J. Y. and Kim, K.-O. 2004. Comparison of the statistical analysis of hedonic data using analysis of variance and multiple comparisons versus an R-index analysis of the ranked data. Journal of Sensory Studies, 19, 519–529.
Pangborn, R. M. and Dunkley, W. L. 1964. Laboratory procedures for evaluating the sensory properties of milk. Dairy Science Abstracts, 26, 55–62.
Parducci, A. 1965. Category judgment: A range-frequency model. Psychological Review, 72, 407–418.
Park, J.-Y., Jeon, S.-Y., O’Mahony, M. and Kim, K.-O. 2004. Induction of scaling errors. Journal of Sensory Studies, 19, 261–271.
Pearce, J. H., Korth, B. and Warren, C. B. 1986. Evaluation of three scaling methods for hedonics. Journal of Sensory Studies, 1, 27–46.
Peryam, D. 1989. Reflections. In: Sensory Evaluation. In Celebration of our Beginnings. American Society for Testing and Materials, Philadelphia, pp. 21–30.
Peryam, D. R. and Girardot, N. F. 1952. Advanced taste-test method. Food Engineering, 24, 58–61, 194.
Piggott, J. R. and Harper, R. 1975. Ratio scales and category scales for odour intensity. Chemical Senses and Flavour, 1, 307–316.
Pokorný, J., Davídek, J., Prnka, V. and Davídková, E. 1986. Nonparametric evaluation of graphical sensory profiles for the analysis of carbonated beverages. Die Nahrung, 30, 131–139.
Poulton, E. C. 1989. Bias in Quantifying Judgments. Lawrence Erlbaum, Hillsdale, NJ.
Richardson, L. F. and Ross, J. S. 1930. Loudness and telephone current. Journal of General Psychology, 3, 288–306.
Rosenthal, R. 1987. Judgment Studies: Design, Analysis and Meta-Analysis. University Press, Cambridge.
Schutz, H. G. and Cardello, A. V. 2001. A labeled affective magnitude (LAM) scale for assessing food liking/disliking. Journal of Sensory Studies, 16, 117–159.
Shand, P. J., Hawrysh, Z. J., Hardin, R. T. and Jeremiah, L. E. 1985. Descriptive sensory analysis of beef steaks by category scaling, line scaling and magnitude estimation. Journal of Food Science, 50, 495–499.
Siegel, S. 1956. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.
Sriwatanakul, K., Kelvie, W., Lasagna, L., Calimlim, J. F., Wels, O. F. and Mehta, G. 1983. Studies with different types of visual analog scales for measurement of pain. Clinical Pharmacology and Therapeutics, 34, 234–239.
Stevens, J. C. and Marks, L. E. 1980. Cross-modality matching functions generated by magnitude estimation. Perception and Psychophysics, 27, 379–389.
Stevens, S. S. 1951. Mathematics, measurement and psychophysics. In: S. S. Stevens (Ed.), Handbook of Experimental Psychology. Wiley, New York, pp. 1–49.
Stevens, S. S. 1956. The direct estimation of sensory magnitudes—loudness. American Journal of Psychology, 69, 1–25.
Stevens, S. S. 1957. On the psychophysical law. Psychological Review, 64, 153–181.
Stevens, S. S. 1969. On predicting exponents for cross-modality matches. Perception and Psychophysics, 6, 251–256.
Stevens, S. S. and Galanter, E. H. 1957. Ratio scales and category scales for a dozen perceptual continua. Journal of Experimental Psychology, 54, 377–411.
Stoer, N. L. and Lawless, H. T. 1993. Comparison of single product scaling and relative-to-reference scaling in sensory evaluation of dairy products. Journal of Sensory Studies, 8, 257–270.
Stone, H., Sidel, J., Oliver, S., Woolsey, A. and Singleton, R. C. 1974. Sensory Evaluation by quantitative descriptive analysis. Food Technology, 28, 24–29, 32, 34.
Teghtsoonian, M. 1980. Children’s scales of length and loudness: A developmental application of cross-modal matching. Journal of Experimental Child Psychology, 30, 290–307.
Thurstone, L. L. 1927. A law of comparative judgment. Psychological Review, 34, 273–286.
Townsend, J. T. and Ashby, F. G. 1984. Measurement scales and statistics: The misconception misconceived. Psychological Bulletin, 96, 394–401.
Vickers, Z. M. 1983. Magnitude estimation vs. category scaling of the hedonic quality of food sounds. Journal of Food Science, 48, 1183–1186.
Villanueva, N. D. M. and Da Silva, M. A. A. P. 2009. Performance of the nine-point hedonic, hybrid and self-adjusting scales in the generation of internal preference maps. Food Quality and Preference, 20, 1–12.
Villanueva, N. D. M., Petenate, A. J., and Da Silva, M. A. A. P. 2005. Comparative performance of the hybrid hedonic scale as compared to the traditional hedonic, self-adjusting and ranking scales. Food Quality and Preference, 16, 691–703.
Ward, L. M. 1986. Mixed-modality psychophysical scaling: Double cross-modality matching for “difficult” continua. Perception and Psychophysics, 39, 407–417.
Weiss, D. J. 1972. Averaging: an empirical validity criterion for magnitude estimation. Perception and Psychophysics, 12, 385–388.
Winakor, G., Kim, C. J. and Wolins, L. 1980. Fabric hand: Tactile sensory assessment. Textile Research Journal, 50, 601–610.
Yamaguchi, S. 1967. The synergistic effect of monosodium glutamate and disodium 5′-inosinate. Journal of Food Science, 32, 473–477.
Appendices
Appendix 1: Derivation of Thurstonian-Scale Values for the 9-Point Scale
The choice of adjective words for the 9-point hedonic scale is a good example of how carefully a scale can be constructed. The long-standing track record of this tool demonstrates its utility and wide applicability in consumer testing. However, few sensory practitioners actually know how the adjectives were found and what criteria were brought to bear in selecting these descriptors (slightly, moderately, very much, and extremely like/dislike) from a larger pool of possible words. The goal of this section is to provide a shorthand description of the criteria and mathematical method used to select the words for this scale.
One concern was the degree to which each term had a consensual meaning in the population. The most serious concern was a candidate word with an ambiguous or double meaning across the population. For example, the word “average” suggests an intermediate response to some people, but in the original study by Jones and Thurstone (1955) there was a group of people who equated it with “like moderately,” perhaps because an average product in those days was one that people would like. These days, one can think of negative connotations of the word “average,” as in “he was only an average student.” Other ambiguous or bimodal terms were “like not so much” and “like not so well.” Ideally, a term should have low variability in meaning, i.e., a low standard deviation, no bimodality, and little skew. Part of this concern with the normality of the distribution of psychological reactions to a word arose because the developers used Thurstone’s model for categorical judgment to measure the psychological-scale values for the words. This model takes its simplest form when the items to be scaled show normal distributions of equal variance.
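The screening criteria just listed (low standard deviation, no bimodality, little skew) can be checked on the frequency counts a phrase receives over the –4 to +4 ratings. The Python sketch below is illustrative only: the function name `describe_phrase` and the counts are hypothetical (not data from Jones and Thurstone, 1955), and counting local peaks is a crude heuristic assumed here for bimodality, not a formal test.

```python
from math import sqrt

# Rating categories used on the original form: -4 .. +4
CATEGORIES = list(range(-4, 5))


def describe_phrase(counts):
    """Summarize one phrase's rating distribution from its frequency counts.

    counts[i] is the number of respondents who gave the phrase CATEGORIES[i].
    Returns the mean, standard deviation, skewness, and the number of local
    peaks (a rough, hypothetical bimodality heuristic).
    """
    n = sum(counts)
    mean = sum(c * k for c, k in zip(counts, CATEGORIES)) / n
    var = sum(c * (k - mean) ** 2 for c, k in zip(counts, CATEGORIES)) / n
    sd = sqrt(var)
    # Moment-based (population) skewness
    skew = sum(c * (k - mean) ** 3 for c, k in zip(counts, CATEGORIES)) / n / sd ** 3

    # A category is a peak if it is nonzero, at least as large as its left
    # neighbour and strictly larger than its right neighbour
    # (neighbours outside the scale are treated as 0).
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c > 0 and c >= left and c > right:
            peaks += 1
    return {"mean": mean, "sd": sd, "skew": skew, "peaks": peaks}


# Hypothetical counts for two phrases (100 respondents each)
clear_phrase = [0, 0, 0, 2, 10, 60, 25, 3, 0]
ambiguous_phrase = [0, 0, 2, 8, 35, 10, 5, 30, 10]

for name, counts in [("clear", clear_phrase), ("ambiguous", ambiguous_phrase)]:
    s = describe_phrase(counts)
    print(f"{name}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"skew={s['skew']:.2f} peaks={s['peaks']}")
```

With these made-up counts the "ambiguous" phrase shows two peaks and a much larger standard deviation, the pattern that would argue against using such a term on the scale.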
This leads us to the numerical method. Jones and Thurstone modified a procedure used earlier by Edwards (1952). A description of the process and results can be found in the paper “Development of a scale for measuring soldiers’ food preferences” by Jones et al. (1955). Fifty-one words and phrases formed the candidate list, based on a pilot study with 900 soldiers chosen to be a representative sample of enlisted personnel. Each phrase was presented on a form with a check-off rating scale from –4 to +4. In other words, each person read each phrase and assigned it an integer value from –4 to +4 (including zero as an option). This method would seem to presume that these integers were themselves an interval scale of psychological magnitude, an assumption that to our knowledge has never been questioned.
Mean scale values could, of course, be computed directly from these ratings, but Thurstonian methods do not use the raw numbers as the scale; instead, they transform the data so that standard deviations become the units of measurement. The scale therefore needs to be converted to z-score values. The exact steps are as follows:
1. Accumulate frequency counts for all the tested words across the –4 to +4 scale. Think of these categories as little “buckets” into which judgments have been tossed.
2. Find the marginal proportion for each value from –4 to +4 (summed across all test items). Add up the proportions from lowest to highest to get a cumulative proportion for each bucket.
3. Convert these cumulative proportions to z-scores in order to re-scale the boundaries for the original –4 to +4 cutoffs. Let us call these the “category z-values” for each of the “buckets.” The top bucket will have a cumulative proportion of 100%, so it will have no z-score (undefined/infinite).
4. Next, examine each individual item. Accumulate its proportions across the categories, from where it is first used until 100% of the responses are accumulated.
5. Convert the cumulative proportions for the item to z-scores. Alternatively, you can plot these proportions on “cumulative probability paper,” a graphing format that marks the ordinate in equal standard deviation units according to the cumulative normal distribution. Either method will tend to turn the cumulative S-shaped curve for the item into a straight line. The X-axis value for each point is the “category z-value” for that bucket.
6. Fit a line to the data and interpolate the 50% point (z = 0) on the X-axis (the re-scaled category boundary estimates). These interpolated medians now form the new scale values for the items.
An example of this interpolation is shown in Fig. 7.7. Three phrases from the original scaling study of Jones and Thurstone (1955) are pictured; none of the three was chosen for the final scale, but approximate proportions and z-scores can be read from their figures. The small vertical arrows on the X-axis show the scale values for the original categories of –4 to +3 (+4 has a cumulative proportion of 100%, so its z-score is infinite). Table 7.1 gives the values and proportions for each phrase and the original categories. The dashed vertical lines dropped from the intersection at the zero z-score (the 50% point) show the approximate scale values interpolated on the X-axis (i.e., about –1.1 for “do not care for it” and about +2.1 for “preferred”). Note that “preferred” and “don’t care for it” have a linear fit and a steep slope, suggesting a normal distribution and a low standard deviation. In contrast, “highly unfavorable” has a lower slope and some curvilinearity, indicative of higher variability, skew, and/or pockets of disagreement about the connotation of this term.
The actual scale values for the original adjectives, as found with a soldier population circa 1950 (Jones et al., 1955), are shown in Table 7.2. Note that the words are not equally spaced: the intervals between the “slightly” values and the neutral point are smaller than some of the other intervals, and the extreme points lie a little farther out. This bears a good deal of similarity to the intervals found with the LAM scale, as shown in the column where the LAM values are re-scaled to the same range as the 9-point Thurstonian values.
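The source does not spell out how the LAM values were re-scaled. One simple possibility, assumed here, is a linear map that sends the lowest and highest LAM values onto the lowest and highest 9-point Thurstonian values. The function `rescale_linear` and the numbers below are hypothetical placeholders, not the values in Table 7.2.

```python
def rescale_linear(values, target_min, target_max):
    """Linearly map values so that min(values) -> target_min and
    max(values) -> target_max (assumed re-scaling; preserves relative spacing)."""
    lo, hi = min(values), max(values)
    scale = (target_max - target_min) / (hi - lo)
    return [target_min + (v - lo) * scale for v in values]


# Placeholder inputs: hypothetical LAM positions and a hypothetical target range
lam_positions = [-80.0, -50.0, 0.0, 50.0, 80.0]
print(rescale_linear(lam_positions, -4.0, 4.0))
```

Because a linear map multiplies every interval by the same factor, the relative spacing of the labels (for example, how close the "slightly" labels sit to neutral) is unchanged, which is what makes a column-by-column comparison of spacing meaningful.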
Appendix 2: Construction of Labeled Magnitude Scales
There are two primary methods for constructing labeled magnitude scales, and they are very similar. Both require magnitude estimates from the participants to scale the word phrases used on the lines. In the first method, just the word phrases are scaled; in the second, the word phrases are scaled among a list of common everyday experiences or sensations that most people are familiar with. The values obtained by scaling just the words will depend upon the words that are chosen, and extremely high examples (e.g., greatest imaginable liking for any experience) will tend to compress the values of the interior phrases (Cardello et al., 2008). Whether this kind of context effect also occurs with the more general method of scaling among common experiences is not known, but the broad frame of reference could be a stabilizing factor.
Here is an example of the instructions given to subjects in the construction of a labeled affective magnitude scale. Note that for hedonics, which form a bipolar continuum with a neutral point, it is necessary to collect a tone or valence (plus or minus) value as well as the overall “intensity” rating.
Next to each word label, a response area similar to this appeared:
Phrase            Tone (+, –, 0)    How much
Like extremely    __________        _______
Words or phrases are presented in random order. After reading each word or phrase, participants decide whether it is positive, negative, or neutral and place the corresponding symbol on the first line. If the hedonic tone is not neutral (zero value), they are instructed to give a numerical estimate using modulus-free magnitude estimation. The following is a sample of the instructions, taken from Cardello et al. (2008):
After having determined whether the phrase is positive or negative or neutral and writing the appropriate symbol (+, –, 0) on the first line, you will then assess the strength or magnitude of the liking or disliking reflected by the phrase. You will do this by placing a number on the second blank line (under “How Much”). For the first phrase that you rate, you can write any number you want on the line. We suggest you do not use a small number for this word/phrase. The reason for this is that subsequent words/phrases may reflect much lower levels of liking or disliking. Aside from this restriction you can use any numbers you want. For each subsequent word/phrase your numerical judgment should be made proportionally and in comparison to the first number. That is, if you assigned the number 800 to index the strength of the liking/disliking denoted by the first word/phrase and the strength of liking/disliking denoted by the second word/phrase were twice as great, you would assign the number 1,600. If it were three times as great you would assign the number 2,400, etc. Similarly, if the second word/phrase denoted only 1/10 the magnitude of liking as the first, you would assign it the number 80 and so forth. If any word/phrase is judged to be “neutral” (zero (0) on the first line) it should also be given a zero for its magnitude rating.
In the case of Cardello et al. (2008), positive and negative word labels were analyzed separately. Raw magnitude estimates were equalized for scale range using the procedure of Lane et al. (1961). All positive and negative magnitude estimates for a given subject were multiplied by an individual scaling factor, equal to the grand geometric mean (of the absolute values of all nonzero ratings) across all subjects divided by the geometric mean for that subject. The geometric mean magnitude estimate for each phrase was then calculated from these range-equated data. These means gave the distance from the zero point at which each phrase was placed along the scale, usually marked by a short cross-hatch at that point.
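The range-equalization and placement steps can be sketched in Python. This is a minimal sketch under several assumptions not stated in the source: ratings are stored as signed values (tone times magnitude, with neutral phrases scored 0); zero ratings are left out of every geometric mean, since the logarithm of zero is undefined; and each phrase is placed on the side (positive or negative) chosen by most subjects, which is one way of keeping positive and negative labels separate. All function names and data are hypothetical.

```python
from math import exp, log


def signed_rating(tone, how_much):
    """Combine the tone symbol and the 'How much' estimate into one signed value.
    '0' (neutral) scores zero; '+' is positive; any other symbol is negative."""
    if tone == "0":
        return 0.0
    return how_much if tone == "+" else -how_much


def geometric_mean(values):
    """Geometric mean of the absolute values of the nonzero entries
    (assumes at least one nonzero entry)."""
    logs = [log(abs(v)) for v in values if v != 0]
    return exp(sum(logs) / len(logs))


def range_equate(ratings):
    """Multiply each subject's ratings by (grand geometric mean / subject's
    geometric mean). ratings: {subject: {phrase: signed magnitude estimate}}"""
    grand_gm = geometric_mean([v for by_phrase in ratings.values()
                               for v in by_phrase.values()])
    equated = {}
    for subject, by_phrase in ratings.items():
        factor = grand_gm / geometric_mean(by_phrase.values())
        equated[subject] = {phrase: v * factor for phrase, v in by_phrase.items()}
    return equated


def phrase_positions(equated):
    """Signed distance from zero for each phrase: geometric mean of its
    range-equated ratings on the majority side; all-neutral phrases sit at 0."""
    phrases = sorted({p for by_phrase in equated.values() for p in by_phrase})
    positions = {}
    for phrase in phrases:
        vals = [bp[phrase] for bp in equated.values() if phrase in bp]
        pos = [v for v in vals if v > 0]
        neg = [v for v in vals if v < 0]
        if not pos and not neg:
            positions[phrase] = 0.0
        elif len(pos) >= len(neg):
            positions[phrase] = geometric_mean(pos)
        else:
            positions[phrase] = -geometric_mean(neg)
    return positions


# Hypothetical data: subject S2 uses numbers one-tenth the size of S1's
raw = {
    "S1": {"like extremely": ("+", 800), "like slightly": ("+", 100),
           "dislike extremely": ("-", 800), "neither like nor dislike": ("0", 0)},
    "S2": {"like extremely": ("+", 80), "like slightly": ("+", 10),
           "dislike extremely": ("-", 80), "neither like nor dislike": ("0", 0)},
}
ratings = {s: {p: signed_rating(t, m) for p, (t, m) in d.items()}
           for s, d in raw.items()}
print(phrase_positions(range_equate(ratings)))
```

In this toy example the two subjects differ only in the size of the numbers they chose; after range equalization they give identical values, so the phrase positions reflect only the ratios between phrases (here, "like extremely" sits eight times as far from zero as "like slightly").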
© 2010 Springer Science+Business Media, LLC
Lawless, H., Heymann, H. (2010). Scaling. In: Sensory Evaluation of Food. Food Science Text Series. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6488-5_7
DOI: https://doi.org/10.1007/978-1-4419-6488-5_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6487-8
Online ISBN: 978-1-4419-6488-5
eBook Packages: Chemistry and Materials Science; Chemistry and Material Science (R0)