Abstract
Category learning is a fundamental process in human cognition that spans the senses. However, much remains unknown about the mechanisms that support learning in different modalities. In the current study, we directly compared auditory and visual category learning in the same individuals. Thirty participants (22 F; 18–32 years old) completed two unidimensional rule-based category learning tasks in a single day, one with auditory stimuli and one with visual stimuli. We replicated the results in a second experiment with a larger online sample (N = 99; 45 F; 18–35 years old). The categories were identically structured in the two modalities to facilitate comparison. We compared categorization accuracy, decision processes as assessed through drift-diffusion models, and the generalizability of the resulting category representations as assessed through a generalization test. Individuals learned auditory and visual categories to similar extents, and accuracies were highly correlated across the two tasks. Evidence accumulation rates were similar across modalities in later learning but were slower for visual than for auditory learning early on. Participants also showed differences in decision thresholds across modalities, and their generalizable representations were more categorical for visual than for auditory categories. These results suggest that some modality-general cognitive processes support category learning, but that the modality of the stimuli may also affect category learning behavior and outcomes.
Notes
1. We allowed the evidence accumulation and decision threshold parameters to vary flexibly across learning. As a supplementary analysis, we also compared this flexible model of Paulon et al. (2020) with two restricted versions, each holding one parameter fixed while allowing the other to vary, to understand the relative importance of each parameter for auditory and visual learning. We compared the models using the Watanabe-Akaike information criterion (WAIC), a widely used measure of the predictive performance of competing models. WAIC is obtained by computing the log point-wise predictive density and then applying a correction that reflects the effective number of parameters to guard against overfitting (see Gelman, Hwang, & Vehtari, 2014, for details). The results of this comparison indicated that the model that best describes the data for both auditory and visual tasks is the fully flexible one, allowing both evidence accumulation and decision threshold parameters to vary across learning (WAIC for Visual: flexible: −6690.71, constant accumulation: −7247.56, constant threshold: −7084.33; WAIC for Auditory: flexible: −4404.38, constant accumulation: −4443.75, constant threshold: −4590.62). These model comparisons also help pinpoint which parameter is most relevant for a particular modality. In the visual task, flexibility in the accumulation parameter is more important than flexibility in the threshold parameter, as measured by the change in WAIC between the restricted and flexible models (556.85 when accumulation is held constant vs. 393.62 when the threshold is held constant). The converse is true for the auditory task (39.37 vs. 186.24). This is consistent with our results and interpretation of the evidence accumulation and decision threshold results from the fully flexible model, discussed in the manuscript.
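To make the comparison in this note concrete, the following is a minimal sketch of how a WAIC-type score can be computed from posterior draws, and of the arithmetic behind the reported model differences. It assumes the form given by Gelman et al. (2014), in which the log point-wise predictive density is corrected by the summed posterior variance of the point-wise log-likelihood. It also assumes that the reported values are on a scale where larger (less negative) is better, which is inferred from the flexible model being described as the best one. The function names and the `log_lik` layout are illustrative and are not taken from the authors' code.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_waic(log_lik):
    """WAIC-type score on the 'larger is better' scale (an assumption).

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0    # log point-wise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # sample variance across draws
    return lppd - p_waic


# WAIC values as reported in the note above.
reported = {
    "visual": {"flexible": -6690.71, "constant accumulation": -7247.56,
               "constant threshold": -7084.33},
    "auditory": {"flexible": -4404.38, "constant accumulation": -4443.75,
                 "constant threshold": -4590.62},
}


def loss_from_fixing(task):
    """Drop in WAIC when one parameter is held constant, relative to the flexible model."""
    full = reported[task]["flexible"]
    return {name: round(full - value, 2)
            for name, value in reported[task].items() if name != "flexible"}


for task in reported:
    print(task, loss_from_fixing(task))
```

Under these assumptions, the larger drop identifies the parameter whose flexibility matters more for a given task: accumulation for the visual task and threshold for the auditory task.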
References
Anwyl-Irvine, A., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. (2019). Gorilla in our Midst: An online behavioral experiment builder. Behavior Research Methods, 438242. https://doi.org/10.3758/s13428-019-01237-x
Ashby, F. G. (1992a). Multidimensional models of categorization. In F. G. Ashby (Ed.), Multidimensional Models of Perception and Cognition (pp. 449–483). Lawrence Erlbaum. Retrieved from http://psycnet.apa.org/psycinfo/1992-98026-016
Ashby, F. G. (1992b). Multivariate Probability Distributions. In F. G. Ashby (Ed.), Multidimensional Models of Perception and Cognition (pp. 1–34). Lawrence Erlbaum.
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105(3), 442–481. https://doi.org/10.1037/0033-295x.105.3.442
Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research (pp. 171–204). Timonium, MD: York Press.
Bogacz, R., Wagenmakers, E.-J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed–accuracy tradeoff. Trends in Neurosciences, 33(1), 10–16. https://doi.org/10.1016/j.tins.2009.09.002
Brashears, B. N., & Minda, J. P. (2020). The effects of feature verbalizability on category learning. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Conference on the Cognitive Science Society (pp. 655–660). Austin, TX: Cognitive Science Society.
Crittenden, B. M., & Duncan, J. (2014). Task difficulty manipulation reveals multiple demand activity but no frontal lobe hierarchy. Cerebral Cortex, 24(2), 532–540. https://doi.org/10.1093/cercor/bhs333
Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends in Neurosciences, 23(10), 475–483. https://doi.org/10.1016/s0166-2236(00)01633-7
Fedorenko, E., Duncan, J., & Kanwisher, N. (2013). Broad domain generality in focal regions of frontal and parietal cortex. Proceedings of the National Academy of Sciences, 110(41), 16616–16621. https://doi.org/10.1073/pnas.1315235110
Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28(2), 349–366. https://doi.org/10.1037//0096-1523.28.2.349
Garner, W. R. (1974). The processing of information and structure. Hillsdale, NJ: Erlbaum.
Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), 997–1016. https://doi.org/10.1007/s11222-013-9416-2
Goldstone, R. L. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123(2), 178–200.
Goudbeek, M., Swingley, D., & Smits, R. (2009). Supervised and unsupervised learning of multidimensional acoustic categories. Journal of Experimental Psychology: Human Perception and Performance, 35(6), 1913–1933. https://doi.org/10.1037/a0015781
Heffner, C. C., Idsardi, W. J., & Newman, R. S. (2019). Constraints on learning disjunctive, unidimensional auditory and phonetic categories. Attention, Perception & Psychophysics, 81(4), 958–980. https://doi.org/10.3758/s13414-019-01683-x
Lehnert, G., & Zimmer, H. D. (2006). Auditory and visual spatial working memory. Memory & Cognition, 34(5), 1080–1090. https://doi.org/10.3758/bf03193254
Lesaffre, E., Rizopoulos, D., & Tsonaka, R. (2007). The logistic transform for bounded outcome scores. Biostatistics, 8(1), 72–85. https://doi.org/10.1093/biostatistics/kxj034
Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America, 89(2), 874–886.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111(2), 309–332. https://doi.org/10.1037/0033-295x.111.2.309
Maddox, W. T., & Ashby, F. G. (1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53(1), 49–70. https://doi.org/10.3758/bf03211715
Maddox, W. T., Ashby, F. G., & Bohil, C. J. (2003). Delayed feedback effects on rule-based and information-integration category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(4), 650–662. https://doi.org/10.1037/0278-7393.29.4.650
Maddox, W. T., Chandrasekaran, B., Smayda, K., & Yi, H.-G. (2013). Dual systems of speech category learning across the lifespan. Psychology and Aging, 28(4), 1042–1056. https://doi.org/10.1037/a0034969
McClelland, J. L., Fiez, J. A., & McCandliss, B. D. (2002). Teaching the /r/–/l/ discrimination to Japanese adults: Behavioral and neural aspects. Physiology & Behavior, 77, 657–662.
McNab, F., & Klingberg, T. (2008). Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience, 11(1), 103–107. https://doi.org/10.1038/nn2024
Myers, E. B. (2014). Emergence of category-level sensitivities in non-native speech sound learning. Frontiers in Neuroscience, 8, 1–11. https://doi.org/10.3389/fnins.2014.00238
Noppeney, U., Ostwald, D., & Werner, S. (2010). Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex. The Journal of Neuroscience, 30(21), 7434–7446. https://doi.org/10.1523/jneurosci.0455-10.2010
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115(1), 39–57.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104(2), 266–300. https://doi.org/10.1037/0033-295x.104.2.266
Nystrom, N. A., Levine, M. J., Roskies, R. Z., & Scott, J. R. (2015). Bridges: A uniquely flexible HPC resource for new communities and data analytics. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (St. Louis, MO, July 26–30, 2015). XSEDE15. ACM, New York, NY. https://doi.org/10.1145/2792745.2792775
Paulon, G., Llanos, F., Chandrasekaran, B., & Sarkar, A. (2020). Bayesian semiparametric longitudinal drift-diffusion mixed models for tone learning in adults. Journal of the American Statistical Association, 1–14. https://doi.org/10.1080/01621459.2020.1801448
Rabi, R., & Minda, J. P. (2014). Rule-based category learning in children: The role of age and executive functioning. PLoS ONE, 9(1), e85316. https://doi.org/10.1371/journal.pone.0085316
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108.
Roark, C. L., & Holt, L. L. (2019). Perceptual dimensions influence auditory category learning. Attention, Perception & Psychophysics, 81(4), 912–926. https://doi.org/10.3758/s13414-019-01688-6
Scharinger, M., Henry, M. J., & Obleser, J. (2013). Prior experience with negative spectral correlations promotes information integration during auditory category learning. Memory & Cognition, 41(5), 752–768. https://doi.org/10.3758/s13421-013-0294-9
Schönwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proceedings of the National Academy of Sciences, 106(34), 14611–14616. https://doi.org/10.1073/pnas.0907682106
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32(2), 135–168. https://doi.org/10.1016/0022-2496(88)90043-0
Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., … Wilkens-Diehr, N. (2014). XSEDE: Accelerating scientific discovery. Computing in Science & Engineering, 16(5), 62–74. https://doi.org/10.1109/MCSE.2014.80
Visscher, K. M., Kaplan, E., Kahana, M. J., & Sekuler, R. (2007). Auditory short-term memory behaves like visual short-term memory. PLoS Biology, 5(3), e56. https://doi.org/10.1371/journal.pbio.0050056
Wickens, T. D. (1982). Models for behavior: stochastic processes in psychology. San Francisco, CA: W. H. Freeman.
Yi, H.-G., & Chandrasekaran, B. (2016). Auditory categories with separable decision boundaries are learned faster with full feedback than with minimal feedback. The Journal of the Acoustical Society of America, 140(2), 1332–1335. https://doi.org/10.1121/1.4961163
Yi, H.-G., Maddox, W. T., Mumford, J. A., & Chandrasekaran, B. (2014). The role of corticostriatal systems in speech category learning. Cerebral Cortex, 1–12. https://doi.org/10.1093/cercor/bhu236
Zettersten, M., & Lupyan, G. (2020). Finding categories through words: more nameable features improve category learning. Cognition, 196, 539–547. https://doi.org/10.17605/osf.io/uz2m9
Zvyagintsev, M., Clemens, B., Chechko, N., Mathiak, K. A., Sack, A. T., & Mathiak, K. (2013). Brain networks underlying mental imagery of auditory and visual information. European Journal of Neuroscience, 37(9), 1421–1434. https://doi.org/10.1111/ejn.12140
Author note
This work was supported by grants from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health (R01DC013315A1 to B.C. and F32DC018979 to C.L.R.) and the National Science Foundation (NSF-1953712 to Co-PIs B.C. and A.S.). This work used the Extreme Science and Engineering Discovery Environment (XSEDE; Towns et al., 2014), which is supported by NSF (ACI-1548562). Specifically, it used the Bridges system (Nystrom, Levine, Roskies, & Scott, 2015), which is supported by NSF (ACI-1445606), at the Pittsburgh Supercomputing Center (PSC).
Open practices statement
The data and materials for the experiment and replication are available at osf.io/msnq2. Neither the experiment nor the replication was preregistered.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Decision-bound modeling methods
To address the post-hoc question of whether participants used more rule-based processing during visual generalization than auditory generalization, we applied a series of decision-bound models to participants’ response data from the test blocks of Experiment 1 and Experiment 2. As a comparison, we also applied the models to participants’ response data from the final blocks of Experiment 1 (block 6) and Experiment 2 (block 5), where participants still received feedback.
Rationale
Decision-bound models (Ashby, 1992a; Maddox & Ashby, 1993) give information about the strategies participants use to separate perceptual stimuli into categories. We used decision-bound models to understand how individuals use rule-based processes during auditory and visual rule-based categorization and generalization.
Decision-bound models assume that participants separate stimuli into categories with a decision boundary. This boundary can be based on a single dimension or on multiple dimensions. Participants can separate the categories using rules, which are thought to reflect overt decisional processes and hypothesis testing, or with a boundary that reflects more implicit, procedural processes (Ashby, Alfonso-Reese, Turken, & Waldron, 1998). We fit a series of rule-based models that assume that participants separate the categories based either on the dimension that is relevant for categorization or on the dimension that is irrelevant. We also fit an integration model that assumes that participants use both dimensions to separate the categories, in a manner that reflects implicit rather than rule-based processing. Finally, we fit a random responder model that assumes that participants are guessing.
Rule-based models
The rule-based models assume that participants draw a decision boundary along one of the two stimulus dimensions. We fit separate models assuming participants used a rule-based strategy along the category-relevant and category-irrelevant dimensions. The rule-based models have two free parameters – the location of the decision boundary along the dimension and a perceptual/criterial noise parameter. Rule-based models assume that participants are using hypothesis testing and overt rules to separate the stimuli into categories. For instance, while learning the auditory categories, a specific rule a participant could use would be to categorize all stimuli that have temporal modulation faster than 8 Hz into Category B and all stimuli slower than 8 Hz into Category A. A rule-based strategy is the optimal strategy to separate the categories in the current experiments.
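As an illustration of the two-parameter rule-based model, the sketch below assumes the common decision-bound formulation in which the probability of a Category B response is a cumulative normal function of the distance between the stimulus and the criterion, scaled by the noise parameter. This functional form, the grid search used in place of a proper optimizer, and the stimulus values are all assumptions made for illustration. Only the 8 Hz criterion comes from the example in the text.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_category_b(x, criterion, noise_sd):
    """P(respond B) for a stimulus at x on the attended dimension (assumed form)."""
    return normal_cdf((x - criterion) / noise_sd)


def neg_log_likelihood(xs, responses, criterion, noise_sd):
    """Negative log-likelihood of 'A'/'B' responses under the rule-based model."""
    eps = 1e-10  # keeps log() finite when a probability reaches 0 or 1
    total = 0.0
    for x, resp in zip(xs, responses):
        p_b = min(max(prob_category_b(x, criterion, noise_sd), eps), 1.0 - eps)
        total -= math.log(p_b if resp == "B" else 1.0 - p_b)
    return total


def fit_rule_model(xs, responses, criteria, noise_sds):
    """Coarse grid search over the two free parameters (a stand-in for an optimizer)."""
    nll, criterion, noise_sd = min(
        (neg_log_likelihood(xs, responses, c, s), c, s)
        for c in criteria for s in noise_sds
    )
    return {"nll": nll, "criterion": criterion, "noise_sd": noise_sd}


# Hypothetical temporal modulation rates (Hz) and responses, for illustration only.
xs = [4.0, 5.0, 6.0, 7.0, 9.0, 10.0, 11.0, 12.0]
responses = ["A", "A", "A", "A", "B", "B", "B", "B"]
fit = fit_rule_model(xs, responses, [6.0, 7.0, 8.0, 9.0, 10.0], [0.5, 1.0, 2.0])
print(fit)
```

Fitting the same model to the category-irrelevant dimension would only require passing that dimension's stimulus values as `xs`.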
Integration model
In contrast to rule-based models, the integration model assumes that participants use both stimulus dimensions to separate the categories. Integration strategies are thought to reflect more implicit, procedural learning processes, separating categories by a boundary that is not easily verbalizable (Ashby et al., 1998). The integration model assumes a linear decision boundary and has three free parameters: the slope and intercept of the decision boundary and a perceptual/criterial noise parameter. If a participant is using an integration strategy, it means they are using both dimensions to separate the categories, which is suboptimal in this case.
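The three-parameter integration model can be sketched in the same way. The version below assumes that the response probability depends on the signed perpendicular distance of a stimulus from the line y = slope · x + intercept, that stimuli above the line are assigned to Category B, and that both dimensions have been placed on a comparable scale. None of these details are specified in the text, so the code should be read as one plausible instance of a linear decision-bound model rather than the fitted model itself.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_category_b(x, y, slope, intercept, noise_sd):
    """P(respond B) for a stimulus at (x, y) under a linear boundary (assumed form)."""
    # Signed perpendicular distance from the line y = slope * x + intercept.
    distance = (y - (slope * x + intercept)) / math.sqrt(1.0 + slope ** 2)
    return normal_cdf(distance / noise_sd)


def neg_log_likelihood(stimuli, responses, slope, intercept, noise_sd):
    """Negative log-likelihood of 'A'/'B' responses; stimuli is a list of (x, y) pairs."""
    eps = 1e-10  # keeps log() finite when a probability reaches 0 or 1
    total = 0.0
    for (x, y), resp in zip(stimuli, responses):
        p_b = prob_category_b(x, y, slope, intercept, noise_sd)
        p_b = min(max(p_b, eps), 1.0 - eps)
        total -= math.log(p_b if resp == "B" else 1.0 - p_b)
    return total
```

Because this model has one more free parameter than the rule-based models, it needs a correspondingly better fit to be preferred once a complexity penalty is applied.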
Random responder model
The random responder model assumes that participants guess on each trial.
Model fitting and selection
For each participant (30 in Experiment 1, 99 in Experiment 2) and each block (final categorization block, generalization test block), we fit rule-based, integration, and random responder models. For each model type, the model parameters were estimated using a maximum likelihood procedure (Ashby, 1992b; Wickens, 1982). Model selection used the Bayesian Information Criterion (BIC): BIC = r ln(N) − 2 ln(L), where r is the number of free parameters, N is the number of trials in a given block for a given subject, and L is the likelihood of the model given the data (Schwarz, 1978). The BIC allows for comparison of model fits because it penalizes models for extra free parameters, such that the smaller the BIC, the closer the model is to the “true” model.
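The selection step can be sketched as follows, using the BIC definition given above. The negative log-likelihoods and the trial count are hypothetical values chosen for illustration, and treating the random responder model as having no free parameters is an assumption, since the text does not state its parameter count.

```python
import math


def bic(neg_log_lik, n_params, n_trials):
    """BIC = r * ln(N) - 2 * ln(L), written in terms of the negative log-likelihood."""
    return n_params * math.log(n_trials) + 2.0 * neg_log_lik


def random_responder_nll(n_trials):
    """Negative log-likelihood when every response has probability 0.5."""
    return -n_trials * math.log(0.5)


def select_model(fits, n_trials):
    """fits maps model name -> (negative log-likelihood, number of free parameters)."""
    scores = {name: bic(nll, k, n_trials) for name, (nll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one participant in one block.
n_trials = 96
fits = {
    "rule (relevant)": (30.0, 2),
    "rule (irrelevant)": (65.0, 2),
    "integration": (29.5, 3),
    "random": (random_responder_nll(n_trials), 0),
}
best, scores = select_model(fits, n_trials)
print(best)
```

In this made-up example the integration model has a slightly higher likelihood than the relevant-dimension rule, but its extra parameter costs more than the improvement in fit, so the rule-based model has the smaller BIC.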
The model fitting and selection procedure produces the best-fitting model for each participant and each block (final categorization block, generalization test block). We grouped the models by whether they reflected rule-based or integration processing. No participants were best fit by the random responder model. Below, we report the percentage of participants best fit by the rule-based models.
Decision-bound modeling results
In the final block of training, there was no evidence that rule-based strategy use differed between the auditory and visual tasks (Experiment 1: McNemar’s χ² = 2.78, p = 0.096; Experiment 2: McNemar’s χ² = 1.67, p = 0.20). In the final block of Experiment 1, 73% (22/30) of participants in the auditory task and 90% (27/30) of participants in the visual task used rule-based strategies. In Experiment 2, 89% (88/99) of participants in the auditory task and 94% (93/99) of participants in the visual task used rule-based strategies.
In contrast, during the generalization test, significantly more participants used rule-based strategies in the visual task than in the auditory task (Experiment 1: McNemar’s χ² = 4.0, p = 0.046; Experiment 2: McNemar’s χ² = 13.37, p = 0.00026). In the generalization block of Experiment 1, 83% (25/30) of participants in the auditory task and 97% (29/30) of participants in the visual task used rule-based strategies. In Experiment 2, 77% (76/99) of participants in the auditory task and 96% (95/99) of participants in the visual task used rule-based strategies.
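Because each participant completed both tasks, the comparisons above are paired, and McNemar's test depends only on the participants whose best-fitting strategy differed between modalities. The sketch below assumes the uncorrected form of the statistic with one degree of freedom. The text does not say whether a continuity correction was applied, and the discordant counts are not reported. The counts used here (0 participants rule-based only in the auditory task, 4 rule-based only in the visual task) are inferred from the Experiment 1 generalization totals of 25/30 and 29/30 and should be treated as illustrative.

```python
import math


def mcnemar(b, c):
    """Uncorrected McNemar test on discordant counts b and c.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Inferred (not reported) discordant counts for the Experiment 1 generalization test.
chi2, p = mcnemar(0, 4)
print(round(chi2, 2), round(p, 3))
```

With these counts the statistic equals 4.0, matching the value reported for the Experiment 1 generalization test.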
Overall, these results indicate no detectable differences between auditory and visual rule-based processing during categorization. During generalization, however, when feedback is no longer available, more participants rely on rule-based processing for visual than for auditory stimuli. These results also align with our other measures of performance in the generalization test. Although there were no differences in overall accuracy (in the highly powered Experiment 2), there were differences in the pattern of responses. Visual category representations were more categorical than auditory category representations, and participants found it easier to consistently apply a unidimensional rule to separate the visual categories even in the absence of feedback.
Cite this article
Roark, C.L., Paulon, G., Sarkar, A. et al. Comparing perceptual category learning across modalities in the same individuals. Psychon Bull Rev (2021). https://doi.org/10.3758/s13423-021-01878-0
Keywords
- Category learning
- Modality effects
- Audition
- Vision