This article is part of a symposium on Dustin Stokes’s book “Thinking and Perceiving” (Routledge 2021), edited by Regina Fabry and Sascha Benjamin Fink.
Dustin Stokes’s book sheds new light on the processes that constitute the human mind, the ways they interact, and how they enable us to make contact with the world. He approaches the relation between thought and perception by asking whether vision is modular, informationally encapsulated, and thus cognitively impenetrable, or whether it is instead malleable and sensitive to further improvements by cognitive states. Stokes supports the latter by appealing to empirical research on perceptual expertise (Bukach et al., 2006; Drew et al., 2013; Kundel et al., 2007; Scott, 2011) and his own investigations of this phenomenon (Ivy et al., 2021, 2023; Stokes, 2021a).
Expertise involves experience and training specific to a relevant domain such as radiology, ornithology, or fingerprint examination. Proponents of the modular and malleable architectures of the mind offer different explanations of the phenomena involved in perceptual expertise, viz. object categorization and identification. Modularists interpret these phenomena as exercises of the capacity for visual object recognition – a post-perceptual cognitive process of late vision. For Stokes, perceptual expertise, including expert recognition, is a genuinely perceptual phenomenon, which can be sensitive to cognitive influences and as such constitutes both a perceptual and a cognitive achievement (Stokes, 2021a, 2021b).
Capacities for object recognition are capacities either to categorize a perceived object as belonging to a conceptual category or to identify it as a specific individual (Abid, 2021). The two processes can be seen as different stages of the recognition of an item, with categorization being faster and occurring before identification. There is wide agreement that at least some of these processes operate automatically (Dell’Acqua & Job, 1998; Mroczko-Wąsowicz & Anaya, 2022; Serre et al., 2007). A process is automatic, as opposed to voluntary or intentional, when it is not under the subject’s conscious control (Moors & De Houwer, 2006; Papineau, 2013, p. 177). Empirical studies have determined that recognition can occur as quickly as 200–300 ms after stimulus detection (DiCarlo et al., 2012; Mohr et al., 2018). This short time frame leaves subjects no opportunity to exercise conscious control over the unfolding of the process.
Stokes claims that recognition by experts differs from that by novices. This is because expert perception is cognitively enhanced as a result of domain-specific, concept-rich cognitive learning and the unconscious impact of perceptual learning (visual memories) on attention. As a result, experts not only know better but also see better. The empirical findings Stokes (2021b, p. 193) refers to suggest that in expert recognition, the memorized visual information affects saccadic patterns so that experts are less distracted by features of a stimulus that have no consequence for recognizing a specific category of objects. Stokes discusses eye-tracking studies which have shown that experts such as radiologists make fewer but longer saccades and fixate less on features of a radiogram that are diagnostically irrelevant for identifying an abnormality (Drew et al., 2013; Kundel et al., 2007). In consequence, decreased visual distraction leads to enhanced sensitivity to category-specific information. Experts develop visual object representations of the relevant kind more rapidly and see relevant objects more accurately than laypersons. This is why it is concluded that experts display differences in eye movement patterns, enjoy an advantage for a specific category of objects in visual short-term memory, and exhibit automatic successful performance (Stokes, 2023).
In a similar vein, Stokes and colleagues argue in their own experimental work that visual expertise is more than meets the eye (Ivy et al., 2023). They examined holistic visual processing (HVP) – a behavioral marker of a visual-expert search strategy, which shows that experts are able to process information from a larger region of space with a more focused gaze pattern. HVP turns out to be transferable across domains, but it supports reduced search time and greater accuracy only within an expert’s particular domain of expertise. This led the researchers to conclude that visual search success does not depend exclusively on the occurrence of HVP, but also on explicit knowledge of the expert’s domain, including knowledge of how and where to search.
The seeming automaticity of object recognition may be an outcome of the interplay between these factors, because recognition is a complex phenomenon consisting of perception, concepts, and associated perceptual memories of categories or particular items. Commenting on behavioral and phenomenological aspects of expert recognitional capacity, Stokes emphasizes its instantaneousness:
The first strand of behavioural evidence concerns “automaticity”. Not only do experts more rapidly perform categorizations or other forms of recognition (…), but they do so in ways that they often cannot carefully describe. (…) expert radiologists often report a sense that there is something anomalous in a medical image before they can point to the anomaly. (…) [they] report that the relevant object or feature is “highly salient” or just “pops out”. (…) These reports and the speed of performance suggest that the expert expends little or no deliberate cognitive effort and that her performance is non-inferential. (Stokes, 2021b, p. 152)
On the other side, theorists aligned with the modularist view (Fodor, 1983; Marr, 1982; Pylyshyn, 1999) may explain visual object recognition differently. This explanation also involves some aspects of recognition’s automaticity, but the automaticity is limited to the early sensory component of recognition. Accordingly, visual recognition includes automatic early sensory processing that delivers new visual representations of basic sensory properties for purposes of later categorization and identification. This means that the new visual representations are compared with stored visual memories. In consequence, feedback from visual memories helps to search the newly formed sensory representations for visual matches. Even quick comparisons, which lead to finding recognitional matches between novel visual representations and those stored in memory, take long enough to accommodate attention allocation and to be classified by modularists as late vision or post-perceptual processes (Raftopoulos, 2011).
Modularists acknowledge that late vision is cognitively penetrated and involves modulation of processing by cognitively driven attention (Pylyshyn, 2003). By classifying visual object recognition as a late vision phenomenon, they do not mean that object recognition enhanced by perceptual expertise is an additional indicator of the malleability of the mind. For modularists, perceptual expertise does not make object recognition more sensitive to cognitive influences than is already the case for regular (non-expert) recognition. On their view, perceptual expertise makes the low-level component of object recognition – early visual sensory processing – more fluent and facilitates perceptual discriminations of fine-grained subordinate categories. Modularists emphasize that outputs of early vision result from unconscious, mandatory, fast, and automatic modular processes. As such, early visual representations do not alone determine late vision phenomena like object recognition. Interestingly, although automaticity on the modularist view is meant in the strong sense of operating pre-attentively, it can be considered a graded matter, much like modularity itself (Deroy, 2014; Drayson, 2017; Fodor, 1983; Mroczko-Wąsowicz, 2022).
In this commentary, I have examined how object recognition and its automaticity may be approached from the modularist and malleabilist perspectives. Although some relevant questions regarding the status of late vision remain open for further research, it seems that non-modular approaches to object recognition may enjoy more advantages. This is because such approaches are capable of accommodating a twofold explanation of the rapidity of expert recognition: (1) a bottom-up explanation compatible with the modularist suggestion concerning the extraordinary fluency of discriminatory processes in the early sensory component of object recognition, and (2) the malleabilist proposal concerning the top-down penetrating impact of expertise-related higher cognitive states on perceptual processing and perceptual phenomenology.
Even if one agrees with modularists that the influence of expertise on object recognition can be interpreted as cognitive effects on post-perceptual processes, this does not have to be disruptive to Stokes’s proposal. It seems his proposal is capable of acknowledging that cognitive contact with the world can have an effect not only on perceptual contact with the world but also on related hybrid phenomena such as recognition (Stokes, 2021b, pp. 3–7). The main thesis of the book, namely the cognitive improvement of perception in cases of top-down effects of expertise on recognition, remains untouched by the pertinent warning of Firestone & Scholl (2016, pp. 15–17), who caution against confusing two integral but separable constituents of recognition: perception and memory. Carefully distinguishing these constituents is essential, because cognitive effects on back-end memory have no implications for front-end perception. Thinking and Perceiving avoids this pitfall.
Why does explaining expert recognition matter? Providing a satisfactory account of the smoothness of perceptual recognition of objects for which we serve as experts would not only inform debates in philosophy of mind/perception and the cognitive sciences but also furnish us with a better understanding of a process that is ubiquitous in ordinary life and central to efficient interactions with objects in our environment (see Mroczko-Wąsowicz & Grush, 2023). The reason is that we are all experts in perceiving and recognizing some kinds of objects (Mroczko-Wąsowicz et al., 2023).
Acknowledgments
The work was supported by the National Science Centre, Poland (grant 2019/35/B/HS1/04386).