1 Introduction

In this paper, we tackle the Molyneux question thoroughly, by addressing it in terms of both ordinary perception, the perception of concrete objects out there, and pictorial perception, the perception one has when facing pictures understood as such, i.e., in their figurative value: if a congenitally blind person recovered sight, could she visually recognize the 3D shapes she already recognized tactilely, both when such shapes are given to her directly and when they are given to her pictorially, i.e., as depicted shapes? Philosophers have been trying to tackle the question by means of a priori reflections on the relationship between sense modalities such as touch and vision. Yet it is perhaps time to face the question by means of a posteriori research. Indeed, we want to claim that empirical evidence suggests that the question can be positively answered both in the case of ordinary perception and in the case of pictorial perception. In the former case, such evidence shows that perception of 3D shapes is supramodal; namely, it can be equivalently achieved in different sense modalities, notably touch and vision, independently of the sensory input through which such shapes are accessed. In the latter case, such evidence shows that, as regards both sight and touch, a perceiver can satisfy the condition for depicted shapes to be grasped in the picture’s subject, i.e., what the picture presents, even though, just as that subject is typically not located where the perceiver is, such shapes are typically not instantiated where that perceiver is (just as Mona Lisa herself, the subject of Leonardo’s La Gioconda, is not at the Louvre, where a spectator enjoys Leonardo’s masterpiece). This condition states that the picture’s vehicle, i.e., the typically 2D physical basis of a picture, is enriched by adding to its properties the 3D grouping properties that allow for a figure/ground segmentation to be performed on that vehicle’s elements.
In a nutshell, since the vehicle’s grouping properties are perceived supramodally, the depicted 3D shapes are also grasped in a modality-independent way. In Section 2, we will address the case of ordinary perception; in Section 3, we will focus on pictorial perception.

2 A supramodal account of ordinary perception

The Molyneux question asks whether an individual who has recently gained the ability to see could promptly recognize three-dimensional objects such as cubes and spheres, previously familiar only through touch, solely visually, i.e., by sight alone (for discussion, see, for example, Degenaar et al., 2024; Ferretti & Glenney, 2020; Matthen & Cohen, 2020).

As regards 3D objects, a positive answer to this question is encouraged by neuroscience findings indicating a robust functional equivalence, in the sighted, between vision and touch (for discussion, see Calzavarini & Voltolini, 2023). Granted, touch acquires information sequentially from fingertips, whilst vision processes it in a parallel and more holistic manner. Despite these differences, the outcome – behavioral performance in object recognition tasks – is surprisingly comparable across both modalities (for review, see Lacey & Sathian, 2014). Critically, behavioral findings suggest that tactile recognition, much like visual recognition, has a perspectival nature, something similar to a ‘point of view’, being influenced by the object’s orientation relative to the observer. In a seminal study, Newell et al. (2001) have shown that when neurotypical subjects attempt to recognize unfamiliar objects assembled from Lego blocks, their performance deteriorates significantly if the object is rotated 180° around any axis. Similarly, studies with both tactile and visual presentations of objects suggest that there is an optimal orientation or ‘canonical perspective’ that facilitates recognition (Woods et al., 2008). Functional equivalence between vision and touch is also supported by findings that individuals who learn to identify new objects visually can often apply this recognition to tactile experiences, and the reverse is also true (Lacey et al., 2007; Lawson, 2009; Norman et al., 2004). Additionally, the way objects are perceived as similar is consistent across these sensory modalities (Cooke et al., 2007; Gaissert et al., 2010).1

Furthermore, a growing body of experimental research using diverse methodologies such as fMRI, TMS, and studies on individuals with brain damage points to significant neural overlap between the visual and tactile modalities, reinforcing the thesis of their functional equivalence (for reviews, see Lacey & Sathian, 2014; Ricciardi et al., 2014). The convergence of visual and tactile input appears to peak within the so-called lateral occipital complex (LOC), located in the ventral visual pathway between the occipital and inferior temporal gyri.2 While this structure was traditionally viewed as dedicated to visual shape recognition (Grill-Spector et al., 2001), subsequent neuroimaging research has revealed that the LOC also engages when shapes are explored tactilely in individuals with normal vision (Amedi et al., 2002, 2007; James et al., 2002). For example, James et al. (2002) have shown that the LOC is active when sighted individuals see and touch objects. Similarly, Amedi et al. (2002, 2007) have shown that recognizing objects’ shapes (vs. textures) in both visual and haptic modalities increases activity in the LOC. Moreover, the LOC appears to show increased (‘multisensory’) responses when visual and tactile stimuli are presented in combination (Kim & James, 2010).

In the early univariate fMRI studies, the amount of neural activation (i.e., BOLD contrast) was the only criterion to demonstrate the intrinsic multimodality of the neural structures involved in 3D shape processing (e.g., Amedi et al., 2002). A potential objection is that this methodology provides only an indirect test of multimodality and is not revealing about the format of neural representations (e.g., Erdogan et al., 2016; Kiefer et al., 2023). Based on these data, the possibility is still open that the LOC implements independent modality-specific representations for visual and haptic shapes. Nevertheless, more recent studies in the multisensory field have made use of multivariate neuroimaging, which is standardly supposed to be informative about the format of neural representations at the neural population level (see, for instance, Calzavarini, 2024; Heinen et al., 2024; Kriegeskorte & Douglas, 2018; Ricciardi & Pietrini, 2024). Techniques such as cross-modal multivoxel pattern classification (MVPC) and cross-modal representational similarity analysis (RSA) have been used, respectively, to show that object shapes can be multisensorily decoded from activation in the LOC (e.g., Pietrini et al., 2004) and elicit similar patterns of neural activity across vision and touch (e.g., Erdogan et al., 2016). Results of these multivariate studies converge in suggesting that representations of objects’ shapes in the LOC are multisensory (i.e., with a significant degree of modality independence), implementing a “unique code [...] regardless of whether the sensory modality is vision or touch” (Erdogan et al., 2016, p. 18).

Another potential objection is that activations in the LOC during haptic tasks might simply be the result of top-down involvement of visual imagery rather than direct activation by tactile input. Critically, however, these putative ‘visual’ neural regions respond to shape information also when people who cannot engage in visual imagery, such as congenitally blind individuals, are tactilely exploring 3D objects (Heimler & Amedi, 2020; for reviews, see Lacey & Sathian, 2014; Ricciardi et al., 2014). In a pioneering study, Pietrini et al. (2004) used MVPC to investigate the neural activity of both sighted and congenitally/early blind individuals during visual and/or tactile recognition tasks. The study design involved 3D objects belonging to three different categories: bottles, shoes, and (3D models of) human faces (see \(\ref{fig:figure1}\)). In the haptic version of the task, both groups of subjects were asked to manipulate the objects and try to recognize their shapes, with sighted individuals being blindfolded. Results showed that, in sighted individuals, both visual and tactile shape recognition were associated with similar category-related patterns of activity in the LOC and nearby regions in the ventral temporal cortex (see again \(\ref{fig:figure1}\)). Furthermore, as the authors observe, “blind subjects also demonstrated category-related patterns of response in this ‘visual’ area, and in more ventral cortical regions in the fusiform gyrus, indicating that these patterns are not due to visual imagery and, furthermore, that visual experience is not necessary for category-related representations to develop in these cortices” (Pietrini et al., 2004, p. 5658).

Figure 1: Stimuli (A) and results (B) in the study of Pietrini et al. (2004). Adapted from Pietrini et al. (2004). Copyright (2004) National Academy of Sciences, U.S.A.

These results indicate a significant overlap of the neural structures involved in 3D shape processing, notably the LOC, across individuals with typical vision and the early and even congenitally blind. Interestingly, the role of the LOC in the recognition of 3D shapes appears to extend beyond the visual and tactile senses. This region also activates in sighted and blind individuals in response to ‘sensory substitution’ systems that convert geometrical information into an auditory stream via specific algorithms (e.g., Amedi et al., 2007). Moreover, it has been recently shown using MVPC that sounds belonging to four different semantic categories (faces, body parts, artificial objects, and scenes) can be decoded from neural activity in the ventral-temporal cortex of blind individuals (Hurk et al., 2017; Mattioni et al., 2020; Peelen & Downing, 2017). Based on these and similar results, several researchers have argued that the LOC does not have a strictly visual profile but rather implements a modality-invariant (or supramodal) representation of object shape (see, e.g., Lacey & Sathian, 2014; Pietrini et al., 2004; Ricciardi et al., 2014). This suggests that perception of objects’ 3D shapes is inherently neither solely visual nor tactile but rather supramodal, transcending specific modalities, notably in the sense of being input-independent. This notion aligns with the increasingly accepted meta-modal or supramodal paradigm in neuroscience, according to which the organization of the brain is primarily driven by the type of sensory computation performed (e.g., shape or motion processing) rather than by the traditional distinction into sensory (visual, auditory, tactile, olfactory, gustatory) and motor modalities (Calzavarini, 2021, 2024; Heimler & Amedi, 2020; Pascual-Leone & Hamilton, 2001; Ricciardi et al., 2014).

A potential complication for the conclusion that perceptual apprehension of the shapes of objects and their 3D depth ratio is supramodal is due to the cortical functional plasticity that is known to follow sensory loss in congenital blindness (for review, see Bedny, 2017). It might be argued that neural regions sensitive to vision in the sighted may functionally reorganize to respond to touch in the blind, without this indicating that the same supramodal processes are involved in 3D shape processing across sighted and blind individuals (e.g., Kiefer et al., 2023). Nevertheless, as observed by several scholars (e.g., Calzavarini, 2024; Makin & Krakauer, 2023; Ricciardi & Pietrini, 2024), plasticity for high-level ‘visual’ regions such as the LOC is strongly constrained by their pre-determined function (e.g., shape processing), suggesting that, in congenital blindness, the specific computations of these areas are preserved even if they are now triggered by input from another modality (e.g., touch). This suggests that plasticity after a sensory loss cannot override the original functional profile of a region (e.g., transforming a visual region into a tactile region) but is “better interpreted as upregulation of a more general input-agnostic computational capacity that then favours one input over another” (Makin & Krakauer, 2023, p. 4). In particular, for regions such as the LOC, “unmasking of a latent [supramodal] capacity that is present in sighted individuals is a more plausible mechanism than positing a qualitative change in a visual area to a tactile one” (Makin & Krakauer, 2023, p. 12).

In the context of Molyneux’s problem, these insights suggest that if the brain is inherently capable of processing shapes in a supramodal way, a newly sighted individual might possess the neural foundation necessary to recognize 3D shapes seen for the first time, provided that the latent capabilities of their ‘visual’ cortex can be quickly upregulated or unmasked through exposure and experience. Note that this conclusion appears to be supported by the few neuroscientific investigations concerning individuals who have recovered sight after congenital or early blindness (e.g., Chen et al., 2016; Held, 2009; Held et al., 2011; for a review, see Occelli, 2020). Results of these studies converge in showing that, right after the acquisition of sight, subjects show an almost complete inability to transfer tactile recognition to the visual modality (Chen et al., 2016; Held, 2009; Held et al., 2011). Nevertheless, this visuo-tactile transfer ability is quickly recovered after a brief period of training, demonstrating that the “two senses are prearranged to immediately become calibrated to one another” (Chen et al., 2016, p. 1069).3 As observed by Occelli,

if […] patients are observed while performing the task at varying delays after the surgical procedure, even a few hours later, then a positive answer to Molyneux’s question appears legitimate. This evidence points to a very fast acquisition of the capability to establish supramodal visuotactile representations through experience […]. Likely, this evidence is the behaviorally observable result of a very rapid unmasking of pre-existing connections between the occipital cortex and the other sensory areas (Occelli, 2020, p. 229).4

3 A supramodal account of pictorial perception

Having shown that ordinary perception allows one to have a supramodal grasp of 3D shapes, one may wonder whether the same holds for another form of perception, pictorial perception, so as to obtain an analogous solution to Molyneux’s question not only as regards ordinary 3D shapes, but also as regards depicted 3D shapes, which, unlike ordinary 3D shapes, are typically not where the perceiver is. As we will see, there is a good chance of answering this question affirmatively.

Unlike ordinary perception, which is the perception of concrete objects out there, pictorial perception is a complex form of perception, for it addresses complex items; namely, pictures understood as such, i.e., in their figurative value. As is well known, on many occasions (Wollheim, 1980, 1987, 1998, 2003a, 2003b) Richard Wollheim has claimed that pictorial perception is a genuine form of perception, seeing-in, which is however sui generis because of its twofoldness. On the one hand, in its configurational fold (CF), seeing-in is addressed to the picture’s vehicle, i.e., the physical basis of a picture. On the other hand, in its recognitional fold (RF), seeing-in is also addressed to the picture’s subject, i.e., what is presented by that picture. For Wollheim, such folds are inseparable. Indeed, as he explicitly says (1987, p. 46), neither fold is identical with the perception of its respective object, the vehicle and the subject, taken in isolation. In this respect, one may take seeing-in as a genuine fusional state in which the folds are such that the RF not only depends on the CF – the former cannot exist unless the latter exists as well (Hopkins, 2008) – but is also compenetrated with it (Voltolini, 2020a).

Now, few people doubt that the seeing-in’s CF is genuinely perceptual, for the seeing-in’s bearer faces the CF’s object, the vehicle. Yet some people doubt (e.g., Dorsch, 2016; Walton, 1993) that the seeing-in’s RF is such. For, typically at least, the seeing-in’s bearer does not face the RF’s object, the subject. Obviously, while facing Leonardo’s La Gioconda at the Louvre, the enigmatic woman in front of a Mediterranean landscape one sees in it is not there, for she is nowhere. Yet even while standing at Windsor in front of a portrait of Charles III, the prestigious member of the Royal Family one sees in it is not there, where the portrait is hung. Granted, there are ways of justifying Wollheim’s claim. For example, one may say that the RF is a form of knowingly illusory perception, in which the seeing-in’s bearer knowingly illusorily perceives the vehicle as the subject (Voltolini, 2015). In such a case, one would have a sort of controlled hallucination of the subject, just as in the case of the Hermann grid, where at the crossings of the white lines embedded into black squares one seems to see gray spots that are not there (\(\ref{fig:figure2}\)). For clearly enough, the controlled hallucination that leads one to see such spots depends on the particular arrangement of the figure’s main constituents, i.e., its black squares. For if such squares differed in their arrangement, the spots would no longer be visible (\(\ref{fig:figure3}\)).5

Figure 2: The Hermann grid (personal picture)
Figure 3: A modified grid (personal picture)

Anyway, what counts for our present purposes is that such a claim can be supported by better understanding how in seeing-in the CF compenetrates the RF. For this will allow one to understand how depicted 3D shapes can be grasped supramodally in that form of perception. Let us see.

To begin with, the seeing-in’s CF differs from the perception of the picture’s vehicle in isolation, since it has an enriched content. Indeed, it grasps not only the vehicle’s low-level properties – primarily, its colors and its 2D shapes – but also certain higher-level properties that depend on such properties (the latter cannot be instantiated if the former are not instantiated as well); namely, the grouping properties according to which such properties are arranged. Grouping properties are indeed the properties for an item’s elements – the vehicle’s low-level properties, in this case – to be organized along a certain direction in a certain dimension. One can see such properties at work already in cases of simple ambiguous figures such as the Mach figure, which can be seen either as a diamond or as a tilted square, depending on which symmetry axes are chosen in order to organize the figure’s array. Such axes remain the same even if the figure is rotated. Clearly enough, grouping properties are higher-level properties. For, first, they depend on low-level properties – they would not be instantiated if some low-level properties (primarily, colors and shapes) were not instantiated as well – without supervening on them, as ambiguous figures clearly show. For example, in the aforementioned Mach figure, the grouping organization changes even if its low-level properties (primarily, its colors and shapes again) remain the same (Wittgenstein, 2009, II, xi, §247). Second and connectedly, grouping properties can be selectively lost in perception. As Wittgenstein again stressed (2009, II, xi, §257), one can be blind with respect to such properties (aspect-blindness) without being blind with respect to any low-level property (unlike, say, someone suffering from achromatopsia).

In this respect, grouping properties arranging elements in the third dimension are essential for seeing-in, for they make it the case that a basically flat object, the picture’s vehicle, is perceived as presenting a 3D scenario. They allow the elements of the picture’s vehicle to be arranged along a figure-ground 3D segmentation, in which the elements standing in front partially occlude the elements standing at the back, by virtue of their ascribed 3D shapes. This vividly happens in the case of ‘aspect dawning’ pictures, those in which the figurative character of a picture ‘lights up’ only after a while. For in the case of such a picture, one can clearly realize how a 3D-level figure/ground organization of its vehicle’s elements emerges out of the 2D level characterizing the fact that the vehicle is a basically flat object. For example, in the following picture (\(\ref{fig:figure4}\)), all of a sudden one arranges into a figure-ground segmentation the black-and-white 2D spots one has been grasping there for a long while. This grouping segmentation allows one to grasp certain 3D horseish silhouettes, some standing in front and others at the back, the latter partially occluded by the former in virtue of their ascribed 3D shapes.

Figure 4: (by courtesy of Paola Tosti)

Once one gets such an enriched perception of the picture’s vehicle in the seeing-in’s CF, one can also grasp the 3D picture’s subject in the seeing-in’s RF, which shows how the two folds are compenetrated. What one grasps in the RF is indeed a 3D scene (Nanay, 2022), in which certain objects are in the foreground and others in the background. Now, not only the spatial relationships between such objects, but also their 3D shapes, recapitulate both the spatial relationships and the 3D shapes ascribed to the silhouettes grasped in the CF. To come back to the previous example of the ‘aspect dawning’ picture of \(\ref{fig:figure4}\), the horses that one grasps in the seeing-in’s RF with that picture recapitulate in their structure both the spatial relationships and the 3D shapes ascribed to the horseish silhouettes that one grasps in the seeing-in’s CF with that picture.

Now, typically at least, the 3D shapes of the items that are captured in a seeing-in experience are grasped visually. Yet nothing prevents them from being captured in another sense modality, notably the tactile one. As Lopes (1996), Voltolini (2015) and Calzavarini and Voltolini (2023) have theorized and Kennedy (1993) has empirically confirmed, there are tactile pictures, in which from the perceptual point of view everything works just as in visual pictures.6 In addressing such pictures, on the one hand, among the various alignments that a tactile perceiver makes of the elements of those pictures’ vehicles, that perceiver tactilely grasps the figure-ground segmentations in such elements. Those segmentations are modulated by the 3D shapes that the perceiver ascribes to such elements. So, the relevant vehicle’s grouping properties that one can grasp visually can also be grasped tactilely, hence in a modality-independent way. On the other hand, by virtue of the above grouping operation, that perceiver also ends up tactilely grasping a 3D scene that is not there. Hence, the perceiver clearly entertains a twofold touching-in perception of the picture that structurally works just as an ordinary seeing-in perception of that picture: the ascribed 3D segmentations are tactilely grasped in the CF of that touching-in perception, and the 3D scenario is tactilely grasped in its RF.

Now, as has been empirically shown by various studies in cognitive psychology, tactile pictures are so grasped not only by sighted people, possibly blindfolded, but also by congenitally blind people. In addressing the vehicle of a picture raised in relief, the latter are able to tactilely grasp the alignments of its elements, hence to ascribe to such elements the 3D figure-ground segmentations that explain why occlusions affecting such elements are also grasped (Kennedy, 1997, 2000; Kennedy & Domander, 1984; Kennedy & Juricevic, 2006). Hence, those people ascribe to such elements certain 3D shapes. Such an ascription further explains why such people are also able to grasp in that picture’s subject the proper 3D shapes of the items constituting the scene that they perceive, even if they do not face it. This ability mirrors the ability they exhibit in ordinary tactile perception, in which they can grasp the proper 3D shapes of the items of the scenes that they face (Tinti et al., 2018). Thus, in the end, congenitally blind people are able to entertain a twofold touching-in pictorial perception of that picture that structurally works just as an ordinary seeing-in perception of it.

If the above is the case, the grasping of 3D depicted shapes is supramodal as well, since such shapes can be accessed either visually or tactilely. Hence, it is quite likely that there is a solution to the pictorial version of Molyneux’s puzzle that mirrors the one we have provided for its ordinary version. If she recovered sight, would a congenitally blind person visually recognize the very same depicted shapes in a certain bas-relief picture that she was able to detect when she addressed that picture tactilely? There are good chances of answering this question in the affirmative. Since touching-in structurally works just as seeing-in, a congenitally blind person tactilely perceives the depicted 3D shapes she would be able to visually perceive if she recovered sight. For example, as regards a bas-relief picture, a congenitally blind person may be able to touch in it the 3D shape of a depicted hand whose thumb partially occludes the other hand’s fingers by depictively standing in front of them. Now, if all of a sudden she indeed recovered her sight, she would be able to grasp exactly the same depicted 3D shape. This is also proven by the fact that, even in her present blindness, she may produce a similar picture (\(\ref{fig:figure5}\)) for sighted people to see things in (Kennedy, 1993; see also D’Angiulli et al., 1998; Kennedy & Bai, 2002; Pawluk et al., 2010).

Figure 5: (by courtesy of Lea Ferro)

Note that a positive answer to the Molyneux question in its pictorial form cannot be conclusively established based on extant neuroscientific research. To date, for example, there are no neuroimaging studies showing that the neural structures involved in the recognition of visual pictures at least partially overlap with those involved in the recognition of tactile pictures, and that these neural structures are the same in sighted and congenitally blind individuals (for discussion, see Calzavarini & Voltolini, 2023). The only neuroimaging (fMRI) study that has investigated neural activation during the recognition of seemingly tactile pictures (Stoesz et al., 2003) used stimuli with modest figurative value (tactile tables representing the letters “V” and “U”), which are unlikely to trigger robust front → behind grouping processes. So, granted, only potential fMRI studies involving tactile pictures with greater figurative complexity (e.g., raised-line drawings of objects and scenes seen in perspective; see Calzavarini & Voltolini, 2023) might be relevant to this issue. More critically, most of the neuroscientific investigations of shape processing in individuals who have recovered sight after blindness have used only 3D objects (e.g., Lego blocks) as stimuli (e.g., Chen et al., 2016; Held et al., 2011). Interestingly enough, a recent study by McKyton et al. (2015) indicates that, although newly sighted individuals can quickly recover the ability to recognize 2D pictorial stimuli, after one year they are still relatively unable to infer 3D shapes from pictorial cues such as occlusions and illusory contours. Yet other studies indicate susceptibility to pictorial illusions such as the Ponzo and Müller-Lyer immediately after sight onset (Gandhi et al., 2015), suggesting that the results by McKyton et al. (2015) might be due to the specific pictorial tests involved (for discussion, see Murray et al., 2015).7 Thus, more research is needed to provide conclusive positive proof for the supramodal pictorial perception hypothesis and, consequently, for a thoroughly supramodal account of the Molyneux question. Yet, on the basis of the present evidence, it is reasonable to assume that such a proof is forthcoming.8

4 Conclusions

Let us take stock. First of all, a huge amount of empirical evidence shows that ordinary perception of 3D shapes is supramodal. If this is the case, then the Molyneux question has a positive answer: the same 3D shapes that one can recognize tactilely can also be recognized visually. Moreover, similar empirical considerations show that congenitally blind people are able to perform by means of touch the same 3D grouping operations on basically 2D pictorial vehicles that sighted people perform by means of vision. So, the pictorial vehicle’s grouping properties can be grasped supramodally as well. Hence, such a performance allows them to have a touching-in experience quite similar to the ordinary seeing-in experience. Thus, there is a great chance that the pictorial version of the Molyneux question can also be answered positively: congenitally blind people are able to tactilely grasp the same depicted 3D shapes that they would grasp visually, if they recovered sight.

References

Amedi, A., Jacobson, G., Hendler, T., Malach, R., & Zohary, E. (2002). Convergence of visual and tactile shape processing in the human lateral occipital complex. Cerebral Cortex, 12(11), 1202–1212. https://doi.org/10.1093/cercor/12.11.1202
Amedi, A., Merabet, L. B., Camprodon, J., Bermpohl, F., Fox, S., Ronen, I., Kim, D.-S., & Pascual-Leone, A. (2008). Neural and behavioral correlates of drawing in an early blind painter: A case study. Brain Research, 1242, 252–262. https://doi.org/10.1016/j.brainres.2008.07.088
Amedi, A., Stern, W. M., Camprodon, J. A., Bermpohl, F., Merabet, L., Rotman, S., Hemond, C., Meijer, P., & Pascual-Leone, A. (2007). Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nature Neuroscience, 10(6), 687–689. https://doi.org/10.1038/nn1912
Baumgartner, G. (1960). Indirekte grössenbestimmung der rezeptiven felder der retina beim menschen mittels der Hermannschen gittertäuschung. Pflüger’s Archiv für Die Gesamte Physiologie Des Menschen Und Der Tiere, 272(1), 21–22. https://doi.org/10.1007/BF00680926
Bedny, M. (2017). Evidence from blindness for a cognitively pluripotent cortex. Trends in Cognitive Sciences, 21(9), 637–648. https://doi.org/10.1016/j.tics.2017.06.003
Calzavarini, F. (2021). The conceptual format debate and the challenge from (global) supramodality. The British Journal for the Philosophy of Science. https://doi.org/10.1086/717564
Calzavarini, F. (2024). Rethinking modality-specificity in the cognitive neuroscience of concrete word meaning: A position paper. Language, Cognition and Neuroscience, 39(7), 815–837. https://doi.org/10.1080/23273798.2023.2173789
Calzavarini, F., & Voltolini, A. (2023). Pictures as supramodal sensory individuals. In A. Mroczko-Wąsowicz & R. Grush (Eds.), Sensory individuals: Unimodal and multimodal perspectives (pp. 404–418). Oxford University Press. https://doi.org/10.1093/oso/9780198866305.003.0024
Chen, J., Wu, E.-D., Chen, X., Zhu, L.-H., Li, X., Thorn, F., Ostrovsky, Y., & Qu, J. (2016). Rapid integration of tactile and visual information by a newly sighted child. Current Biology, 26(8), 1069–1074. https://doi.org/10.1016/j.cub.2016.02.065
Cooke, T., Jäkel, F., Wallraven, C., & Bülthoff, H. H. (2007). Multimodal similarity and categorization of novel, three-dimensional objects. Neuropsychologia, 45(3), 484–495. https://doi.org/10.1016/j.neuropsychologia.2006.02.009
D’Angiulli, A., Kennedy, J. M., & Helle, M. A. (1998). Blind children recognizing tactile pictures respond like sighted children given guidance in exploration. Scandinavian Journal of Psychology, 39(3), 187–190. https://doi.org/10.1111/1467-9450.393077
Degenaar, M., Lokhorst, G.-J., Glenney, B., & Ferretti, G. (2024). Molyneux’s problem. In E. N. Zalta & U. Nodelman (Eds.), The Stanford encyclopedia of philosophy (Summer 2024). https://plato.stanford.edu/archives/sum2024/entries/molyneux-problem/.
Dorsch, F. (2016). Seeing-in as aspect perception. In G. Kemp & G. Mras (Eds.), Wollheim, Wittgenstein, and pictorial representation (pp. 205–238). Routledge.
Erdogan, G., Chen, Q., Garcea, F. E., Mahon, B. Z., & Jacobs, R. A. (2016). Multisensory part-based representations of objects in human lateral occipital cortex. Journal of Cognitive Neuroscience, 28(6), 869–881. https://doi.org/10.1162/jocn_a_00937
Ferretti, G., & Glenney, B. (2020). Molyneux’s question and the history of philosophy. Routledge.
Gaissert, N., Wallraven, C., & Bülthoff, H. H. (2010). Visual and haptic perceptual spaces show high similarity in humans. Journal of Vision, 10(11), 2–20. https://doi.org/10.1167/10.11.2
Gallace, A., & Spence, C. (2014). The neglected power of touch: What the cognitive neurosciences can tell us about the importance of touch in artistic communication. In P. Dent (Ed.), Sculpture and touch (pp. 107–124). Routledge. https://doi.org/10.4324/9781315088228-8
Gandhi, T., Kalia, A., Ganesh, S., & Sinha, P. (2015). Immediate susceptibility to visual illusions after sight onset. Current Biology, 25(9), R358–R359. https://doi.org/10.1016/j.cub.2015.03.005
Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Research, 41(10), 1409–1422. https://doi.org/10.1016/S0042-6989(01)00073-6
Heimler, B., & Amedi, A. (2020). Are critical periods reversible in the adult brain? Insights on cortical specializations based on sensory deprivation studies. Neuroscience & Biobehavioral Reviews, 116, 494–507. https://doi.org/10.1016/j.neubiorev.2020.06.034
Heinen, R., Bierbrauer, A., Wolf, O. T., & Axmacher, N. (2024). Representational formats of human memory traces. Brain Structure and Function, 229(3), 513–529. https://doi.org/10.1007/s00429-023-02636-9
Held, R. (2009). Visual-haptic mapping and the origin of cross-modal identity. Optometry and Vision Science, 86(6), 595–598. https://doi.org/10.1097/OPX.0b013e3181a72999
Held, R., Ostrovsky, Y., de Gelder, B., Gandhi, T., Ganesh, S., Mathur, U., & Sinha, P. (2011). The newly sighted fail to match seen with felt. Nature Neuroscience, 14(5), 551–553. https://doi.org/10.1038/nn.2795
Hopkins, R. (2000). Touching pictures. British Journal of Aesthetics, 40(1), 149–167. https://doi.org/10.1093/bjaesthetics/40.1.149
Hopkins, R. (2004). Painting, sculpture, sight, and touch. The British Journal of Aesthetics, 44(2), 149–166. https://doi.org/10.1093/bjaesthetics/44.2.149
Hopkins, R. (2008). What do we see in film? The Journal of Aesthetics and Art Criticism, 66(2), 149–159. https://doi.org/10.1111/j.1540-6245.2008.00295.x
Hurk, J. van den, Van Baelen, M., & Op de Beeck, H. P. (2017). Development of visual category selectivity in ventral visual cortex does not require visual experience. Proceedings of the National Academy of Sciences, 114(22), E4501–E4510. https://doi.org/10.1073/pnas.1612862114
James, T. W., Humphrey, G. K., Gati, J. S., Servos, P., Menon, R. S., & Goodale, M. A. (2002). Haptic study of three-dimensional objects activates extrastriate visual areas. Neuropsychologia, 40(10), 1706–1714. https://doi.org/10.1016/s0028-3932(02)00017-9
Kennedy, J. M. (1993). Drawing and the blind: Pictures to touch. Yale University Press.
Kennedy, J. M. (1997). How the blind draw. Scientific American, 276(1), 76–81. https://doi.org/10.1038/scientificamerican0197-76
Kennedy, J. M. (2000). Recognizing outline pictures via touch: Alignment theory. In M. A. Heller (Ed.), Touch, representation and blindness (pp. 67–99). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198503873.003.0003
Kennedy, J. M., & Bai, J. (2002). Haptic pictures: Fit judgments predict identification, recognition memory, and confidence. Perception, 31(8), 1013–1026. https://doi.org/10.1068/p3259
Kennedy, J. M., & Domander, R. (1984). Pictorial foreground/background reversal reduces tactual recognition by blind subjects. Journal of Visual Impairment & Blindness, 78(5), 215–216. https://doi.org/10.1177/0145482X8407800507
Kennedy, J. M., & Juricevic, I. (2006). Form, projection and pictures for the blind. In M. A. Heller & S. Ballesteros (Eds.), Touch and blindness (pp. 73–93). Psychology Press. https://psycnet.apa.org/record/2005-12992-004
Kiefer, M., Kuhnke, P., & Hartwigsen, G. (2023). Distinguishing modality-specificity at the representational and input level: A commentary on Calzavarini (2023). Language, Cognition and Neuroscience, 1–5. https://doi.org/10.1080/23273798.2023.2209928
Kim, S., & James, T. W. (2010). Enhanced effectiveness in visuo-haptic object-selective brain regions with increasing stimulus salience. Human Brain Mapping, 31(5), 678–693. https://doi.org/10.1002/hbm.20897
Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26–49. https://doi.org/10.1016/j.tics.2012.10.011
Kriegeskorte, N., & Douglas, P. K. (2018). Cognitive computational neuroscience. Nature Neuroscience, 21(9), 1148–1160. https://doi.org/10.1038/s41593-018-0210-5
Lacey, S., Peters, A., & Sathian, K. (2007). Cross-modal object recognition is viewpoint-independent. PloS One, 2(9), e890. https://doi.org/10.1371/journal.pone.0000890
Lacey, S., & Sathian, K. (2014). Visuo-haptic multisensory object recognition, categorization, and representation. Frontiers in Psychology, 5, 730. https://doi.org/10.3389/fpsyg.2014.00730
Lawson, R. (2009). A comparison of the effects of depth rotation on visual and haptic three-dimensional object recognition. Journal of Experimental Psychology: Human Perception and Performance, 35(4), 911. https://doi.org/10.1037/a0015025
Lopes, D. (1996). Understanding pictures. Clarendon Press.
Makin, T. R., & Krakauer, J. W. (2023). Against cortical reorganisation. Elife, 12, e84716. https://doi.org/10.7554/eLife.84716
Matthen, M., & Cohen, J. (2020). Many Molyneux questions. Australasian Journal of Philosophy, 98(1), 47–63. https://doi.org/10.1080/00048402.2019.1603246
Mattioni, S., Rezk, M., Battal, C., Bottini, R., Cuculiza Mendoza, K. E., Oosterhof, N. N., & Collignon, O. (2020). Categorical representation from sound and sight in the ventral occipito-temporal cortex of sighted and blind. Elife, 9, e50732. https://doi.org/10.7554/eLife.50732
McKyton, A., Ben-Zion, I., Doron, R., & Zohary, E. (2015). The limits of shape recognition following late emergence from blindness. Current Biology, 25(18), 2373–2378. https://doi.org/10.1016/j.cub.2015.06.040
Murray, M. M., Matusz, P. J., & Amedi, A. (2015). Neuroplasticity: Unexpected consequences of early blindness. Current Biology, 25(20), R998–R1001. https://doi.org/10.1016/j.cub.2015.08.054
Nanay, B. (2020). Molyneux’s question and interpersonal variations in multimodal mental imagery among blind subjects. In G. Ferretti & B. Glenney (Eds.), Molyneux’s question and the history of philosophy (pp. 259–265). Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9780429020377-24/molyneux-question-interpersonal-variations-multimodal-mental-imagery-among-blind-subjects-bence-nanay
Nanay, B. (2022). What do we see in pictures? The sensory individuals of picture perception. Philosophical Studies, 179(12), 3729–3746. https://doi.org/10.1007/s11098-022-01864-9
Nanay, B. (2023). Mental imagery: Philosophy, psychology, neuroscience. Oxford University Press.
Newell, F. N., Ernst, M. O., Tjan, B. S., & Bülthoff, H. H. (2001). Viewpoint dependence in visual and haptic object recognition. Psychological Science, 12(1), 37–42. https://doi.org/10.1111/1467-9280.00307
Norman, J. F., Norman, H. F., Clayton, A. M., Lianekhammy, J., & Zielke, G. (2004). The visual and haptic perception of natural object shape. Perception & Psychophysics, 66, 342–351. https://doi.org/10.3758/BF03194883
Occelli, V. (2020). Molyneux’s question and neuroscience of vision. In G. Ferretti & B. Glenney (Eds.), Molyneux’s question and the history of philosophy (pp. 216–234). Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9780429020377-20/molyneux-question-neuroscience-vision-valeria-occelli
Orlov, T., Raveh, M., McKyton, A., Ben-Zion, I., & Zohary, E. (2021). Learning to perceive shape from temporal integration following late emergence from blindness. Current Biology, 31(14), 3162–3167. https://doi.org/10.1016/j.cub.2021.04.059
Pascual-Leone, A., & Hamilton, R. (2001). The metamodal organization of the brain. Progress in Brain Research, 134, 427–445. https://doi.org/10.1016/s0079-6123(01)34028-1
Pawluk, D., Kitada, R., Abramowicz, A., Hamilton, C., & Lederman, S. J. (2010). Figure/ground segmentation via a haptic glance: Attributing initial finger contacts to objects or their supporting surfaces. IEEE Transactions on Haptics, 4(1), 2–13. https://doi.org/10.1109/TOH.2010.25
Peelen, M. V., & Downing, P. E. (2017). Category selectivity in human visual cortex: Beyond visual object recognition. Neuropsychologia, 105, 177–183. https://doi.org/10.1016/j.neuropsychologia.2017.03.033
Pietrini, P., Furey, M. L., Ricciardi, E., Gobbini, M. I., Wu, W.-H. C., Cohen, L., Guazzelli, M., & Haxby, J. V. (2004). Beyond sensory images: Object-based representation in the human ventral pathway. Proceedings of the National Academy of Sciences, 101(15), 5658–5663. https://doi.org/10.1073/pnas.0400707101
Plaisier, M. A., Tiest, W. M. B., & Kappers, A. M. (2008). Haptic pop-out in a hand sweep. Acta Psychologica, 128(2), 368–377. https://doi.org/10.1016/j.actpsy.2008.03.011
Plaisier, M. A., Tiest, W. M. B., & Kappers, A. M. L. (2009). Salient features in 3-d haptic shape perception. Attention, Perception & Psychophysics, 71(2), 421–430. https://doi.org/10.3758/APP.71.2.421
Plaisier, M. A., Van Polanen, V., & Kappers, A. M. (2017). The role of connectedness in haptic object perception. Scientific Reports, 7(1), 43868. https://doi.org/10.1038/srep43868
Ricciardi, E., Bonino, D., Pellegrini, S., & Pietrini, P. (2014). Mind the blind brain to understand the sighted one! Is there a supramodal cortical functional architecture? Neuroscience & Biobehavioral Reviews, 41, 64–77. https://doi.org/10.1016/j.neubiorev.2013.10.006
Ricciardi, E., & Pietrini, P. (2024). The supramodality “spillover” from neuroscience to cognitive sciences: A commentary on Calzavarini (2024). Language, Cognition and Neuroscience, 39(7), 867–871. https://doi.org/10.1080/23273798.2023.2218502
Schiller, P. H., & Carvey, C. E. (2005). The Hermann grid illusion revisited. Perception, 34(11), 1375–1397. https://doi.org/10.1068/p5447
Stoesz, M. R., Zhang, M., Weisser, V. D., Prather, S., Mao, H., & Sathian, K. (2003). Neural networks active during tactile form perception: Common and differential activity during macrospatial and microspatial tasks. International Journal of Psychophysiology, 50(1-2), 41–49. https://doi.org/10.1016/s0167-8760(03)00123-5
Tian, S., Chen, L., Wang, X., Li, G., Fu, Z., Ji, Y., Lu, J., Wang, X., Shan, S., & Bi, Y. (2024). Vision matters for shape representation: Evidence from sculpturing and drawing in the blind. Cortex, 174, 241–255. https://doi.org/10.1016/j.cortex.2024.02.016
Tinti, C., Chiesa, S., Cavaglià, R., Dalmasso, S., Pia, L., & Schmidt, S. (2018). On my right or on your left? Spontaneous spatial perspective taking in blind people. Consciousness and Cognition, 62, 1–8. https://doi.org/10.1016/j.concog.2018.03.016
Voltolini, A. (2015). A syncretistic theory of depiction. Palgrave. https://doi.org/10.1057/9781137263292
Voltolini, A. (2020a). Different kinds of fusion experiences. Review of Philosophy and Psychology, 11(1), 203–222. https://doi.org/10.1007/s13164-019-00456-7
Voltolini, A. (2020b). Qua seeing-in, pictorial experience is a superstrongly cognitively penetrated perception. Kunstiteaduslikke Uurimusi, 29(3–4), 13–30. https://www.ceeol.com/search/article-detail?id=925675
Walton, K. L. (1993). Mimesis as make-believe: On the foundations of the representational arts. Harvard University Press.
Wittgenstein, L. (2009). Philosophical investigations (4th ed.). Wiley-Blackwell.
Wollheim, R. (1980). Seeing-as, seeing-in, and pictorial representation. In Art and its objects (2nd ed., pp. 205–226). Cambridge University Press. https://www.cambridge.org/core/books/abs/art-and-its-objects/seeingas-seeingin-and-pictorial-representation/A00A989B987FE3EA96A6F941B141616D
Wollheim, R. (1987). Painting as an art. Princeton University Press. https://doi.org/10.2307/jj.5425922
Wollheim, R. (1998). On pictorial representation. The Journal of Aesthetics and Art Criticism, 56(3), 217–226. https://doi.org/10.2307/432361
Wollheim, R. (2003a). What makes representational painting truly visual? Aristotelian Society Supplementary Volume, 77(1), 131–147. https://doi.org/10.1111/1467-8349.00106
Wollheim, R. (2003b). In defense of seeing-in. In H. Hecht, R. Schwartz, & M. Atherton (Eds.), Looking into pictures: An interdisciplinary approach to pictorial space. The MIT Press. https://doi.org/10.7551/mitpress/4337.003.0004
Woods, A. T., Moore, A., & Newell, F. N. (2008). Canonical views in haptic object perception. Perception, 37(12), 1867–1878. https://doi.org/10.1068/p6038

  1. Note that functional equivalence between vision and touch can also be found when lower-level perceptual features, such as edges, connectedness, and texture, are involved (see, e.g., Plaisier et al., 2008, 2009, 2017).↩︎

  2. Granted, LOC is not the only brain region supposed to be involved in visual shape recognition. In classical models of visual recognition (see, e.g., Kravitz et al., 2013), the transformation of 2D retinal input into a 3D representation along the visual pathways is a complex process that begins in the primary visual cortex (V1), which extracts elementary features such as edges and orientation. From V1, visual information proceeds along two functionally distinct streams: the dorsal stream, which projects towards the parietal lobes and is critical for motion and spatial processing, and the ventral stream, which extends into the temporal lobe and is involved in object recognition and shape perception. Within the ventral stream, visual information is progressively integrated and transformed in higher visual areas, such as V2 and V4, which process more complex features like contours, shapes, and colors. This hierarchical processing culminates in the lateral occipital complex (LOC), where the visual system assembles these lower-level features into coherent 3D object representations, enabling the perception of depth, form, and object identity.↩︎

  3. Indeed, the recognition of static 3D forms after sight restoration may require more extensive experience and cortical reorganization than other visual processes. For example, a recent study by Orlov et al. (2021) reported on 23 newly sighted children who quickly recovered the ability to infer the direction of global motion but retained significant difficulties in recognizing shapes under slit-viewing conditions, which required them to integrate fragmented visual information over time in order to recover the global shape. In addition, the study indicates that “shape recovery could only be carried out after the global-motion vector was extracted” (Orlov et al., 2021, p. 3165). These results suggest not only that “the motion-processing pathways are likely to be more resilient to long-term visual deprivation than the form-processing pathways” (ibid.), but also that reliance on motion cues might be an essential component of 3D shape recognition. Note, however, that the study by Orlov et al. (2021) focuses on anorthoscopic vision, which requires participants to recognize shapes that are only partially visible at any given moment as they move behind a narrow slit. This setup is intrinsically more closely aligned with the functioning of the dorsal pathway, which processes motion and spatial relationships over time, and might not be ideal for testing 3D shape recognition. Thus, more research is needed on this issue.↩︎

  4. If perception of 3D shapes is supramodal, there is no need to appeal to mental imagery in order to ascribe to a congenitally blind person the capacity to visually grasp the same 3D shapes she can already grasp tactilely, as Nanay (2023, pp. 104–106; see also 2020) instead claims.↩︎

  5. Granted, there are reasons to say that the perception of the Hermann grid is not yet a pictorial perception, since no proper 3D scene seems ultimately to be grasped in it, as if no RF occurred in that perception. For what enables that scene to be grasped, and hence a genuine RF to emerge in a proper seeing-in perception, is its subject’s conceptualization: cf. Voltolini (2015, 2020b). This is apparently consistent with the observation that the Hermann grid is traditionally considered a very low-level psychological phenomenon, arising at the retinal rather than at a cortical level of stimulus processing (Baumgartner, 1960). Note, however, that this retinal account has more recently been revisited by Schiller and Carvey (2005), who suggest that the Hermann grid illusion is primarily a cortical phenomenon, due to “the manner in which S1 type simple cells […] in primary visual cortex respond to the grid” (2005, p. 1375).↩︎

  6. Hopkins (2000, 2004) has claimed that there cannot be tactile pictures, since touch does not seem to grasp the picture’s subject from a vantage point, which even for Wollheim (1980) seems to be fundamental for seeing-in. Yet this claim has been criticized on both theoretical (Calzavarini & Voltolini, 2023) and empirical (Gallace & Spence, 2014) grounds, also because, as we saw in the previous Section, touch seems to be sensitive to perspectivality already in ordinary perception.↩︎

  7. McKyton et al. (2015) used an “oddball” task in which participants were asked to find an odd target among an array of elements. Conditions in this study included low-level tasks (shape recognition based on color, size, or contour) and mid-level tasks (shape recognition based on 3D pictorial cues: occlusion, shading, box, and illusory contours). In their commentary on this article, Murray et al. (2015, p. R999) argued that “the oddball discrimination tasks used by McKyton et al. […] may have tapped into experience-dependent processes […] in contrast to the illusions reported in Gandhi et al. […]”, which might explain the apparently conflicting results across the two studies.↩︎

  8. Note that a similar conclusion can be drawn for pictorial production, that is, the ability to produce drawings. A fascinating study by Amedi et al. (2008), for example, reported the case of E.A., an early blind artist capable of creating highly detailed drawings of objects or scenes that are easily recognizable by sighted individuals. The study employed fMRI to explore the neural mechanisms underlying E.A.’s ability to convert 3D objects explored through touch into 2D drawings. The neural activity observed during the drawing process involved several ‘visual’ areas, including the LOC, supporting a supramodal profile of pictorial production. A recent study by Tian et al. (2024) tested the drawing ability (among other things) of a group of early blind individuals, showing that the blind can produce recognizable drawings, especially of tools, with which they had rich tactile experience. The absence of visual experience, however, affected the ability to draw animals, with blind participants’ drawings being less recognizable than those of sighted participants. These results suggest that, while tactile experience can compensate for the lack of vision in producing pictorial stimuli, vision might still be crucial in cases where tactile information is less available, as in the case of animals.↩︎