3 Experiment 1 - Simulating Space when Remembering Words: Role of Visuospatial Memory

3.1 Motivation and Aims

Spatial simulation within grounded-embodied cognition and cognitive offloading within extended cognition were outlined and discussed in Chapters 1 and 2. Based on this theoretical background, this chapter describes an experimental study investigating a memory-based looking behaviour (i.e., looking at nothing) that is representative of both spatial simulation and cognitive offloading. In general terms, this study aims to investigate how spatial location is simulated following the visual perception of words to support the retrieval of those words.

3.2 Abstract

People tend to look at uninformative, blank locations in space when retrieving information. This gaze behaviour, known as looking at nothing, is assumed to be driven by the use of spatial indices associated with external information. In the present study, we investigated whether people form spatial indices and look at nothing when retrieving words from memory. Participants were simultaneously presented with four nouns. Additionally, word presentation was sometimes followed by a visual cue that was either co-located (congruent) or not co-located (incongruent) with the probe word. During retrieval, participants looked at the relevant, blank location, where the probe word had appeared previously, more than at the other, irrelevant blank locations, both following a congruent visual cue and when there was no cue between encoding and retrieval (pure looking at nothing condition). Critically, participants with better visuospatial memory looked less at “nothing”, suggesting a dynamic relationship between so-called “external” and “internal” memory. Overall, the findings suggest an automatic spatial indexing mechanism and a dynamic looking at nothing behaviour for words.

Highlights

  • Participants offloaded memory work onto the environment with eye movements when remembering visually and simultaneously presented single words.
  • Worse visuospatial memory led to more reliance on the environment during retrieval.

3.3 Introduction

The human mind can anchor perceived information to external spatial locations. This mechanism was first expressed within a model of visual processing in which the location of an object is processed separately from its visual features (Marr, 1982). This view, expanded into an exhaustive spatial indexing model (Pylyshyn, 1989), assumes that the visual system can individuate spatial relations before discerning a visual pattern and immediately index the locations of such patterns. In a similar fashion, the spatial registration hypothesis (Coslett, 1999) holds that perceived stimuli are coded with respect to their location in space. Location, therefore, is a critical constituent of our interactions with the world (van der Heijden, 1993). Within the spatial indexing (or spatial registering/encoding) model, spatial indices remain attached to a particular object independent of its movements and visual properties. Critically, spatiotemporal continuity (i.e., persistence of spatial “tags” over time) occurs even when the visual information disappears, as often manifested in mental imagery (e.g., Brandt & Stark, 1997).

Spatial indices tied to external visual and verbal information trigger eye movements when a mental representation is reactivated. Thus, when retrieving information from memory, people tend to exploit location-based indices and look at the seemingly uninformative, empty locations where the information originally occurred, even when location is irrelevant to the task. This behaviour is known as looking at nothing (Spivey & Geng, 2000). In their pioneering study, Richardson and Spivey (2000) documented the use of spatial information and looking at nothing in verbal memory. Four faces randomly appeared in different quadrants of a two-by-two grid along with four corresponding spoken facts (e.g., “Shakespeare’s first plays were historical dramas; his last was the Tempest”). On the next screen, a statement (e.g., “Shakespeare’s first play was the Tempest”) probed participants’ memory for the verbal information. During retrieval, there were significantly more looks to the blank quadrant where the face associated with the probed semantic information had been than to the other quadrants. Thus, people did not look at just any “nothing” when answering the questions. Rather, they looked at an invisible spatial index which had previously been allocated to the information (Spivey & Geng, 2000).

Looking at nothing may be best thought of as an interface between the internal and external worlds. Ferreira, Apel and Henderson (2008) proposed an integrated memory architecture in which external cues and internal representations work hand in hand to retrieve information as efficiently as possible (see also Richardson, Altmann, Spivey, & Hoover, 2009). More precisely, the integrated memory account combines visual/auditory and spatial information in the external world with visual, linguistic, spatial and conceptual counterparts in the mental world. When part of an integrated representation (e.g., linguistic information) is reactivated, the other parts (e.g., spatial information) are retrieved as well. In this regard, looking at nothing is also an example of spatial simulation (Barsalou, 1999), in that the spatial position in which information was presented is recreated when the information is needed again. Looking at nothing can also be thought of as an example of efficient cognitive offloading (Risko & Gilbert, 2016), in which memory work is offloaded onto the world to minimise internal demands.
In the current study, we addressed the looking at nothing triangle, composed of actual looking behaviour, spatial indices and mental representations, to answer three questions: (1) How automatic is spatial indexing? Do individuals automatically index the locations of short and briefly presented linguistic information (e.g., visually and simultaneously presented single words)? (2) How dynamic are spatial indexing and looking at nothing? Can spatial indices be updated with subsequent visual information, and how does such updating affect looking behaviour? (3) Does everybody look at blank locations, or is looking at nothing modulated by certain cognitive capacities such as visuospatial memory span?

3.3.1 Spatial indexing and looking at nothing: automaticity

Looking at nothing typically occurs under two retrieval conditions, as shown in previous studies. (1) People look at blank locations when remembering spoken linguistic information such as factual sentences (Hoover & Richardson, 2008; D. C. Richardson & Kirkham, 2004; D. C. Richardson & Spivey, 2000; Scholz et al., 2011, 2016, 2018). As illustrated above, spoken linguistic information is explicitly associated with a visual object in this paradigm, which we term explicit indexing. In turn, the eyes revisit the previous locations of the object associated with the information when the spoken factual information is retrieved. (2) Looking at nothing also occurs during retrieval of visually presented non-linguistic information such as single objects (Martarelli & Mast, 2013; Spivey & Geng, 2000), arrangements of multiple objects (Altmann, 2004; Johansson & Johansson, 2014) or visual patterns (Bochynska & Laeng, 2015; Laeng et al., 2014). In this case, locations are encoded along with the visual object(s) or patterns.

In the current study, we adopted a different approach to examine the automaticity of spatial encoding of linguistic information. We showed participants four nouns simultaneously on a grid to study for a brief period of time. Then, an auditorily presented word (which was either among the studied words or not) probed participants’ verbal recognition memory while they were looking at a blank screen. If participants automatically encode the locations of the words, as assumed in the spatial indexing hypothesis (implicit indexing), they should display looking at nothing behaviour. In other words, we predict more fixations in the now-blank location of the probe word during retrieval than in the other, irrelevant blank locations. However, if explicit indexing is required for looking at nothing, as shown in previous studies, there should be the same amount of spontaneous looks in the relevant and irrelevant blank locations.

Word locations are encoded in reading (Fischer, 1999; but see Inhoff & Weger, 2005), writing (Le Bigot, Passerault, & Olive, 2009) and complex cognitive tasks such as memory-based decision-making (Jahn & Braatz, 2014; Renkewitz & Jahn, 2012; Scholz, von Helversen, & Rieskamp, 2015). However, to what extent spatial encoding is automatic is not clear. An automatic process is fast, efficient, unconscious, unintentional, uncontrolled (i.e., cannot be wilfully inhibited), goal-independent and purely stimulus-driven (i.e., cannot be avoided) (Hasher & Zacks, 1979; Moors & De Houwer, 2006). Against these criteria of automaticity, there is evidence that spatial encoding is an automatic process (Andrade & Meudell, 1993), an effortful process (Naveh-Benjamin, 1987, 1988) and a combination of both (Ellis, 1991). For instance, Pezdek, Roman and Sobolik (1986) reported that spatial information for objects is more likely to be encoded automatically than spatial information for words. If participants look at the previous locations of the words they are asked to remember, this would provide evidence for the automaticity of spatial encoding in looking at nothing, given the specifics of the present experimental paradigm (i.e., implicit encoding and a brief encoding time) and the nature of looking at nothing (i.e., an unintentional and efficient behaviour).

3.3.2 Spatial indexing and looking at nothing: dynamicity

How stable are the spatial indices in looking at nothing? Can they be updated with subsequent visuospatial information? How do updated spatial indices guide eye movements during memory retrieval? Answers to these questions are critical to understanding the mechanics of spatial indexing and looking at nothing. Thus, we tested whether congruency or incongruency of visuospatial cues between the encoding and retrieval stages affects spatial indexing and looking at nothing.

There are studies examining the temporal stability of spatial indices. For example, in Wantz, Martarelli and Mast (2015), location memory for visual objects faded 24 hours after the initial encoding. In consequence, participants looked at relevant, blank locations immediately after encoding, 5 minutes and 1 hour after encoding, but not after 24 hours. That said, less is known about the spatial stability of indices. In one study (D. C. Richardson & Kirkham, 2004), looking at nothing was reported when the visual information associated with the to-be-retrieved information moved and thus updated the spatial indices. Participants looked at the updated, now-blank locations rather than the original locations of the previously encoded information, suggesting a flexible and dynamic spatial indexing mechanism.

In the current study, a visual cue (i.e., a black dot) that was irrelevant to the words and to the task itself was shown between the encoding and retrieval stages. The cue was presented either in the same quadrant of the grid as the probe word at the encoding stage or in the diagonally opposite quadrant. There was also a third condition in which participants did not see a cue at all. A plethora of studies on the Simon effect (Simon & Rudell, 1967) indicates that spatial congruency between stimulus and response results in faster and more accurate responses even when location is irrelevant to successful performance (see Hommel, 2011 for a review). In line with this, Vankov (2011) presented evidence for a Simon-like effect in spatial indexing and showed that compatibility of irrelevant spatial information benefits memory retrieval (see also Hommel, 2002; Wühr & Ansorge, 2007; Zhang & Johnson, 2004). Participants saw four objects on a 2 x 2 grid (e.g., a line drawing of a guitar, cat, camel and plane) at the encoding phase. Then, they were presented with a word denoting either a new object or one of the studied objects (e.g., guitar) in one of four locations relative to the target object: the same location, a vertical location (above or below the target object), a horizontal location (left or right of the target object) or a diagonal location. Participants were asked to report whether the object denoted by the word had appeared before. The fastest responses were found when the word cue appeared in the same location as the target object; participants were slowest to respond when the word cue was in the diagonal location.

In the light of the abovementioned evidence, we predict that (in)congruency between the spatial code attached to the word and the spatial code attached to the visuospatial cue could modulate looking at nothing behaviour. A congruent cue is predicted to emphasise the original location of the probe word and thus the spatial index tagging it. In turn, fixations to the relevant, blank locations should be more frequent in the congruent cue condition than in the no cue condition.
On the other hand, an incongruent cue could update the spatial code attached to the word and disrupt looks at blank locations by shifting participants’ attention to a diagonal location. Such a pattern would suggest that spatial indexing and looking at nothing for words are dynamic processes that are sensitive to the systematic manipulation of irrelevant visuospatial information.

3.3.3 Looking at nothing and visuospatial memory

The link between mental representations and looking at nothing is critical. One position within radical grounded-embodied cognition (Chemero, 2011) is that the world functions as an outside memory without any need for mental representations (O’Regan, 1992). According to this view, the external memory store can be accessed at will through visual perception. As discussed above, the integrated memory account (Ferreira et al., 2008) represents an opposing position within a relatively “traditional” grounded-embodied approach. Accordingly, “internal memory” (mental representations) and so-called “external memory” (i.e., the external world internalised via spatial indices and eye movements) work cooperatively in an efficient and goal-directed manner in looking at nothing. To be more precise, “the opportunistic and efficient mind” (D. C. Richardson et al., 2009) exploits external support whenever it needs to minimise internal memory load.

In support of this assumption, there is evidence that short-term memory capacity is a reliable predictor of the conscious and intentional use of the environment in memory tasks (see Risko & Gilbert, 2016 for a review). In one memory study (Risko & Dunn, 2015), offloading (i.e., writing down to-be-retrieved information) was offered as an option to participants. Results revealed that participants with worse short-term memory chose to write down the information, rather than rely on internal memory, more frequently than participants with better short-term memory. In looking at nothing, there is evidence that reliance on the environment increases or decreases in proportion to internal demands. For example, people tend to exhibit less looking at nothing as they are asked to study and recall the same sentences over and over again, suggesting less reliance on external cues as the task becomes easier through repetition (Scholz et al., 2011). Similarly, Wantz, Martarelli and Mast (2015) showed fewer looks to blank locations with repeated recall without rehearsal, as mental representations stabilise over time.

However, not much is known about how individual differences in internal memory map onto differences in looking at nothing within the scope of the integrated memory account. If the opportunistic and efficient cognitive system uses both internal and external cues to access memory traces (D. C. Richardson et al., 2009), and if external cues are used to relieve internal operations (Risko & Dunn, 2015), people with relatively worse visuospatial memory should rely more on the environment during memory retrieval (and vice versa). A correlation between visuospatial memory capacity and looking at nothing would provide further evidence for the integrated memory system by speaking against the world-as-an-outside-memory argument (O’Regan, 1992) and, consequently, radical grounded-embodied cognition.

3.3.4 Role of eye movements in memory retrieval

Another fundamental issue is whether looks to blank regions associated with information facilitate the retrieval of that information. This issue taps into a seemingly simple question about the very nature of memory-guided eye movements: Why do people look at nothing? The role and functionality of eye movements in memory retrieval have been highly controversial (see Ferreira et al., 2008; Mast & Kosslyn, 2002; Richardson et al., 2009 for discussions). Early studies did not present any evidence for improvement in memory with looks to blank spaces (Hoover & Richardson, 2008; D. C. Richardson & Kirkham, 2004; D. C. Richardson & Spivey, 2000; Spivey & Geng, 2000; Vankov, 2011). This initial failure to demonstrate memory enhancement led to the preliminary conclusion that eye movements only co-assist the retrieval process as a by-product (Spivey, Richardson, & Fitneva, 2004). There is now growing evidence that gaze position can play a functional role in memory retrieval. For example, Laeng and Teodorescu (2002) reported that participants who viewed an image and then looked at the blank screen freely (free perception & free retrieval) were more accurate in answering the retrieval questions than those whose gaze was restricted to a central fixation point (free perception & fixed retrieval) (see also Johansson, Holsanova, Dewhurst, & Holmqvist, 2012; Laeng et al., 2014 for a memory advantage of free gaze over fixed gaze). In a similar gaze manipulation paradigm, participants who were instructed to look at relevant, blank regions were more accurate in judging statements about visual objects (Johansson & Johansson, 2014) and verbal information (Scholz et al., 2016, 2018) than participants who were instructed to look at a location diagonal to the original location of the object, or of the object associated with the verbal information.

The current study was not designed to test the role of looking behaviour in memory. That is, eye gaze at retrieval was not manipulated as in the studies reviewed above. Rather, we analysed the functionality of looking at nothing by using the fixation percentage in the relevant quadrant (i.e., looking at nothing) as a predictor of hit rate and hit latency within mixed-effects models. If looks to the relevant, blank locations predict recognition memory for visually presented single words, this might provide tentative evidence for a facilitatory role of gaze position in memory.

3.4 Method

3.4.1 Participants

The experiment was carried out with forty-eight students at the University of Birmingham (six males; Mage = 19.92, SD = 1.96, range: 18 - 27); 96% of them were psychology students. All participants were monolingual native speakers of British English, as determined with the Language History Questionnaire (version 2.0; Li, Zhang, Tsai, & Puls, 2013). Participants reported normal or corrected-to-normal vision, no speech or hearing difficulties and no history of any neurological disorder. They received either £6 (n = 12) or course credit (n = 36) for participation. All participants were fully informed about the details of the experimental procedure and gave written consent. Post-experiment debriefing revealed that all participants were naïve to the purpose of the experiment. No participant was replaced.

3.4.2 Materials

There were 192 trials involving 864 unique nouns in total. Trials were evenly divided (n = 96 each) into experimental (positive probe) trials and fillers. Probe words in the experimental trials were among the four study words in the encoding phase, whereas a new, unseen word was probed in fillers. Words in the experimental trials (n = 384) were drawn from the extensions of the Paivio, Yuille and Madigan norms for 925 nouns (J. M. Clark & Paivio, 2004). The word pool was filtered to exclude words shorter than 3 letters or longer than 6 letters. Imageability, frequency (the CELEX database; Baayen, Piepenbrock, & Gulikers, 1995; and logarithmic values of occurrences per million in Kučera & Francis, 1967), age of acquisition, concreteness, availability (Keenan & Benjafield, 1994), length in letters and number of syllables have been identified as major predictors of verbal memory (Rubin & Friendly, 1986) and were used to control the experimental stimuli. The subset was then grouped into quadruples and trial sets were identified. Words within quadruples were matched on age of acquisition, availability, concreteness, imageability, length in letters, log frequency and number of syllables (all SDs < 2.00 and all SEs < 1.00). Words were further controlled so that no word started with the same letter, rhymed with, or was semantically related to any other word in the quadruple. Monosyllabic, disyllabic and trisyllabic words were evenly distributed [e.g., (3, 3, 3, 3), (1, 2, 1, 2) or (3, 2, 3, 2)]. The word with the median imageability value in each trial set was selected as the probe, leaving the other three as distractors (see Rubin & Friendly, 1986). Welch’s t-tests revealed no significant difference between the probe and distractor words in frequency, length in letters or number of syllables (all ps > .05). Thus, any word among the four words in each trial set was as likely to be remembered as any other word. Words in filler trials were drawn from the Toronto Word Pool (Friendly, Franklin, Hoffman, & Rubin, 1982). They were also controlled to create a consistent stimulus set: words were grouped into quintuples and matched on log frequency in the CELEX database (all SDs < 0.60 and all SEs < 0.30). Finally, we formed 192 unique mathematical equations [e.g., (2*3) - (2+3) = 1] to present as memory interference between the encoding and retrieval phases (see Conway & Engle, 1996 for a similar design). Half of the equations were correct; the incorrect equations were further divided into two equal groups whose presented results were either one more or one less than the correct result.
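To make the construction of the interference equations concrete, the sketch below generates a set with these properties. It is a minimal illustration, not the original stimulus-generation script; the operand range (2 to 9), the random seed and all variable names are assumptions.

```r
# Minimal sketch (assumed, not the original script): generate interference
# equations of the form (a*b) - (a+b) = result, half correct and the
# incorrect half off by plus or minus one.
set.seed(42)                                   # hypothetical seed
n <- 192
a <- sample(2:9, n, replace = TRUE)            # assumed operand range
b <- sample(2:9, n, replace = TRUE)
correct <- a * b - (a + b)
# 96 correct (offset 0); 48 off by +1; 48 off by -1, in random order
offset <- sample(c(rep(0, n / 2), rep(1, n / 4), rep(-1, n / 4)))
shown <- correct + offset
equations <- sprintf("(%d*%d) - (%d+%d) = %d", a, b, a, b, shown)
head(equations)   # inspect a few generated items
```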

3.4.3 Apparatus

Stimuli were presented on a 22-inch TFT LCD widescreen monitor operating at 60 Hz with a resolution of 1680 x 1050 pixels (501.7 mm x 337.4 mm). The monitor was placed 640 mm in front of the participant. A chin and forehead rest was used to reduce head movements. Participants’ eye movements were monitored using an SR Research EyeLink 1000 (sampling rate: 1000 Hz, spatial resolution < 0.5°, http://sr-research.com/eyelink1000.html). Viewing was binocular but only the left eye was monitored. Auditory material was produced by a female native speaker of British English in a sound-attenuated room and recorded using Audacity (version 2.1.10, https://www.audacityteam.org). Participants responded (yes/no, whether they had seen the word) by pressing one of two keys on a standard keyboard. Eye movement data were extracted using the SR EyeLink Data Viewer (version 2.4.0.198, https://www.sr-research.com/data-viewer/). No drift or blink correction procedure was applied. Data were analysed and visualised in the R programming language and environment (R Core Team, 2017). Mixed-effects models were constructed with the lme4 package (Bates, Mächler, Bolker, & Walker, 2015). Significance values of the likelihood tests and of coefficients in the models were computed based on the t-distribution using the Satterthwaite approximation in the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2015).

3.4.4 Procedure

Eye tracking started with a standard nine-point calibration and validation, which confirmed high data quality (average calibration error < 1° and maximum calibration error < 1.50°). As spelled out in detail below, each trial was composed of five consecutive phases: (1) fixation, (2) encoding, (3) cueing, (4) interference and (5) retrieval (see Figure 3.1). The task was to decide whether an auditorily presented word had appeared before or not (i.e., a yes/no verbal recognition memory test). As soon as the participant made the yes/no judgement by hitting one of the response buttons, the trial ended and a new trial began.

(1) Fixation: A fixation cross appeared at the centre of the screen for 500 ms. (2) Encoding: Participants were presented with four words on a 2 x 2 grid for 1600 ms. Words (Times New Roman, font size 40) were centrally placed in rectangular boxes (285 x 85 pixels, 7.6° x 2.4° of visual angle). By using boxes during encoding and retrieval, we aimed to enrich the spatial context in order to evoke more reliance on space and thereby observe looking at nothing when short verbal information such as words is remembered (see Spivey & Geng, 2000 for the effect of spatial context). (3) Cueing: In cue trials, a flashing black dot appeared for 1000 ms either in the same quadrant (congruent cue) or in the quadrant diagonal (incongruent cue) to the original location of the probe word in the encoding phase. In a third condition, no cue was presented between encoding and interference. Cue condition was a within-subjects variable and the three cue conditions were presented in random order within a session. An equal number of randomly assigned participants (n = 16) saw the same probe word with a congruent cue, with an incongruent cue or without any cue. (4) Interference: Participants were exposed to retrospective memory interference that was irrelevant to the main task. We expected the interference to push old information (i.e., the encoded words) out of the episodic buffer (Baddeley, 2000) and encourage participants to depend on spatial indices for the retrieval of words without explicit indexing (see Martarelli, Chiquet, Laeng, & Mast, 2017 for a similar paradigm). Hence, participants were presented with a mathematical equation and asked to identify within 10,000 ms whether the equation was correct. (5) Retrieval: The probe word was presented auditorily while participants looked at the blank grid with empty boxes. There was a 500 ms gap between the presentation of the blank retrieval screen and the onset of the sound file. Participants were asked to make an unspeeded yes/no judgement, within 10,000 ms (otherwise the trial timed out), to indicate whether they had seen the probe word among the four words shown in the encoding phase.

The order of trials and equations was fully randomised independently of each other. The locations of all words in all conditions were counterbalanced with a Latin square design to control for gaze biases, so that each word appeared an equal number of times in each location of the grid (see the sketch below). The experiment was divided into four equal blocks of 48 trials each, with a short pause between blocks. A typical session lasted approximately 60 minutes, including consent and setting up the eye tracker. Overall accuracy was 86% for the interference equations and 81% for the recognition memory test, suggesting that participants attended to the task with high concentration.
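As an illustration of this Latin square logic, the sketch below rotates one word quadruple through the four quadrants so that each word occupies each location exactly once across four presentation lists. The quadruple itself is hypothetical (only CHIN appears in Figure 3.1), and this is assumed logic rather than the original experiment script.

```r
# Sketch of Latin-square counterbalancing of word positions (assumed logic)
quadrants <- c("top_left", "top_right", "bottom_left", "bottom_right")
quadruple <- c("CHIN", "LAKE", "CROWN", "PEARL")   # hypothetical trial set
# Column j holds the quadruple rotated by j - 1 positions
lists <- sapply(0:3, function(shift) quadruple[(0:3 + shift) %% 4 + 1])
dimnames(lists) <- list(quadrants, paste0("list_", 1:4))
lists   # each word appears once per quadrant across the four lists
```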
Following the experiment, a computerised version of the Corsi block-tapping task (Corsi, 1972), run in PEBL (Psychology Experiment Building Language, version 0.13, test battery version 0.7, http://pebl.org) (Mueller & Piper, 2014), was used to measure visuospatial short-term memory.

Figure 3.1 A schematic illustration of the temporal order of events in an example trial, showing the three cue conditions. In this example, the relevant quadrant is the top left location, where the probe word (i.e., CHIN) appears.

3.5 Results

3.5.1 Measures

Results were analysed in two parts: memory performance and looking behaviour.

Memory performance: Hit rate and hit latency were used as measures of memory performance. Hit rate was the proportion of yes trials to which participants correctly responded yes. Hit latency was the time in milliseconds between the onset of the auditory presentation of the probe word and the correct keyboard response. Participants were not instructed to make speeded responses in the current paradigm; nevertheless, hit latencies are reported to verify and complement hit rate.
Looking behaviour: Fixation percentage was used as the main gaze measure and dependent variable, as in the previous looking at nothing studies discussed above (e.g., Wantz et al., 2015). Fixation percentage (or fixation frequency) is the percentage of fixations in a trial falling within a particular interest area in proportion to the total fixations in that trial. Thus, it was computed by dividing the number of fixations on each quadrant by the total number of fixations during the retrieval phase (see Wenzel, Golenia, & Blankertz, 2016 for a similar computation and use of fixation frequency).
Words in the study were of varying lengths and thus had different presentation durations. Fixation percentage was purposefully chosen as it is immune to such differences in duration. Further, we assumed that fixations, rather than the time spent on a particular region (i.e., dwell time per quadrant), are important for the link between memory and eye movements. Fixation-based measures are reliable indicators of memory load and attention in a given location (e.g., Just & Carpenter, 1980; Meghanathan, van Leeuwen, & Nikolaev, 2015). Hence, we preferred fixation percentage over dwell time percentage as a more refined indicator of looking at nothing.¹ Accordingly, we expected that participants would fixate on the relevant quadrant to derive support from the environment.

Four rectangular interest areas corresponding to the quadrants were identified. All interest areas were of the same size (502 x 368 pixels, 13.4° x 10.6° of visual angle). They framed the rectangular boxes in which words were presented (see Figure 3.1) and were not contiguous (see Jahn & Braatz, 2014 for a similar arrangement). Interest areas occupied 93.58% of the total screen area. A circular interest area with a diameter of 40 pixels (1.1° of visual angle) was defined at the centre of the grid. Participants’ heads were positioned on a head and chin rest to minimise head movements, and we took looking at the centre to be the baseline looking behaviour, in contrast to looking at the relevant quadrant. A negative correlation between looks to the centre and to the relevant quadrant confirmed this inverse relationship; rs(46) = -.73, p < .0001. The proportion of fixations accruing on the interest areas during the retrieval phase (from the onset of the auditory presentation of the probe word until the participant’s response) was calculated. Fixations had a minimum duration of 40 ms. First fixations and fixations outside the interest areas (7.91%) were omitted. Only hits (i.e., correct responses) in yes trials were included in the fixation analyses. Fixation percentages allocated to the three quadrants that did not contain the probe word were averaged into one and analysed against the relevant quadrant, in which the probe word had been seen.
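The computation of this measure can be sketched as follows. This is an illustrative reconstruction under assumed data and column names (subject, trial, quadrant), not the original analysis code; in real data, quadrants receiving zero fixations in a trial would additionally need to be filled in with zeros (e.g., with tidyr::complete()) before averaging.

```r
library(dplyr)  # assumed tooling; data frame and column names are hypothetical

# fixations: one row per retrieval-phase fixation that survived exclusion,
# with columns subject, trial and quadrant ("relevant", "irr1", "irr2", "irr3")
fix_pct <- fixations %>%
  group_by(subject, trial) %>%
  mutate(n_total = n()) %>%                      # total fixations in the trial
  group_by(subject, trial, quadrant, n_total) %>%
  summarise(pct = n() / first(n_total), .groups = "drop") %>%
  # collapse the three irrelevant quadrants into a single averaged value
  mutate(region = ifelse(quadrant == "relevant", "relevant", "irrelevant")) %>%
  group_by(subject, trial, region) %>%
  summarise(pct = mean(pct), .groups = "drop")
```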

3.5.2 Mixed-effects modelling

Data were analysed using linear and binomial logit mixed-effects models. Visual inspection of residual plots did not reveal any obvious deviations from homoscedasticity or linearity. Linear models were fit for continuous target variables (hit latency and fixation percentage). Binomial models were fit for categorical target variables (hit rate), using the bobyqa optimiser to prevent non-convergence. Participants and items were treated as random effects to capture by-participant and by-item variation (Baayen, Davidson, & Bates, 2008). We started fitting models by building the random effects structure and followed a maximal approach. That is, random effects were included as both random intercepts and correlated random slopes (random variations) as long as the models converged and the effects were justified by the data (Barr, Levy, Scheepers, & Tily, 2013). Random intercepts and slopes were included even if they did not improve the model fit, in order to control for possible dependence due to repeated measures or order effects. In particular, imageability and word length were selected from among the lexico-semantic variables to be added as random slopes as long as the models converged. The random effects structure was simplified step by step according to the magnitude of each random effect's contribution to explaining variation in the data. That is, the random effect with the weakest contribution was dropped first and, if necessary, the structure was reduced further in the same manner. The contribution of a fixed effect was investigated by comparing a full model containing the effect in question against a reduced model in which only that effect was removed, or against a null model without any fixed effects. Compared models had the same random effects structure (Winter, 2013).
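In lme4 syntax, one such comparison could look like the sketch below. The data frame and variable names are hypothetical stand-ins for illustration, and the reported models varied in their exact random-slope structure as described above.

```r
library(lme4)
library(lmerTest)  # Satterthwaite-approximated df and p values for lmer fits

# Full model: quadrant as fixed effect, by-participant and by-item random
# intercepts and slopes (names are assumed placeholders)
full <- lmer(pct ~ quadrant +
               (1 + imageability | subject) +
               (1 + imageability + word_length | item),
             data = d, REML = FALSE)

# Reduced model: identical random effects structure, fixed effect removed
null <- update(full, . ~ . - quadrant)

anova(null, full)   # likelihood-ratio (chi-square) test of the fixed effect
summary(full)       # coefficient t tests with Satterthwaite approximation
```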

3.5.3 Memory performance

Hit rate

We analysed whether there was a difference in hit rate between the congruent and incongruent cue conditions. The fixed effect was cue location with two levels (congruent and incongruent cue). Imageability was added as a by-participant random slope; imageability and word length were added as by-item random slopes. Cue location did not improve the model fit when compared against a null model; χ2(1) = 0.01, p = .91. In other words, participants retrieved the probe words as accurately in the incongruent cue condition (mean hit rate = 81%) as in the congruent cue condition (mean hit rate = 81%). Cue location did not improve the model fit either when the no cue condition (mean hit rate = 82%) was included; χ2(2) = 0.48, p = .79.
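As a binary outcome, hit rate was modelled with a logit link and the bobyqa optimiser (Section 3.5.2). A sketch of the form such a model could take is given below; again, the data frame and variable names are assumptions for illustration.

```r
# Hypothetical form of the hit-rate model: binomial logit mixed model with
# the bobyqa optimiser, as described in Section 3.5.2
m_hit <- glmer(hit ~ cue_location +
                 (1 + imageability | subject) +
                 (1 + imageability + word_length | item),
               data = d, family = binomial(link = "logit"),
               control = glmerControl(optimizer = "bobyqa"))
summary(m_hit)   # z tests on the logit scale
```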

Hit latency

Linear mixed-effects models were fit to identify any difference in hit latency between cue conditions. The fixed effect was cue location with two levels (congruent and incongruent cue). Imageability, word length and cue location were added as by-participant random slopes; imageability and word length were added as by-item random slopes. As with hit rate, likelihood tests indicated no difference in hit latency between the congruent (mean hit latency = 1807.48 ms) and incongruent (mean hit latency = 1830.94 ms) cue conditions; χ2(1) = 1.47, p = .23. Results did not change when the no cue condition (mean hit latency = 1842.84 ms) was included; χ2(2) = 2.59, p = .27.

Effect of visuospatial memory on memory performance

We examined the effect of participants' visuospatial memory capacity, as measured by the Corsi block-tapping test, on hit rate and hit latency. As reported above, we did not find any differences in hit rate or hit latency across cue conditions; thus, mixed-effects models including all cue conditions were fit. The fixed effect was Corsi block-tapping score. Hit rate: Imageability was added as a by-participant random slope. Corsi block-tapping score improved the model fit; χ2(1) = 9.39, p = .002. Participants with better visuospatial memory retrieved the probe words from memory more accurately; B = 0.01, z = 3.19, p = .001. Hit latency: Word length was added as a by-participant random slope; imageability and word length were added as by-item random slopes. Corsi block-tapping score did not improve the model fit; χ2(1) = 0.55, p = .46.

3.5.4 Looking behaviour

Looking at nothing for words

First, we examined whether there was a difference in spontaneous looks to the relevant and irrelevant quadrants during the retrieval phase. The target variable was fixation percentage in correctly answered yes trials. The fixed effect was quadrant with two levels (relevant and irrelevant quadrant). Imageability was added as a by-participant random slope; imageability and word length were added as by-item random slopes. Likelihood tests showed that quadrant significantly improved the model fit; χ2(1) = 22.85, p < .0001. Overall, participants looked significantly more at the relevant quadrant than at the irrelevant quadrant when retrieving probe words from memory; B = 0.03, t = 4.78, p < .0001.

Effect of visuospatial interference on looking at nothing

We examined looks to the relevant and the irrelevant quadrants within the congruent, incongruent and no cue conditions separately to specify the effect of visuospatial interference on looking at nothing (see Figure 3.2). The target variable was fixation percentage in correctly answered yes trials. The fixed effect was quadrant with two levels (relevant and irrelevant quadrant). In all models, imageability was added as a by-participant random slope; imageability and word length were added as by-item random slopes.
Congruent cue condition: Quadrant improved the model fit; χ2(1) = 27.51, p < .0001. Participants looked significantly more at the relevant quadrant than at the irrelevant quadrant in the congruent cue condition; B = 0.05, t = 5.25, p = .0001.
No cue condition: Quadrant improved the model fit, with a smaller magnitude than in the congruent cue condition; χ2(1) = 5.00, p = .03. Participants looked significantly more at the relevant quadrant than at the irrelevant quadrant in the no cue condition; B = 0.02, t = 2.24, p = .03.
Incongruent cue condition: Quadrant did not improve the model fit; χ2(1) = 0.61, p = .43. Participants did not look significantly more at the relevant quadrant during the retrieval phase; B = 0.007, t = 0.78, p = .43. Models including quadrant with three levels (i.e., relevant quadrant, irrelevant quadrant and central interest area) indicated that participants did not look at any region more than another (ps > .05) in the incongruent cue condition.
Difference between conditions: Models with fixation percentage in the relevant quadrant as the target variable were fit, taking the no cue condition as the baseline. Results revealed that participants looked at the relevant, blank locations more frequently in the congruent cue condition than in the no cue condition; B = -0.03, t = 2.48, p = .01. There was no such difference between the no cue and incongruent cue conditions (p = .95).

Figure 3.2 Proportion of fixations in the relevant and irrelevant quadrants across the three cue conditions. Values on the y axis correspond to percentages (e.g., 0.1 = 10%). Notched box plots show the median (horizontal line), mean (yellow dot), 95% confidence interval of the median (notch), interquartile range (box), first and third quartiles (lower and upper ends of the box) and range (vertical lines). Grey dots represent data points. * p ≤ .05, **** p ≤ .0001

Visuospatial memory and looking behaviour

Visuospatial memory as a predictor of looking at nothing

We investigated the effect of participants' visuospatial memory capacity, as measured by the Corsi block-tapping test, on looking at the relevant, blank quadrant. The target variable was fixation percentage in the relevant quadrant in correctly answered yes trials. The fixed effect was Corsi block-tapping score. In all models, word length was added as a by-participant random slope; imageability was added as a by-item random slope.
Congruent cue condition: Corsi block-tapping score did not predict looks in the relevant quadrant, χ2(1) = 1.07, p = .30, or the irrelevant quadrant, χ2(1) = 1.50, p = .22.
No cue condition: Corsi block-tapping score did not predict looks in the relevant quadrant, χ2(1) = 0.30, p = .58, or the irrelevant quadrant, χ2(1) = 1.52, p = .22.
Incongruent cue condition: Corsi block-tapping score improved the model fit for fixation percentage in the relevant quadrant, χ2(1) = 4.83, p = .03, but not in the irrelevant quadrant, χ2(1) = 0.67, p = .41. Participants with better visuospatial memory looked less at the relevant quadrant during memory retrieval when there was an incongruent cue between the encoding and retrieval phases; B = -0.0009, t = 2.28, p = .03.

Correlation between visuospatial memory and looking behaviour

We tested the correlations between visuospatial memory, as measured by the Corsi block-tapping test, and fixations to the relevant, irrelevant and central interest areas under the three cue conditions (see Figure 3.3).
Relevant quadrant: There was a significant negative correlation between visuospatial memory capacity and fixation percentage in the relevant quadrant under the incongruent cue condition; rs(46) = -.37, p = .009. Participants with better visuospatial memory tended to look less at the relevant quadrant when there was an incongruent cue between the encoding and retrieval phases. There was no such correlation within the congruent, rs(46) = .20, p = .18, or no cue conditions, rs(46) = -.18, p = .22.
Irrelevant quadrant: There was a significant negative correlation between visuospatial memory capacity and fixation percentage in the irrelevant quadrant under the no cue condition; rs(46) = -.29, p = .05. Participants with better visuospatial memory tended to look less at the irrelevant quadrant when there was no cue between the encoding and retrieval phases. There was no such correlation within the congruent, rs(46) = -.26, p = .07, or incongruent cue conditions, rs(46) = -.16, p = .27.
Central interest area: There were significant positive correlations between visuospatial memory capacity and fixation percentage in the central interest area under the congruent cue condition, rs(46) = .39, p = .006, the no cue condition, rs(46) = .30, p = .04, and the incongruent cue condition, rs(46) = .33, p = .02. With all conditions combined, participants with better visuospatial memory tended to look more at the central interest area; rs(46) = .30, p = .04.
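Correlations of this kind can be computed with a standard Spearman test. The sketch below assumes a per-participant data frame of aggregated fixation percentages; the frame and variable names (pp, corsi_score, pct_relevant, condition) are hypothetical.

```r
# Sketch of one of the Spearman correlations reported above (assumed
# per-participant aggregates; variable names are hypothetical)
with(subset(pp, condition == "incongruent"),
     cor.test(corsi_score, pct_relevant, method = "spearman"))
```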

Figure 3.3 Scatterplots showing the correlations between visuospatial memory, as measured by Corsi block-tapping score (a higher score means better visuospatial memory), and fixation percentage in the relevant, irrelevant and central interest areas. Values on the y axis correspond to percentages (e.g., 0.25 = 25%). Each scatterplot includes a linear regression line; the blue band around the line represents the 95% confidence interval. Rug marks along the x and y axes illustrate the marginal distributions of visuospatial memory and fixation percentage.

Functionality of looking at nothing

The current experiment was not designed to test the functionality of looking behaviour in memory. Nevertheless, we examined whether memory performance (hit rate and hit latency) was predicted by the proportion of fixations in the relevant, blank locations. Imageability was added as a by-participant random slope; imageability and word length were added as by-item random slopes. Looks to the relevant, blank locations predicted hit rate in the congruent cue condition, B = 0.54, z = 2.28, p = .02, but not in the no cue condition, B = 0.31, z = 1.22, p = .22. Hit rate also predicted looking at nothing in the congruent cue condition only; B = 0.05, t = 2.29, p = .02. Looks to the relevant, blank locations did not predict hit latency in either cue condition (congruent cue: B = -44.74, t = 0.66, p = .51; no cue: B = 53.56, t = 0.70, p = .48). The interaction between Corsi block-tapping score and fixation percentage in the relevant quadrant did not predict hit rate or hit latency (ps > .05).

3.6 Discussion

The purpose of the current study was to shed light on the nature of the spatial indexing and looking at nothing mechanisms and, in particular, to investigate the relationship between internal and external memory within memory for language. To this end, we asked three questions concerning the automaticity of spatial indexing, the dynamicity of spatial indexing and looking at nothing, and the effect of individual differences in visuospatial memory on looking behaviour.

3.6.1 Looking at previous word locations

Results showed significantly more fixations during memory retrieval in the relevant, blank region, where the probe word had appeared at the encoding stage, than in the other, irrelevant blank regions. In other words, participants looked at nothing when retrieving simultaneously and visually presented single words. Our results in the congruent and no cue conditions are in line with previous studies evidencing looking at nothing when remembering verbal information (Hoover & Richardson, 2008; D. C. Richardson & Kirkham, 2004; D. C. Richardson & Spivey, 2000; Scholz et al., 2011, 2016, 2018). That is, we replicated the corresponding area effect (Wantz et al., 2015).

The novelty of this study lies in the linguistic information to be retrieved and in how it was encoded and remembered. As discussed in the introduction, in previous studies documenting looking at nothing for language memory, people saw “fact-teller” objects along with the spoken information (e.g., Richardson & Spivey, 2000). Thus, participants associated verbal information with external visual information, and such explicit indexing might have motivated them to rely on environmental sources. In contrast, linguistic information was not explicitly associated with any visual object in the current study. Further, words appeared in the four cells of the grid at the same time (see Vankov, 2011). Lastly, memory was not probed with details about factual information or with correct/incorrect statements, but in a simple recognition memory test. In such a relatively minimal and ecologically valid retrieval scenario, participants offloaded memory work onto the environment by unintentionally simulating locations when retrieving linguistic information from memory. It is also important to note that looking at nothing in the present study occurred after an intervening task (i.e., judging a maths equation). Simulating spatial locations after a demanding task might suggest that looking at nothing is not a mere residual of the encoding process but rather an efficient means of memory retrieval (see Renkewitz & Jahn, 2012). In a nutshell, our findings suggest that looking at nothing could be a more robust and ubiquitous behaviour than previously documented.

One limitation of the current study could be the use of boxes. Following the methodology of Spivey and Geng (2000), we aimed to enrich the spatial context on the screen by placing words in rectangular boxes at the encoding stage. Importantly, participants were asked to remember the probe word while looking at a retrieval screen with boxes without the words in them. This methodology allowed us to identify narrower and thus more specific interest areas (i.e., boxes) than the quadrants of the grid. However, to what extent remembering words while looking at a screen with empty boxes meets the definition of looking at “nothing” in strict terms is open to discussion. Replication studies are necessary to ascertain that word locations can be registered, simulated and referred back to via eye movements without any contextual enrichment such as boxes (see Chapter 6 for a methodology in which words are not placed in boxes).

3.6.2 Indexing word locations

Visuospatial cues affected spatial indexing, and thus looking at nothing, in line with our predictions. Participants looked at the relevant, blank locations in the congruent cue condition, that is, when the cue appeared in the same location as the probe word. Importantly, there were also more looks to the relevant quadrant when there was no cue between the encoding and retrieval stages (pure looking at nothing). Findings from the no cue condition suggest that looking at nothing is not driven by a mere attentional shift. Rather, eye movements in the present study resulted from the spatial indices associated with words and were thus governed by memory for language. On the other hand, looking at nothing did not occur when the visuospatial cue appeared in the location diagonal to the original location of the probe word (i.e., the incongruent cue condition).

Results indicate that participants formed spatial indices corresponding to simultaneously presented single words even though locational information was not required in the memory task. Spatial indices were formed for the subsequent cues as well. The emergence and magnitude of looking at nothing were determined by the relationship between the spatial indices for words and those for cues. Congruent cues reinforced the encoded locations and amplified the corresponding area effect, as expected; in turn, participants looked more at the relevant locations in the congruent cue condition than in the no cue condition. In contrast, an incongruent cue functioned as interference. When the spatial indices associated with words and visual cues did not match, the initial index attached to the word was updated and, consequently, eye movements to the relevant, blank location were disrupted. It is important to note that participants did not look at any blank region (relevant, irrelevant or centre) more than the other regions in the incongruent cue condition. Such behaviour suggests that the spatial codes corresponding to words and visuospatial cues were in competition when they did not refer to the same location.

We can conclude that word locations were registered in all cases. Participants were given only 1600 ms to study four words, leaving 400 ms per word. Thus, we can argue that word locations were encoded almost instantaneously upon presentation. Further, locations were encoded unintentionally, as suggested by the fact that participants were naïve to the purpose of the experiment and were not instructed to remember word locations (cf. Andrade & Meudell, 1993; Naveh-Benjamin, 1988). Informal interviews with the participants after the experiment suggested that locations were indexed without awareness. In keeping with this, it appears safe to argue that the spatial indexing mechanism in the current study meets most of the automaticity criteria (Moors & De Houwer, 2006). In this regard, our results contrast with Pezdek et al. (1986), who showed automatic spatial encoding for objects but not for words. In conclusion, we show that not only the existence but also the magnitude of looking at nothing is determined by the strength and stability of spatial encoding. Although spatial indexing and looking at nothing are inherently different processes, they are linked to each other in a dynamic relationship.

3.6.3 Looking at nothing and visuospatial memory

The chief finding of the study is the relation between visuospatial memory capacity and the tendency to look at blank locations. To our knowledge, this is the first direct evidence of individual differences in looking at nothing. We showed this relation in predictive and correlational analyses. There was a positive correlation between visuospatial memory, measured with the Corsi block-tapping test, and fixation percentages in the central interest area during retrieval in all cue conditions. Higher visuospatial memory predicted less looking at nothing under the incongruent cue condition. In line with this, there were negative correlations between visuospatial memory and fixations in the relevant (within the incongruent cue condition) and irrelevant locations (within the no cue condition).

Taken together, participants with better visuospatial memory, and thus richer internal resources, looked more at the centre of the screen rather than at relevant (or irrelevant) locations. The central interest area was the initial and thus default looking position, prompted by the central fixation cross shown before each trial. Given that participants' heads were stabilised on the chinrest, we assume that participants with better visuospatial memory who “looked” at the centre of the screen did not, in fact, look at any specific area. In other words, they sustained their attention on internal sources rather than on the external codes in space, by not launching fixations to relevant or, for that matter, irrelevant regions. Such looking behaviour is comparable to cases in which individuals avert their gaze (Glenberg, Schroeder, & Robertson, 1998) or close their eyes (Vredeveldt, Hitch, & Baddeley, 2011) in order to disengage from the environment in the face of cognitive difficulty. Here, we surmise that participants with better visuospatial memory did not feel the “necessity” to rely on the blank locations, as their internal memory was sufficient to retrieve the probe word accurately. Thus, they did not look at any region in a task where moving their eyes could drain cognitive resources further (see Scholz et al., 2018). This interpretation is supported by the fact that participants with better visuospatial memory did better in the memory test in general.

Further, participants with better visuospatial memory looked less at nothing when they saw an incongruent cue. The negative correlation between visuospatial memory and fixations in the relevant quadrant within the incongruent cue condition illuminates another dimension of the coordination between internal and external memory. We argue that additional, incongruent visuospatial information made the environment unreliable for successful memory retrieval. In the event of such spatial interference, participants with better visuospatial memory seemed to ignore any deictic code, whether attached to words or to cues, and turned to internal sources. It appears that the unreliability of the external memory was detected as a function of the strength of internal visuospatial memory. Overall, the findings support the integrated memory account (Ferreira et al., 2008; D. C. Richardson et al., 2009), in which internal memory representations and spatial indices internalised with eye movements work cooperatively to realise fast and efficient retrieval. On the other hand, the results are at odds with the view that looking at nothing is an automatic attempt to access the contents of the spatial index (Spivey et al., 2004).
If looking at nothing were as automatic a behaviour as spatial indexing, all participants would be expected to display the same behaviour regardless of their memory capacity. Rather, the results demonstrate that looking at nothing changes systematically not only with task conditions (e.g., memory demands arising from task difficulty) (Scholz et al., 2011; Wantz et al., 2015), encoding conditions (e.g., explicit/implicit spatial indexing) or retrieval conditions (e.g., type of retrieval questions, grid arrangement) (Spivey & Geng, 2000), but also with cognitive differences between individuals. The coordination between internal and external memory in looking at nothing presents further evidence for the dynamicity account of looking at nothing. On a larger scale, the findings extend the literature showing that the likelihood of cognitive offloading is determined by the abundance of internal resources (Risko & Dunn, 2015).

One important aspect here is consciousness. Previous studies showing more frequent cognitive offloading as a consequence of worse internal capacity typically offered offloading as an explicit option to participants (e.g., Risko & Dunn, 2015). However, looking at nothing is an unintentional and presumably unconscious behaviour, in that participants in our study (and in the previous looking at nothing studies reviewed above) were never instructed to pay attention to word locations or told that they could rely on the environment whenever they encountered retrieval difficulty. Even so, they still used the environment in an intelligent way (Kirsh, 1995), and this behaviour was modulated by their internal capacity. Such an unintentional trade-off between internal and external memory might suggest that cognitive offloading to minimise memory load is a deeply entrenched but unconscious memory strategy. That said, consciousness and intentionality were not systematically tested in the current study. Future studies should be designed to investigate whether looking at nothing is a completely unconscious behaviour, or whether we have some kind of control over our “decision” to offload memory work onto the world.

3.6.4 Looking at nothing and memory performance

Results showed that participants who looked at the relevant, blank locations retrieved the probe words more accurately only in the congruent cue condition. Accuracy predicted more looks to the relevant quadrant within the congruent cue condition as well (see Martarelli et al., 2017; Martarelli & Mast, 2011; Scholz et al., 2014 for looking at nothing occurring in correct trials but not in error trials). Looking at nothing did not predict hit rate in the no cue condition, or hit latency in any condition. Thus, we did not present any conclusive evidence that looking at nothing improves memory performance. It could well be argued that a Simon-like congruency effect (as in the congruent condition) accounts for the enhanced memory rather than fixations in the relevant, blank quadrant. Moreover, eye movements at retrieval were not manipulated in the current study, unlike in previous studies (e.g., Scholz et al., 2014). Hence, our method of investigating the functionality of gaze position cannot establish direct causality between looking at nothing and accuracy. Consequently, the results reported here cannot distinguish whether participants who looked at nothing were more accurate, or whether participants who were more accurate also looked at nothing. The findings showing partial functionality in the present work should therefore be interpreted cautiously due to these methodological limitations.

3.7 Conclusion

Looking at nothing is a unique case in that it demonstrates how the cognitive system can maximise efficiency by spreading a cognitive problem across three domains: the body through the act of looking, the environment through spatial indices, and the brain through mental representations. Our results extend the current literature by shedding further light on the nature of the spatial indexing and looking at nothing mechanisms. We provide evidence for automatic and dynamic spatial indexing, and for a dynamic, efficient looking at nothing behaviour for words. The major contribution of this study is to show a systematic trade-off between internal and external resources, driven by individual cognitive differences, in order to make the most of environmental opportunities and cognitive capacity. Finally, the current looking at nothing paradigm provides an avenue for studying the relationship between language and looking at nothing.


  1. The same analyses were performed with dwell time percentages as well, and the findings were consistent with the analyses based on fixation percentages reported here.↩︎