Participants
The sample consisted of participants in the longitudinal BabyTwins Study Sweden (BATSS35), who were recruited from the national population registry (only the greater Stockholm area was selected, because the assessments took place in person in Stockholm). From 2016 to 2020, 311 families (29% of the entire population of same-sex twins born in the area) participated in the multi-methods assessment at 5 months. Data collection was performed at the Centre of Neurodevelopmental Disorders at Karolinska Institutet (KIND) in Stockholm, Sweden. In general, the study sample has a high socioeconomic status, and it includes mainly Swedish-born families (90% of twin pairs had at least one parent born in Sweden). See Table 1 for sample demographics (in-depth demographics are reported elsewhere35). Parents gave informed consent to take part. BATSS was approved by the regional ethics board in Stockholm and was conducted in accordance with the Declaration of Helsinki.
General exclusion criteria for the study were opposite-sex twin pairs, diagnosis of epilepsy, known presence of a genetic syndrome related to autism, uncorrected vision or hearing impairment, very premature birth (prior to week 34), presence of a developmental or medical condition likely to affect brain development (e.g., cerebral palsy), and infants for whom none of the biological parents were involved in the infant’s care. Among the recruited and tested infants, 3 twins were excluded from analysis because they were subsequently found not to fulfil the general criteria (above) due to seizures at the time of birth (n = 2 infants) and spina bifida (n = 1 infant). In addition, we excluded 24 infants due to twin-to-twin transfusion syndrome (12 twin pairs) and one infant due to birthweight below 1.5 kg. Condition-specific criteria for exclusion are described in the “Eye tracking procedure and stimuli” section.
Eye tracking procedure and stimuli
The stimuli used in this study (Non-social condition36; Social condition37; Mixed condition38) were not purposefully designed to measure gaze lateralization, but were used for this purpose as we believe they fulfil the characteristics necessary to answer our research questions. In all three conditions, the sample is based on the same set of infants from BATSS35, although the sample varies slightly in each condition due to stimuli-specific exclusion criteria (see specifications in each condition section).
For the Social and Non-social conditions, gaze data was recorded using the Tobii T120 eye-tracker with a sampling rate of 60 Hz, using a standard Tobii monitor at native resolution (1024 × 768). For the Mixed condition, a Tobii TX300 eye-tracker was used, with a sampling rate of 120 Hz. The infant was seated in a baby chair or on the parent’s lap, approximately 60 cm from the screen. Before the eye tracking session, a 5-point calibration video was presented, and the experimental task did not begin until a successful calibration was achieved. For the Social and Non-social conditions, another 5-point video for offline calibration validation purposes was shown once at the beginning of the eye-tracking session.
Social condition
The stimuli consisted of 12 videos in which a woman sings nursery rhymes, 4 videos in which a woman is talking (rhyme verses) and 4 videos in which a woman is only smiling (Fig. 1). The primary goal of these stimuli was to measure eye versus mouth looking (already published data37). The videos were shown in a pseudo-random order, unique to each participant. In all videos, a woman was centered in the video and the background was grey (there were two women, each of them contributing equally to all conditions). The length of the videos ranged from 4 to 12 s (total duration was 153 s). Further details are provided elsewhere37. A dynamic area of interest (AOI) covering the face was created for each frame of the videos (Fig. 1c). The AOI was an ellipse with a horizontal radius of 200 pixels and a vertical radius of 280 pixels. In an earlier study of the tendency to look at the eyes and the mouth using the same stimuli and sample37, it was found that most infants tended to look at the eyes. Since the eyes are also spatially separated from each other, the face AOI was divided into an upper and a lower part (Fig. 1c), and in a deviation from the analysis plan, the eye region was primarily used for all further analyses (analyses involving the mouth region are reported in Supplementary Information S1). Lateralization was measured as the total amount of time looking at the left hemiface relative to the total amount of time looking at the whole face, calculated separately for the lower and the upper part of the face (giving us a scale of 0–1, where 0 means not looking at the left hemiface at all, while 1 means looking at the left hemiface 100% of the time). A value over 0.5 therefore indicates a left gaze bias, while a value below 0.5 indicates a right gaze bias.
The inclusion criterion for this condition was looking at the face for at least 20% of the total duration of the condition (i.e., 30.6 s; exclusions were not made on a trial-by-trial basis, since the stimuli were very similar in all videos and always centered in the video). In total, 21 participants were excluded from this condition due to this criterion. In addition, 2 infants were excluded due to non-Swedish speaking parents, 6 due to technical issues, 7 due to lack of time, 2 due to lack of room, and 4 due to being too tired or too fussy. The final sample in this condition consisted of 552 infants.
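As an illustration only, the lateralization score and the 20% inclusion criterion for this condition could be computed along the following lines. This is a minimal sketch and not the analysis code used in the study. The ellipse radii, total duration, threshold and sampling rate are taken from the text. The `Sample` structure, the function names, the split of the face at the vertical center of the AOI, and the definition of the left hemiface as the part left of the AOI center in screen coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass

# Values taken from the text; everything else below is hypothetical.
RX, RY = 200.0, 280.0        # horizontal and vertical ellipse radii (pixels)
TOTAL_DURATION_S = 153.0     # total duration of the Social condition
MIN_FACE_FRACTION = 0.20     # inclusion threshold (20%, i.e., 30.6 s)


@dataclass
class Sample:
    x: float   # gaze position, pixels
    y: float
    cx: float  # center of the face AOI in the current video frame
    cy: float


def in_face(s):
    """True if the gaze sample falls inside the elliptical face AOI."""
    return ((s.x - s.cx) / RX) ** 2 + ((s.y - s.cy) / RY) ** 2 <= 1.0


def social_lateralization(samples, sample_rate_hz=60.0, upper=True):
    """Proportion of face-looking samples on the left hemiface (0 to 1).

    Returns None if the infant looked at the face for less than 20% of
    the condition, or never looked at the requested part of the face.
    """
    face = [s for s in samples if in_face(s)]
    if len(face) / sample_rate_hz < MIN_FACE_FRACTION * TOTAL_DURATION_S:
        return None
    # Assumption: screen y grows downward, so the upper part is y < cy.
    part = [s for s in face if (s.y < s.cy) == upper]
    if not part:
        return None
    left = sum(1 for s in part if s.x < s.cx)
    return left / len(part)
```

With equally spaced samples, the ratio of sample counts equals the ratio of looking times, so no conversion to seconds is needed for the score itself.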
Non-social condition
The stimuli consisted of 8 videos (each shown for 16 s), which contained a series of images, each of which showed two sets of dots, appearing on the left and right sides of the screen (Fig. 2). The primary goal of these stimuli was to measure the approximate number system (already published data36). Each image was unique in terms of a specific spatial constellation of dots. On one side of the screen, the collection of dots was numerically constant, while on the other side the collection of dots alternated in numerosity. The side with alternating numerosity switched between 10 and 20 dots (1:2 ratio condition) or 6 and 24 dots (1:4 ratio condition). The side with constant set sizes showed 10 dots and 6 dots, respectively, for these conditions. Each condition consisted of four stimulus videos, which were counterbalanced in terms of left vs. right location of the side with alternating set size. In half of the images where the two sets of dots differed in numerosity, the two sets of dots were matched on the total surface area. In the other half, the two sets of dots were matched on individual dot size. In 50% of the videos, the two sets of dots were controlled for convex hull (the smallest convex polygon that contains a set of dots). Lateralization in this condition was measured as the amount of time looking at the left side of the screen, relative to the amount of time looking at the whole screen. To create this variable, we averaged the percentage of viewing time at the left side of the screen (relative to the whole screen) for trials where the numerically changing side was on the left side and on the right side, respectively. This was done separately for each condition, and was then averaged to create the final variable (there was no statistically significant difference in lateralization between the two conditions; t(513) = −0.830, p = 0.407).
By doing this, the final measure contained the same amount of information from trials where the numerically changing side was on the left side of the screen as from trials where the opposite was true, creating a non-biased variable. The inclusion criterion for this condition was looking at the screen for at least 20% of the total duration of each video (approximately 3.2 s), in order to allow the infants to observe the numerically changing dots. Infants were included in further analyses only if they had at least four valid trials (of which two from each condition, counterbalanced in terms of left vs. right location of the numerically changing side). Due to these criteria, 61 participants were excluded from this condition. In addition, 6 infants were excluded due to technical issues, 7 due to lack of time, 2 due to lack of room, and 4 due to being too tired or too fussy. The final sample in this condition consisted of 514 infants.
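The balanced averaging described above can be sketched as follows. This is an illustrative reading of the procedure, not the study's code. The trial dictionaries and their keys (`ratio`, `changing_side`, `left_s`, `right_s`, `duration_s`) are hypothetical. The inclusion rule is interpreted here as requiring at least one valid trial in each of the four cells (ratio condition by side of the changing set).

```python
from statistics import mean

MIN_SCREEN_FRACTION = 0.20  # a trial is valid at >= 20% of its duration


def left_proportion(trial):
    """Left looking time relative to whole-screen looking time.

    Returns None for an invalid trial (too little time on screen).
    """
    on_screen = trial["left_s"] + trial["right_s"]
    if on_screen < MIN_SCREEN_FRACTION * trial["duration_s"]:
        return None
    return trial["left_s"] / on_screen


def nonsocial_lateralization(trials):
    """Average within each side, then within each ratio, then overall.

    Returns None if any ratio-by-side cell has no valid trial
    (an assumed reading of the inclusion rule in the text).
    """
    per_ratio = []
    for ratio in ("1:2", "1:4"):
        per_side = []
        for side in ("left", "right"):
            values = [
                left_proportion(t)
                for t in trials
                if t["ratio"] == ratio and t["changing_side"] == side
            ]
            values = [v for v in values if v is not None]
            if not values:
                return None
            per_side.append(mean(values))
        per_ratio.append(mean(per_side))
    return mean(per_ratio)
```

Averaging within each side before combining gives left-changing and right-changing trials equal weight even when an infant has an unequal number of valid trials of each type.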
Mixed condition
Stimuli consisted of 6 different complex displays of objects (Fig. 3), including a face (with direct eye-gaze; counterbalancing ethnicity and location of the face within the array) and 4 non-face competitors (including a “noise” stimulus generated from the same face, a mobile phone, a bird, and a car). The primary goal of these stimuli was to measure face orienting and preference (already published data38). In two trials the face was to the right of the screen, in two trials it was to the left, and in two trials the face was in the middle of the screen either at the top (array in Fig. 3) or at the bottom of the screen. In a deviation from the pre-registered analysis plan, we only included the four trials where the face was either to the left or to the right of the screen, since it was found in an earlier study that the infants preferred looking at the face when viewing these images38 and the lateralization measure might therefore be biased if images with faces in the middle of the screen are included. These images were shown for 20 s each, in a fixed order. Lateralization in this condition was measured as the amount of looking time at the left side of the screen, relative to the whole screen (first averaged for valid trials where the face was either to the left or to the right, to create an unbiased average lateralization score). A value over 0.5 therefore indicates a left gaze bias, while a value below 0.5 indicates a right gaze bias. A trial was classified as valid if the infant looked at the screen for at least 20% of the total duration of the video (i.e., 4 s, as the total duration was 20 s). Infants were included in further analyses only if they had at least two valid trials, of which one where the face was to the left and one where the face was to the right. Due to this criterion, 14 participants were excluded from this condition.
In addition, 23 participants did not partake in this experiment due to lack of time or technical issues, and were therefore not included in this condition. The final sample in this condition consisted of 559 infants.
Parent-rated questionnaires
The CSBS DP Infant Toddler Checklist, ITC39, is a 24-item parent-rated questionnaire, used to identify children with any type of communication delay. Lower scores indicate a higher degree of socio-communicative delays. Items include, for example, questions on whether the parent knows when the child is happy or sad, and whether the child lets the parent know when they need help reaching an object. It was administered at 14 months (age range was 387–525 days), and we used the total score as a measure of socio-communicative behaviors. Data from one individual were excluded because the child was too old at assessment (806 days).
The MacArthur Communicative Development Inventory, CDI40, is a parent-rated questionnaire that assesses early language development. It was administered at 14 months (the Words and Gestures form, age range was 386–516 days) and 36 months (the Words and Sentences form, age range was 1086–1401 days). As a measure of receptive vocabulary at 14 months, we used the total number of words (out of 370 words) that the infant could understand but not produce. At 36 months, we used the vocabulary checklist score as a measure of expressive vocabulary.
The Quantitative Checklist for Autism in Toddlers, Q-CHAT41, is a normally distributed quantitative measure of autistic traits, which consists of 25 parent-rated items scored on a 5-point scale (0–4) and was administered at 36 months (age range was 1074–1401 days). The scores from all items are summed to obtain a total score, where higher scores indicate more autistic traits. Data from two individuals were excluded because the children were too young at assessment (735 days and 783 days).
Experimenter-rated developmental assessment
The Mullen Scales of Early Learning, MSEL42, was administered by an experimenter at 5 months. This is a standardized assessment commonly used in many areas of psychology as a measure of general cognitive ability. Here, the Early Learning Composite Score was used (a standardized score derived from fine motor, visual reception, receptive language, and expressive language subscales).
See Table 2 for descriptive statistics on parent-rated questionnaires and the experimenter-rated developmental assessment.
Statistical analyses
Left versus right gaze bias was analyzed using one-sample t-tests comparing the mean lateralization against a value of 0.5 (i.e., chance level). Associations among eye tracking measures were analyzed using two-tailed Pearson correlations.
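For readers who want to see the test against chance level spelled out, a minimal sketch of the one-sample t statistic is given below. The function name and the example values are hypothetical. The sketch uses only the Python standard library and therefore returns the t statistic and degrees of freedom without a p value, which would in practice be obtained from the t distribution in a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.5):
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(values) - mu0) / standard_error
    return t, n - 1


# Hypothetical lateralization scores from four infants
t, df = one_sample_t([0.4, 0.5, 0.6, 0.7])
print(f"t({df}) = {t:.3f}")
```

A positive t here corresponds to a mean above 0.5, that is, a left gaze bias on the scale defined above.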
Univariate twin models were used to estimate the genetic and environmental contribution to the mean gaze lateralization in each condition. The sources of variation in a trait can be divided into genetic influences (A; heritability), shared environment (C; e.g., family environment), and unique environment (E; i.e., environmental influences that make twins different from each other, including measurement error). Since monozygotic (MZ) twins share 100% of their segregating DNA, while dizygotic (DZ) twins on average share 50% of their segregating DNA, a higher within-pair similarity among MZ twins than DZ twins suggests genetic contribution to a trait. Zygosity was determined for all twin pairs by DNA analysis. Sex and age were included as covariates in all twin models. The best fitting model was selected based on the Akaike Information Criterion (AIC).
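The logic of comparing MZ and DZ similarity can be illustrated with Falconer's formulas, which give rough estimates of the A, C and E proportions directly from the twin correlations. This is a swapped-in approximation for illustration only: it is not the model-fitting approach used in the study, and it ignores covariates and confidence intervals. The function names and example correlations are hypothetical. The AIC helper uses the standard definition (2k minus twice the log-likelihood).

```python
def falconer_ace(r_mz, r_dz):
    """Rough variance proportions from MZ and DZ twin correlations.

    Assumes the classical twin design: r_MZ = a2 + c2 and
    r_DZ = 0.5 * a2 + c2, with a2 + c2 + e2 = 1.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic influences (A)
    c2 = 2 * r_dz - r_mz    # shared environment (C)
    e2 = 1 - r_mz           # unique environment incl. measurement error (E)
    return a2, c2, e2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical correlations, not results from the study
a2, c2, e2 = falconer_ace(r_mz=0.6, r_dz=0.4)
print(round(a2, 2), round(c2, 2), round(e2, 2))
```

When comparing nested models such as ACE, AE, CE and E, the model with the lowest AIC would be preferred, since AIC penalizes each additional parameter.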
All phenotypic associations were calculated using the robust sandwich estimator in generalized estimating equations (GEE) in order to account for the correlation between twins in a pair43, using the drgee package in R44. The variables used in these models were regressed on age and sex before analyses. Due to the explorative nature of the phenotypic analyses, we adjusted the p values using Bonferroni correction. The original significance threshold was p < 0.05 and the number of analyses was 15, meaning that the adjusted significance threshold was p < 0.003.
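The arithmetic behind the adjusted threshold is shown in the short sketch below. The function names are hypothetical, and the code covers only the Bonferroni step, not the GEE models.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Equivalent view: multiply each p value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


threshold = bonferroni_threshold(0.05, 15)
print(f"{threshold:.4f}")  # 0.05 / 15, reported in the text as 0.003
```

Comparing raw p values against the lowered threshold and comparing adjusted p values against 0.05 lead to the same decisions.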