How compound words are processed remains a central question in research on Chinese reading. The Chinese reading model assumes that all possible words sharing characters are activated during word processing and these activated words compete for a winner (Li & Pollatsek, 2020). The present studies aimed to examine whether embedded component words compete with whole compound words in Chinese reading. In Study 1, we analyzed two existing lexical decision databases and revealed inhibitory effects of component-word frequency and facilitative effects of character frequency on the first components. In Study 2, we conducted two factorial experiments to further examine the effects of first component-word frequency, with character frequencies controlled. The results consistently indicated significant inhibitory effects of component-word frequency. Collectively, these findings support the theoretical proposition that both component words and compound words are activated and engage in competition during word processing. This provides a new approach to compound word processing in Chinese reading and a possible solution to mixed results of character frequency effects reported in the literature. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Competition Between Parts and Whole: A New Approach to Chinese Compound Word Processing
<cn> <bold>By: Qiwei Zhang</bold>
<bold>Kuan-Jung Huang</bold>
<bold>Xingshan Li</bold>
<bold>Acknowledgement: </bold>This research was supported by a grant from the National Natural Science Foundation of China (NSFC; 32371156). We thank Adrian Staub for his comments on an earlier version of this manuscript. The analysis code for Study 1 can be retrieved from https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c, and the data of the Megastudy of Lexical Decision in Simplified Chinese (Tsang et al., 2018) and the Chinese Lexicon Project-Tse (Tse et al., 2017) are available by contacting the authors of the respective databases. All code, materials, and data for Study 2 can be retrieved from https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c. Qiwei Zhang served as lead for data curation, formal analysis, investigation, methodology, project administration, validation, visualization, and writing–original draft. Xingshan Li served as lead for funding acquisition, resources, and supervision, contributed equally to methodology, project administration, and validation, and served in a supporting role for software and writing–original draft. Qiwei Zhang, Kuan-Jung Huang, and Xingshan Li contributed equally to writing–review and editing.
A compound word is a morphologically complex word binding together two or more morphemes (such as snowball from snow and ball); most morphemes of compound words can be used as independent words in sentences. In recent decades, whether compound words are processed via full form or components has been extensively studied in alphabetic writing systems such as English, Finnish, Dutch, Spanish, and Basque (e.g., English: Andrews, 1986; Finnish: Pollatsek et al., 2000; Dutch: Kuperman et al., 2009; Spanish and Basque: Duñabeitia et al., 2007). Meanwhile, compound words account for more than 70% of Chinese vocabulary (Beijing Language Institute, 1986). Therefore, an important research question for Chinese reading is how compound words are processed. Previous studies have shown script-specific mechanisms of compound word processing (Li et al., 2022). However, as we will review below, how Chinese readers process compound words is not fully understood, and some recent findings are mixed (Cui et al., 2021; Tsang et al., 2018; Yu et al., 2021). This study investigates the mechanism of compound word processing in Chinese reading, aiming to address the long-standing debate regarding whether Chinese words are processed in a holistic or decompositional manner.
<h31 id="xhp-50-5-479-d282e139">Compound Word Processing in Alphabetic Writing Systems</h31>Before turning to compound word processing in Chinese, it is instructive to consider findings and theories in alphabetic writing systems, where lexical decision tasks (LDTs) and natural reading tasks are commonly used (Balota & Chumbley, 1984; Taft & Forster, 1976). In an LDT, participants decide as quickly as possible whether a string is a word or a nonword, with response times (RTs) and accuracy rates as the key measures (Meyer & Schvaneveldt, 1971). Natural reading tasks, on the other hand, use eye movements to measure word processing difficulty (Rayner, 1998; Rayner & Duffy, 1986).
Three primary theories (holistic processing, decompositional processing, and dual-route processing) have been proposed to explain how compound words are recognized in alphabetic writing systems. The holistic processing theories argue that compound words are stored and retrieved as single units, supported by whole-word frequency effects in which more frequent whole words are recognized faster (e.g., Giraudo & Grainger, 2000; Hyönä & Olson, 1995; Kuperman et al., 2008). In contrast, the decompositional processing theories argue that compound words are broken down into their components for processing (Taft & Forster, 1975, 1976; Zhang & Peng, 1992). Studies supporting this view have revealed component frequency effects, in which high-frequency components lead to shorter reading times (e.g., Bien et al., 2005; Hasenacker & Schroeder, 2019; Kuperman et al., 2009). The dual-route models posit that both processes operate in parallel, with the faster route taking precedence (e.g., Baayen & Schreuder, 2000; Caramazza et al., 1988; Schreuder & Baayen, 1995). Factors such as word length can affect the race: shorter words tend to be processed holistically, while longer words are often decomposed (Bertram & Hyönä, 2003; Hyönä & Pollatsek, 1998; Pollatsek et al., 2000). In summary, both whole-word and component frequencies affect compound word processing, and dual-route models offer the most comprehensive explanation for these findings (Caramazza et al., 1988; Pollatsek et al., 2000).
<h31 id="xhp-50-5-479-d282e213">Properties of the Chinese Writing System</h31>Chinese is a logographic writing system with many unique properties that distinguish it from alphabetic writing systems. One is that Chinese characters primarily convey semantic information, although they also carry phonological information. There are more than 5,000 characters in Chinese, each of which is a writing unit representing a single morpheme and syllable, except in a few multicharacter monomorphemic words such as “蝴蝶” (meaning butterfly), in which two characters together represent a morpheme. Furthermore, there are no spaces to demarcate words within a sentence.
A Chinese word can be composed of one or more characters. Compared to words in alphabetic writing systems, the mean length of Chinese words is shorter, and the variance is smaller. Based on the frequencies of the 56,008 listed words in one lexicon (Lexicon of Common Words in Contemporary Chinese Research Team, 2008), 6% of Chinese words are one character long, 72% are two characters, 12% are three characters, 10% are four characters, and less than 0.3% of the words are longer than four characters. The relationship between characters and words is complex. Most Chinese characters are one-character words; however, they can be combined with other characters to form compound words. For example, the character “人” is a word by itself (meaning people), but it can also constitute multicharacter words with other characters (such as “人群” [meaning a lot of people], “陌生人” [meaning stranger], “出人意料” [meaning unexpected]). There are two types of frequency associated with one character. One is character frequency, calculating every occurrence of the character, whether the character is an individual word or embedded in a longer word. The other is word frequency, referring to the occurrence of the character when it is used alone as an individual word. As a concrete example, in a corpus (Cai & Brysbaert, 2010), the character “人” appears 373,292 times, and the corpus contains 46.8 million characters; thus, the character frequency of “人” is 7,969 occurrences per million. On the other hand, the one-character word “人” appears 194,914 times (far less than the number of times the character “人” appears because “人” also appears as a part of other longer words). The corpus contains 33.5 million words, and thus, the word frequency of the one-character word “人” is 5,810 occurrences per million. In practice, there is a high correlation between the word frequency and the corresponding character frequency (
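The two frequency measures for “人” can be reproduced with a short calculation. The sketch below is only an illustration: it uses the counts quoted above together with the rounded corpus sizes (46.8 million characters, 33.5 million words), so the results land within roughly 0.2% of the reported per-million values rather than matching them exactly. The function and variable names are our own.

```python
def per_million(count: int, corpus_size: float) -> float:
    """Convert a raw occurrence count into occurrences per million tokens."""
    return count / corpus_size * 1_000_000


# Counts for the character "人" as quoted in the text (Cai & Brysbaert, 2010).
CHAR_COUNT = 373_292   # every occurrence of the character
WORD_COUNT = 194_914   # occurrences as a standalone one-character word

# Rounded corpus sizes from the text; the exact totals are not given here.
CHAR_CORPUS_SIZE = 46.8e6
WORD_CORPUS_SIZE = 33.5e6

char_freq = per_million(CHAR_COUNT, CHAR_CORPUS_SIZE)
word_freq = per_million(WORD_COUNT, WORD_CORPUS_SIZE)

# Share of all occurrences of the character in which it stands alone as a word.
standalone_share = WORD_COUNT / CHAR_COUNT

print(f"character frequency: {char_freq:.0f} per million")
print(f"word frequency:      {word_freq:.0f} per million")
print(f"standalone share:    {standalone_share:.2f}")
```

The two measures use different denominators (characters vs. words), which is why the word frequency is not simply a fixed fraction of the character frequency.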
The visual salience of morphemes and words in written Chinese is different from that in alphabetic scripts. In written English, for example, morpheme boundaries in a compound word can hardly be identified simply with visual cues, but a space unambiguously separates two words. In contrast, morphemes are visually salient in written Chinese. This is because in Chinese, one morpheme corresponds to one character most of the time, and each character is visually represented in a uniformly sized box. However, when reading sentences, no apparent cues exist between Chinese words varying in length, and thus, words cannot be segmented simply with visual cues.
These differences between the Chinese writing system and alphabetic writing systems possibly require different models of compound word processing. For example, Chinese compound words are horizontally shorter so that they are more likely to be processed via the full-form route, according to the dual-route model (Caramazza et al., 1988; Pollatsek et al., 2000). Alternatively, because Chinese morphemes are visually salient, decomposition of compound words into individual components could be more likely. The following section reviews some evidence for or against holistic/decompositional processing of Chinese compound words.
<h31 id="xhp-50-5-479-d282e237">Previous Findings of Chinese Compound Word Processing</h31><bold>Character Frequency Effects</bold>
As with studies of other languages introduced earlier, whether Chinese compound words are accessed in a holistic or decompositional manner has been investigated by examining the effects of whole-word frequency and character frequency (Cui et al., 2021; Li et al., 2014; Ma et al., 2015; Peng et al., 1999; Sun et al., 2018; Taft et al., 1994; Tsang et al., 2018; Tse & Yap, 2018; Xiong et al., 2023; Yan et al., 2006; Yu et al., 2021; Zhang & Peng, 1992; see Table 1). In LDTs, whole-word frequency effects have been consistently found, while mixed findings of character frequency effects have been reported (Peng et al., 1999; Taft et al., 1994; Xiong et al., 2023; Zhang & Peng, 1992). In Zhang and Peng (1992), facilitative whole-word and character frequency effects were found in separate experiments. RTs were shorter when the whole-word frequency of the target was higher. RTs were also shorter when the frequency of the embedded components of the target word was higher. When both frequency effects were examined within one experiment, interactions between character frequency and compound word frequency were found, although the interaction patterns differed from one study to another (Tse & Yap, 2018; Wang & Peng, 1999). Peng et al. (1999) used a factorial design and found facilitative character frequency effects only for frequent compound words. In contrast, Tse and Yap (2018) conducted a regression analysis which contained 18,983 two-character words, and they found a facilitative character frequency effect that was stronger for words with low whole-word frequency.
<anchor name="tbl1"></anchor>
Some other lexical decision studies revealed inhibitory character frequency effects, showing longer RTs for words comprising more frequent characters (Tsang et al., 2018; Sun et al., 2018; Xiong et al., 2023). In a mega lexical-decision study of more than 10,000 simplified Chinese words (Tsang et al., 2018), an inhibitory character frequency effect was found after accounting for the number of words the character can form. Notably, the variable in the above studies was the average character frequency within a multicharacter word instead of the separate character frequency for each component. Sun et al. (2018) conducted a reanalysis of two existing lexical decision databases (Chinese Lexicon Project [CLP], Tse et al., 2017; Megastudy of Lexical Decision in Simplified Chinese [MELD-SCH], Tsang et al., 2018), distinguishing first and second character frequency. Regression analyses initially revealed inhibitory character frequency effects of either component. However, a subsequent post hoc analysis that employed principal components as predictors—instead of using raw variables—revealed facilitative character frequency effects. Sun et al. posited that the initial inhibitory results were artifacts stemming from collinearity in the models, and they concluded that the character frequency effects were facilitative. Nevertheless, in a recent study strictly manipulating whole-word frequency and first character frequency of compound words, Xiong et al. (2023) observed inhibitory effects of first character frequency, but only for low-frequency words. They speculated that the reversed character frequency effects might stem from the influence of neighborhood size and/or frequency.
Eye-tracking studies, like lexical decision research, consistently show facilitative whole-word frequency effects during sentence reading, with high-frequency compound words being read faster than low-frequency compound words (Li et al., 2014; Ma et al., 2015; Sun et al., 2018; Tsang et al., 2018; Yan et al., 2006; Yu et al., 2021). However, findings of character frequency effects are mixed (see Table 1 for a summary). Although Yan et al. (2006) found that fixation durations on compound words were longer when their first character frequency was low, other studies revealed longer reading times for words containing high-frequency first characters (Cui et al., 2021; Xiong et al., 2023; Yu et al., 2021). Still others did not find significant character frequency effects (Li et al., 2014; Ma et al., 2015).
In summary, given the mixed findings of character frequency effects in previous studies, it is hard to conclude whether Chinese compound words are processed in a holistic or decompositional manner.
<bold>Whole-Word Effects</bold>
While it is unclear whether and how embedded characters affect compound word processing, robust and consistent evidence indicates that Chinese words are generally processed as whole units, despite the absence of visual cues for word boundaries (e.g., Li et al., 2014; Yu et al., 2021; Xiong et al., 2023). Reading times are longer when spaces or other interference are inserted between the characters within a word, but not when they are inserted between words (Bai et al., 2008; M. Chen et al., 2021; Li et al., 2012, 2013; Zang et al., 2013). These results suggest that Chinese readers do not process texts character by character. In addition, word superiority effects in Chinese show that characters in words are identified faster and more accurately than characters in nonwords (Reicher, 1969; Shen & Li, 2012). Thus, even without explicit boundaries between words, Chinese text appears to be processed holistically during sentence reading.
<h31 id="xhp-50-5-479-d282e420">Models on Chinese Compound Word Processing</h31>Several models based on the interactive activation principle (McClelland & Rumelhart, 1981) have aimed to explain compound word processing in Chinese reading. Some of these models predict facilitative effects of components because of excitatory connections between characters and multicharacter words (e.g., Taft & Zhu, 1997; Tan & Perfetti, 1999). Others assume that the effect of characters on compound word processing depends on the properties of the word (Peng et al., 1999; X. Zhou & Marslen-Wilson, 2000). For example, the inter–intra model suggests that if a compound word is semantically transparent, its parts speed up recognition of the whole word; if the compound word is semantically opaque, its parts slow down recognition of the whole word (Peng et al., 1999). Overall, these models predict that character frequency affects Chinese compound word processing.
The Chinese reading model (CRM) proposed by Li and Pollatsek (2020) was designed to explain how Chinese readers recognize words and control eye movements without relying on interword spaces. The model comprises two modules: one for word recognition and another for eye-movement control. In the word recognition module, characters within the perceptual span are activated in parallel at the character level, and then they activate possible words containing these characters. Because each character can only belong to one word, CRM assumes that there are inhibitory lateral links between spatially overlapping word units. By doing so, all activated spatially overlapping words compete for recognition, and the word with the highest activation wins. This mechanism allows the model to simultaneously segment and recognize words in continuous Chinese text.
CRM provides a unique perspective on the mechanism of Chinese compound word processing. Unlike the traditional dichotomous approach, CRM centers on the competition among all activated words, including both single-character and multicharacter words within the perceptual span. Compound words win most of the time because they receive activation from all constituent characters, and their activation increases faster than that of the embedded single-character words. Therefore, CRM predicts that compound words are ultimately identified as a whole, aligning with the evidence for holistic processing (e.g., Li et al., 2014; Ma et al., 2015; Yang et al., 2012). Specifically, the model simulated the word frequency effects reported in Wei et al. (2013), where the two-character strings were recognized as a whole word in 99% of the trials. Moreover, CRM predicts that the frequency of the component words (i.e., the embedded characters as individual words) impacts the competitive process. High-frequency component words may cause more competition, prolonging the time for compound words to settle the competition. Furthermore, lower frequency compound words should be more affected by the competition from the embedded component words, given that the baseline activation of these low-frequency compound words is lower to begin with. Therefore, a larger component-word frequency effect is expected when identifying low-frequency compound words than when identifying high-frequency compound words.
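The qualitative logic of this prediction can be illustrated with a toy simulation. The sketch below is not the CRM implementation of Li and Pollatsek (2020): it contains only two word units (one compound word and its first component word), and the update rule, the parameter values, and the function name `cycles_to_recognize` are all assumptions chosen for illustration. Resting activation is assumed to grow with log frequency, the compound word receives bottom-up input from two characters, the component word from one, and each unit inhibits the other.

```python
import math


def cycles_to_recognize(whole_freq, component_freq, char_input=0.1,
                        inhibition=0.3, rest_scale=0.05,
                        threshold=1.0, max_cycles=100):
    """Count update cycles until the compound word unit reaches threshold.

    Frequencies are occurrences per million and should be >= 1 so that the
    assumed resting activations (rest_scale * log10(freq)) are non-negative.
    Returns None if the compound word never reaches threshold.
    """
    whole = rest_scale * math.log10(whole_freq)
    component = rest_scale * math.log10(component_freq)
    for cycle in range(1, max_cycles + 1):
        # Synchronous update: bottom-up input minus lateral inhibition
        # from the spatially overlapping competitor.
        new_whole = whole + 2 * char_input - inhibition * component
        new_component = component + char_input - inhibition * whole
        # Activations are not allowed to drop below zero.
        whole = max(new_whole, 0.0)
        component = max(new_component, 0.0)
        if whole >= threshold:
            return cycle
    return None


# Same compound word, paired with a low- vs. high-frequency component word.
low_competitor = cycles_to_recognize(whole_freq=10, component_freq=10)
high_competitor = cycles_to_recognize(whole_freq=10, component_freq=1000)
print(low_competitor, high_competitor)
```

With these assumed parameters the compound word still wins in both cases, but it needs more cycles when the embedded component word is more frequent, which is the inhibitory component-word frequency effect described above. The sketch is too coarse to reproduce the predicted interaction with whole-word frequency.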
In summary, previous models assume that the components of a compound word affect Chinese compound word processing, although different models make different predictions. Models of decompositional processing predict a facilitative effect at the character level, while CRM assumes that compound words are recognized through competition and predicts an inhibitory effect of the component at the word level.
<h31 id="xhp-50-5-479-d282e464">The Present Study</h31>The present study aimed to investigate the mechanism of Chinese compound word processing. Specifically, we tested one prediction of CRM. According to CRM, the embedded components compete with the whole word at the word level, and this competition results in an inhibitory component-word frequency effect. Previous studies on Chinese compound word processing focused only on the influence of character frequency, ignoring the fact that components in compound words can be used independently as words and can compete with compound words during reading, inducing an inhibitory effect on compound word processing. This may explain the inconsistent findings obtained with different experimental materials, because component-word frequency was seldom controlled previously. Although CRM was initially designed to simulate word processing during sentence reading, it posits that words are the units of sentence reading and contains a word processing module. The LDT differs from sentence reading in that readers need to decide whether the characters make up a word. However, the initial word processing stage may be similar for lexical decision and natural reading, which is why researchers use the LDT to study how words are identified. Therefore, it is reasonable to assume that CRM can simulate the process of compound word processing.
In Study 1, we analyzed the corpus data from the MELD-SCH (Tsang et al., 2018) and CLP-Tse (Tse et al., 2017) of traditional characters<anchor name="b-fn1"></anchor><sups>1</sups> for the LDT to investigate how whole-word frequency, character frequency, and component-word frequency jointly affect word processing. According to CRM (Li & Pollatsek, 2020), in addition to whole-word frequency, components are assumed to play inhibitory roles at the word level; according to other frameworks (Tan & Perfetti, 1999; Taft & Zhu, 1997), components are assumed to play facilitative roles at the character level. These predictions were evaluated in Study 1. In Study 2, we conducted two factorial experiments to further examine component-word frequency effects on word identification with character frequencies controlled, which tests the most important prediction of the present study. According to the architecture of CRM, where the component words compete with the whole words at the word level, we expected to observe inhibitory effects of component-word frequencies. By controlling for character frequencies across conditions, Study 2 provides a more direct investigation of how component-word frequency affects word processing.
Study 1
<h31 id="xhp-50-5-479-d282e491">Method</h31>
<bold>Database of MELD-SCH</bold>
MELD-SCH (Tsang et al., 2018) reported average RTs in an LDT for 12,578 simplified Chinese words, including 10,022 two-character words. Items were divided into 12 lists, and 42 participants were assigned to each list (504 participants in total). The mean error rate was 5.19%, and only correctly responded trials were included when calculating the RTs.
We analyzed RTs of the LDT on compound words to investigate how they were affected by the following seven linguistic properties: whole-word frequency of the compound word, number of strokes, character frequency, and component-word frequency of the first and second components. While whole-word frequency and number of strokes have been shown to robustly influence lexical decision latencies, the effects of character frequency have been mixed, and the effects of component-word frequency have not been examined. Frequency data were obtained from the SUBTLEX-CH frequency corpus based on simplified Chinese subtitles (Cai & Brysbaert, 2010).
Because the present study focused on distinguishing the effects of character frequency and component-word frequency, we only included those two-character compound words in which the individual components are also words by themselves (9,565 words). Moreover, we excluded items with a mean error rate above 0.33 (283 words). Following the guidelines of Baayen and Milin (2010), items with scaled absolute residual values over three were omitted (totaling 52 words), ensuring the residuals approximated a normal distribution (see Appendix A). The pruning of the statistical model did not change the pattern of statistical effects. Ultimately, 9,230 two-character words were included in the analyses. Finally, as the distributions of frequencies and RTs were highly positively skewed, we applied log transformation with a base of 10 to these values in the subsequent analysis. However, for ease of interpretation, Table 2 presents descriptive statistics of raw frequency values.
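The item-selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the field names, the three example items, and the example residuals are invented, and the residuals are assumed to come from an already fitted regression model, scaled here by their sample standard deviation.

```python
import math
from statistics import mean, stdev


def select_items(items, max_error=0.33):
    """Keep items whose components are both words and whose mean error rate
    does not exceed max_error, and add a log10-transformed RT."""
    kept = []
    for item in items:
        if not item["components_are_words"]:
            continue
        if item["error_rate"] > max_error:
            continue
        kept.append({**item, "log_rt": math.log10(item["mean_rt_ms"])})
    return kept


def within_cutoff(residuals, cutoff=3.0):
    """Return indices of residuals whose scaled absolute value is <= cutoff."""
    center = mean(residuals)
    spread = stdev(residuals)
    return [i for i, r in enumerate(residuals)
            if abs(r - center) / spread <= cutoff]


# Invented example items (hypothetical labels and values).
items = [
    {"word": "A", "mean_rt_ms": 1000.0, "error_rate": 0.05, "components_are_words": True},
    {"word": "B", "mean_rt_ms": 720.0, "error_rate": 0.40, "components_are_words": True},
    {"word": "C", "mean_rt_ms": 650.0, "error_rate": 0.10, "components_are_words": False},
]
kept = select_items(items)

# Invented residuals: twenty small values and one extreme value.
residuals = [0.01, -0.01] * 10 + [0.5]
keep_idx = within_cutoff(residuals)
print([item["word"] for item in kept], len(keep_idx))
```

In practice the residual-based step is applied after a first model fit, and the model is then refitted on the remaining items.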
<anchor name="tbl2"></anchor>
<bold>Database of CLP-Tse</bold>
CLP-Tse (Tse et al., 2017) reported average RTs in an LDT for 25,286 traditional Chinese two-character compound words. Items were divided into 18 lists, and 33 participants were assigned to each list (594 participants in total). The mean error rate for words was 11.67%, and only correctly responded trials were included when calculating the RTs.
Although the words in CLP-Tse were written in traditional Chinese, which is visually more complex than simplified Chinese, it has been verified that simplified-character-based frequency measures explain slightly more variance in lexical decision RT than traditional character-based frequency measures (Tse et al., 2017). As a result, when analyzing CLP-Tse, the number of strokes was counted based on the form of traditional Chinese, and all other frequency measures were obtained from the SUBTLEX-CH frequency corpus (Cai & Brysbaert, 2010).
The analysis of CLP-Tse is trial-based, and there are 1,668,876 trials in the raw data set containing 25,286 different two-character words. First, we preprocessed the data based on items. Similar to the preprocessing of MELD-SCH, we only included those two-character compound words in which the individual components are also words by themselves (18,533 different words). Moreover, we excluded words with a mean error rate above 0.33 (816 words). Then, trials with RTs longer than 2,500 ms or shorter than 200 ms were excluded (7,189 trials). Since the distribution of RTs was positively skewed, log transformation was applied to reduce skewing. Next, as recommended by Baayen and Milin (2010), we removed 725 words whose scaled absolute residual values were over three to make the residuals approximately normally distributed (see Appendix A). The pruning of the statistical model did not change the pattern of statistical effects. Ultimately, 586,742 trials that contained 17,717 two-character words were included in the analyses. Finally, as the distributions of frequencies and RTs were highly positively skewed, we applied log transformation with base 10 to these values in the subsequent analysis. However, for ease of interpretation, Table 2 presents descriptive statistics of raw frequency values.
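For the trial-level data, the RT cutoffs and the log transformation amount to a simple filter. The sketch below assumes a hypothetical list of trial records with `rt_ms` and `correct` fields; these names and the example trials are ours and do not reflect the actual column names in CLP-Tse.

```python
import math


def trim_trials(trials, lower_ms=200, upper_ms=2500):
    """Keep correct trials with RTs inside [lower_ms, upper_ms] and add log10 RT."""
    kept = []
    for trial in trials:
        if not trial["correct"]:
            continue
        if trial["rt_ms"] < lower_ms or trial["rt_ms"] > upper_ms:
            continue
        kept.append({**trial, "log_rt": math.log10(trial["rt_ms"])})
    return kept


# Invented example trials.
trials = [
    {"subject": 1, "rt_ms": 150, "correct": True},    # too fast
    {"subject": 1, "rt_ms": 640, "correct": True},
    {"subject": 2, "rt_ms": 2501, "correct": True},   # too slow
    {"subject": 2, "rt_ms": 700, "correct": False},   # error trial
    {"subject": 3, "rt_ms": 2500, "correct": True},   # exactly at the upper bound
]
trimmed = trim_trials(trials)
print([t["rt_ms"] for t in trimmed])
```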
<bold>Analyses</bold>
The available data of MELD-SCH were item-based and did not include every response of each participant, so we fit linear regression models to the item-based average RTs in MELD-SCH. Meanwhile, we fit linear mixed-effects models (LMMs) to the trial-based RTs in CLP-Tse using the lme4 package for R 3.6.3 (Bates et al., 2015; R Development Core Team, 2020), with subject and word as random factors. Although the model was initially specified with a maximal random-effects structure, convergence issues necessitated the removal of all random slopes. Consequently, the final model retained only random intercepts. The whole-word frequency of the compound word, number of strokes, character frequencies, and component-word frequencies of each component were included as predictors in multiple linear regression models fitted for the MELD-SCH data, and they were included as fixed factors in LMMs fitted for the CLP-Tse data in initial analyses. Models were constructed in which all predictors (whole-word frequency, numbers of strokes, character frequencies, and component-word frequencies) were entered simultaneously. The intercorrelations and variance inflation factors (VIFs) are shown in Appendix A. VIF is a measure of the severity of multicollinearity in multiple linear regression models. Generally, a VIF greater than 10 indicates high multicollinearity (Kutner et al., 2004), and a cutoff of five is also commonly used (Sheather, 2009). In the current study, all VIFs were smaller than 5 in the model fitted for MELD-SCH, and all VIFs were smaller than 4 in the model fitted for CLP-Tse. Q–Q plots for the dependent variables and the residuals, and residual plots of predicted values against residuals, indicated that the assumptions of normal distribution and homoskedasticity were approximately satisfied (see Appendix A).
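To make the VIF criterion concrete: for predictor j, VIF equals 1 / (1 − R²), where R² comes from regressing predictor j on all other predictors. With only two predictors, R² is the squared Pearson correlation between them. The sketch below covers just this two-predictor case with invented values; it is a simplification for illustration, not the seven-predictor computation reported in Appendix A.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x, y):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Invented values standing in for log character frequency and
# log component-word frequency of five items.
log_char_freq = [1.0, 2.0, 3.0, 4.0, 5.0]
log_component_word_freq = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(log_char_freq, log_component_word_freq)
vif = vif_two_predictors(log_char_freq, log_component_word_freq)
print(f"r = {r:.2f}, VIF = {vif:.2f}")
```

The formula shows why a VIF cutoff of five corresponds to a predictor sharing 80% of its variance with the remaining predictors.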
Furthermore, the interaction terms between whole-word frequency and character frequency as well as whole-word frequency and component-word frequency were included in the second step. We included the interaction terms because some previous studies have shown interactive effects between whole-word frequency and character frequency (Cui et al., 2021; Tse & Yap, 2018; Peng et al., 1999; Wang & Peng, 1999; Yan et al., 2006). All independent variables were mean-centered and standardized (Ford et al., 2010). When the interaction term was significant, a simple slope analysis was conducted using GAMLj for jamovi 1.8 (Gallucci, 2019).
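Standardizing the predictors and probing an interaction can be sketched as follows. The coefficient values are hypothetical placeholders, not estimates from the present models, and the helper names are our own; the simple slope is evaluated at one standard deviation below and above the mean of the moderator, which is a common convention that we assume here.

```python
from statistics import mean, stdev


def standardize(values):
    """Mean-center and scale to unit sample standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


def simple_slope(b_predictor, b_interaction, moderator_z):
    """Slope of the predictor at a given standardized value of the moderator."""
    return b_predictor + b_interaction * moderator_z


# Invented log frequencies for four items.
z_whole = standardize([0.5, 1.0, 1.5, 2.0])
z_component = standardize([2.0, 3.5, 2.5, 4.0])

# The interaction predictor is the product of the standardized variables.
interaction = [w * c for w, c in zip(z_whole, z_component)]

# Hypothetical coefficients for component-word frequency and its
# interaction with whole-word frequency.
B_COMPONENT = 0.010
B_INTERACTION = 0.004

slope_low = simple_slope(B_COMPONENT, B_INTERACTION, -1.0)
slope_high = simple_slope(B_COMPONENT, B_INTERACTION, 1.0)
print(f"slope at -1 SD: {slope_low:.3f}, slope at +1 SD: {slope_high:.3f}")
```

A positive interaction coefficient means the component-word frequency slope is steeper for words with higher whole-word frequency.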
<bold>Transparency and Openness</bold>
The code of analysis can be retrieved from
The model accounted for 39.73% of the variance in the data of MELD-SCH.<anchor name="b-fn2"></anchor><sups>2</sups> As shown in Table 3, some classic effects of linguistic properties were found in the models. For both MELD-SCH and CLP-Tse, the regression coefficient of whole-word frequency was negative, indicating that RTs for high-frequency words were shorter than those for low-frequency words (for MELD-SCH, β = −.075,
<anchor name="tbl3"></anchor>
Most interestingly, the component-word frequency and character frequency of the component showed opposite effects. Specifically, in both models, the regression coefficients of first component-word frequency were positive, indicating that compound words containing a high-frequency first component word were identified more slowly than those with a low-frequency first component word (for MELD-SCH, β = .012,
To further investigate whether whole-word frequency would moderate component frequency effects, including character frequency and component-word frequency, we constructed new models with interactions. In the model fitted for the data in MELD-SCH, both first and second component-word frequencies interacted with the whole-word frequency significantly (first component: β = .004,
<anchor name="fig1"></anchor>
In the model fitted for CLP-Tse, first component-word frequency had an interaction with whole-word frequency (β = .003,
To investigate whether the frequency of components influences whole compound word processing, we analyzed two data sets of Chinese lexical decisions in Study 1. Several findings emerged in both data sets. First, an inhibitory component-word frequency effect was observed, with RTs in the LDT increasing with component-word frequency regardless of the whole-word frequency. Second, we observed a facilitative character frequency effect, with RTs in the LDT decreasing as first character frequency increased. The effect was significant only for the first character in MELD-SCH, but it was significant for both the first and second components in CLP-Tse. Third, a whole-word frequency effect was observed, with RTs in the LDT decreasing as whole-word frequency increased. Interestingly, whole-word frequency had a larger effect on lexical decision RTs than any character property, as reflected by the regression coefficients. Finally, the interactions of component-word frequency and whole-word frequency had a consistent pattern in the analyses of the two data sets, showing increased competition at the word level when processing high-frequency compound words.
In summary, when the statistical model considered both character frequency and component-word frequency simultaneously, they had effects in different directions. Moreover, the frequency effects of the first component were more stable than those of the second component, which might result from the left-to-right reading direction. Meanwhile, the effects found in CLP-Tse were more stable than those in MELD-SCH. This is possibly because there are more words in CLP-Tse, and this data set provides trial-based information, which makes the consideration of variance between subjects possible.
Study 2
Study 1 found that character frequency and component-word frequency affect the RTs of lexical decisions differently when considering the two variables simultaneously. As predicted by CRM, components would inhibit compound word processing. One problem with examining the two effects in uncontrolled corpus data sets, however, is that character frequency and component-word frequency are highly correlated (for first component,
<bold>Method</bold>
Participants
Seventy-eight native Chinese-speaking participants (57 females) from Mainland China with normal or corrected-to-normal vision were recruited online to participate in the experiment. Their ages ranged from 18 to 29 years. Given the number of words in each condition, there were 1,716 observations per condition, which is comparable to the recommendation of Brysbaert and Stevens (2018). The study was approved by the ethics committee of the Institute of Psychology, Chinese Academy of Sciences, and the participants received a small monetary compensation for their participation.
Stimuli
Whole-word frequency (medium vs. low) and first component-word frequency (high vs. low) were orthogonally manipulated to form four conditions. The whole-word frequency of the compound word was divided into medium (
<anchor name="tbl4"></anchor>
There were 88 two-character nonwords, which were created by randomly reassigning the second characters of all real words in the experiment. This ensured that character-level properties were matched between words and nonwords. All nonwords were manually checked to ensure that they did not correspond to an existing word either orthographically or phonologically.
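The recombination procedure described above can be sketched as follows. This is a minimal illustration and not the authors' stimulus-generation script: the function name `make_nonwords`, the placeholder items (Latin letter pairs standing in for two-character words), and the toy lexicon are all hypothetical. The automatic check covers only orthographic matches against the toy lexicon, whereas the original nonwords were also screened manually for phonological matches.

```python
import random


def make_nonwords(words, lexicon, seed=0, max_tries=1000):
    """Pair each first character with a randomly reassigned second character.

    Reshuffles until no candidate string is found in `lexicon`.
    All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    firsts = [w[0] for w in words]
    seconds = [w[1] for w in words]
    for _ in range(max_tries):
        shuffled = seconds[:]
        rng.shuffle(shuffled)
        candidates = [f + s for f, s in zip(firsts, shuffled)]
        # Reject the whole shuffle if any candidate is a known word.
        if not any(c in lexicon for c in candidates):
            return candidates
    raise RuntimeError("no valid recombination found")


# Placeholder "words": each letter stands in for one character.
words = ["AB", "CD", "EF", "GH"]
# Toy lexicon: the real words plus one extra string treated as a word.
lexicon = set(words) | {"AD"}
nonwords = make_nonwords(words, lexicon, seed=1)
```

Because every second character is reused exactly once, the set of characters is identical across words and nonwords, which is the matching property the design relies on.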
Apparatus
This study was conducted online on Pavlovia, and PsychoPy (Peirce et al., 2019) was used to program and run the experiment and to record RTs and accuracy rates. All participants were asked to complete the experiment in a quiet room using their own computers, with the resolution set to 1,920 × 1,080 pixels and a refresh rate of 60 Hz. Stimuli were presented one at a time in black Song font (size 26) on a gray background at the center of the screen.
Procedure
Before the formal experiment, eight words and eight nonwords were presented to help participants familiarize themselves with the task. Each trial started with a 500-ms fixation cross in the center of the screen, followed by a stimulus that was displayed until the participant responded or until 2,500 ms had elapsed. Participants decided whether the two-character string presented on the screen was a word by pressing a key as quickly and as accurately as possible; they pressed “J” for “yes” and “F” for “no.” A 300-ms blank screen followed a correct response and 300-ms feedback followed an incorrect response; after another 200-ms blank screen, a new trial started.
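The trial sequence just described can be summarized as a simple timeline. The sketch below is illustrative only: it is not the PsychoPy script used in the experiment, and the function and constant names are hypothetical. It simply lists the events of one trial with the durations given in the text.

```python
FIXATION_MS = 500            # fixation cross
RESPONSE_DEADLINE_MS = 2500  # stimulus removed after this long without a response
POST_RESPONSE_MS = 300       # blank (correct) or feedback (incorrect)
INTERTRIAL_BLANK_MS = 200    # blank screen before the next trial


def trial_timeline(rt_ms, correct):
    """Return the ordered (event, duration in ms) pairs for one trial."""
    # The stimulus stays on screen until the response or the deadline.
    stimulus_ms = min(rt_ms, RESPONSE_DEADLINE_MS)
    post_event = "blank" if correct else "feedback"
    return [
        ("fixation", FIXATION_MS),
        ("stimulus", stimulus_ms),
        (post_event, POST_RESPONSE_MS),
        ("blank", INTERTRIAL_BLANK_MS),
    ]


timeline = trial_timeline(rt_ms=650, correct=True)
total_ms = sum(duration for _, duration in timeline)
```

Apart from the response time itself, each trial therefore adds a fixed 1,000 ms of fixation and blank or feedback screens.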
Transparency and Openness
The materials, raw data, and the code of analysis in R are publicly available at the Open Science Framework website (
<bold>Results</bold>
Only responses to words in the experimental trials were analyzed, including accuracy rates and RTs. Generalized linear mixed-effects models were fitted using the lme4 package (Bates et al., 2015) in R 4.2 to analyze accuracy rates, and LMMs were used to analyze RTs. Because the RTs were positively skewed, the data were log-transformed to meet the distributional assumption of LMMs.<anchor name="b-fn3"></anchor><sups>3</sups> In all models, whole-word frequency (medium was coded as −0.5 and low was coded as 0.5) and component-word frequency (high was coded as −0.5 and low was coded as 0.5) were entered as contrast-coded fixed factors, and participants and items were specified as crossed random factors. All models were initially constructed with a maximal random-effects structure. If the maximal model did not converge, a simpler model was tested, with the random component generating the smallest variance removed (Barr et al., 2013). We report regression coefficients (
Accuracy Rates
The mean accuracy of the lexical decisions for all words was 94.7%, and accuracy was above 80% for all participants. Because the mean accuracy for two words was below 67%, these words were excluded from the following analyses.<anchor name="b-fn4"></anchor><sups>4</sups> Both were low-frequency compound words; one belonged to the high first component-word frequency condition and the other to the low first component-word frequency condition. The mean accuracy for the remaining words was 95.4%. The descriptive statistics and fixed-effect estimates from the GLMM are shown in Tables 5 and 6. The final model included random intercepts and slopes (i.e., whole-word frequency and the interaction) for subjects and random intercepts for items. The main effect of whole-word frequency was significant, with higher accuracy for compound words of higher whole-word frequency than for those of lower whole-word frequency (
<anchor name="tbl5"></anchor>
<anchor name="tbl6"></anchor>
RTs
Trials with incorrect responses were first excluded (4.6%), and RTs longer than 2,000 ms or shorter than 200 ms were excluded (0.1%). Finally, RTs beyond 3
Nonwords in Experiment 1 were created by randomly reassigning the second characters of the target words, which means that the same character occurred once in a word context and once in a nonword context. This had the unintended consequence of priming the second occurrence of the same character, with unpredictable consequences for lexical decision latency.<anchor name="b-fn6"></anchor><sups>6</sups> Experiment 2 was designed to exclude this possibility. In Experiment 2, the characters in nonwords were not used in any target word. Given that our focus was on the effects of component-word frequency on word identification, we chose to manipulate only first component-word frequency, in a broader sample of compound words.
<bold>Method</bold>
Participants
In Experiment 2, 35 native Chinese-speaking participants (21 females) from Mainland China with normal or corrected-to-normal vision were recruited online to participate in the experiment. Their ages ranged from 19 to 26 years. The number of observations per condition in this experiment was 1,750, closely matching the 1,716 observations per condition in Experiment 1. As stated previously, these numbers are comparable to the recommendations made by Brysbaert and Stevens (2018).
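The reported observation counts follow from multiplying the number of participants by the number of items per condition. The short check below is a sketch under stated assumptions: the per-condition item counts (22 in Experiment 1, 50 in Experiment 2) are back-calculated from the reported totals and participant numbers rather than quoted from the text, and the benchmark of 1,600 observations per condition is the figure commonly attributed to Brysbaert and Stevens (2018), treated here as an assumption.

```python
def observations_per_condition(n_participants, n_items_per_condition):
    """Number of word observations per condition in a repeated-measures design."""
    return n_participants * n_items_per_condition


# Assumed benchmark (commonly attributed to Brysbaert & Stevens, 2018).
RECOMMENDED_MINIMUM = 1600

# Item counts per condition are inferred, not stated in the text:
# 1,716 / 78 = 22 and 1,750 / 35 = 50.
exp1 = observations_per_condition(n_participants=78, n_items_per_condition=22)
exp2 = observations_per_condition(n_participants=35, n_items_per_condition=50)

meets_benchmark = exp1 >= RECOMMENDED_MINIMUM and exp2 >= RECOMMENDED_MINIMUM
```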
Stimuli
First component-word frequency was manipulated and was divided into high (
<anchor name="tbl7"></anchor>
Apparatus
The same apparatus was used as in Experiment 1.
Procedure
The same procedure was used as in Experiment 1.
Transparency and Openness
The materials, raw data, and the code of analysis in R are publicly available at the Open Science Framework website (
<bold>Results</bold>
The same analysis procedures were used as in Experiment 1. In all LMMs, component-word frequency was entered as a contrast-coded fixed factor (high was coded as −0.5 and low was coded as 0.5).
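To illustrate what this coding implies, the sketch below applies the −0.5/0.5 contrast to a handful of made-up RTs and fits an ordinary least-squares line to the log-transformed values. It is a simplification, not the reported analysis: the RTs are hypothetical, there are no random effects for subjects or items, and the helper `ols_slope_intercept` is written here for illustration. With this coding, the slope equals the difference between the low and high condition means on the log scale.

```python
import math

# Contrast codes as described in the text.
CODES = {"high": -0.5, "low": 0.5}


def ols_slope_intercept(xs, ys):
    """Simple least-squares fit of y on a single predictor x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical trials: (component-word frequency condition, RT in ms).
trials = [("high", 640), ("high", 700), ("high", 610),
          ("low", 590), ("low", 620), ("low", 575)]

xs = [CODES[cond] for cond, _ in trials]
ys = [math.log(rt) for _, rt in trials]  # log transform for skewed RTs
slope, intercept = ols_slope_intercept(xs, ys)

mean_high = sum(y for x, y in zip(xs, ys) if x < 0) / 3
mean_low = sum(y for x, y in zip(xs, ys) if x > 0) / 3
```

In these made-up data the slope is negative, meaning that log RTs are shorter in the low condition than in the high condition, which is the direction of an inhibitory component-word frequency effect. Because the toy design is balanced, the intercept is the mean of the two condition means.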
Accuracy Rates
The mean accuracy was 93.7%. The descriptive statistics and results of the GLMM are shown in Table 8. The final model included random intercepts for subjects and items. The component-word frequency effect was significant (
<anchor name="tbl8"></anchor>
RTs
Approximately 8.0% of the trials were excluded using the same criteria as in Experiment 1. The final model included random intercepts and slopes (i.e., component-word frequency) for subjects and random intercepts for items. The results in Table 8 showed a significant first component-word frequency effect (
The results of the two factorial experiments generally replicated the major finding of Study 1. It took longer to identify compound words containing components of high word frequency than those containing components of low word frequency. The interaction between whole-word frequency and component-word frequency was not significant. These findings support the argument that the component words of a compound word compete with the whole word during word processing.
General Discussion
The present study examined how Chinese compound words are processed by analyzing two large-scale databases and conducting two lexical decision experiments. In contrast to previous studies, we distinguished component-word frequency from character frequency when investigating how component properties affect compound word processing.
In the present studies, we found two main effects. The first is the classical whole-word frequency effect, with shorter lexical decision latencies for high-frequency compound words. The second is the component-word frequency effect, with longer lexical decision latencies for compound words containing high-frequency component words. The two frequency effects confirm a prediction of CRM. When the model processes a compound word, both the whole word and the component words are activated and compete for a winner. The whole compound word wins most of the time because it receives more support from the visual and character levels than any component word, so it is identified as a word. CRM assumes that a high-frequency compound word takes less time to win than a low-frequency word, which appears as the whole-word frequency effect in the experiments. Meanwhile, the activation of embedded component words may interfere with the competition. CRM predicts that high-frequency component words are activated more strongly than low-frequency component words and therefore compete more strongly with the whole compound word. This stronger competition slows down word identification and results in longer processing times. The finding of an inhibitory component-word frequency effect is consistent with this prediction.
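The qualitative logic of this competition account can be illustrated with a deliberately simplified two-node toy. It is not CRM and not the authors' implementation: the update rule, the function name `steps_to_identify`, and every parameter value are hypothetical and chosen only to show how mutual inhibition can produce both frequency effects. The whole-word node receives more bottom-up input than the component-word node (two supporting characters rather than one), each node gets a frequency-dependent boost, and each node inhibits the other in proportion to its activation.

```python
def steps_to_identify(word_freq_boost, component_freq_boost,
                      word_input=0.10, component_input=0.05,
                      inhibition=0.2, threshold=1.0, max_steps=1000):
    """Count update steps until the whole-word node reaches threshold.

    Toy model with hypothetical parameters; returns None if the
    threshold is not reached within max_steps.
    """
    word_act = 0.0
    comp_act = 0.0
    for step in range(1, max_steps + 1):
        # Both nodes are updated simultaneously from the previous state.
        new_word = max(0.0, word_act + word_input + word_freq_boost
                       - inhibition * comp_act)
        new_comp = max(0.0, comp_act + component_input + component_freq_boost
                       - inhibition * word_act)
        word_act, comp_act = new_word, new_comp
        if word_act >= threshold:
            return step
    return None


# Same whole word, low- vs. high-frequency component word.
low_component = steps_to_identify(word_freq_boost=0.02, component_freq_boost=0.01)
high_component = steps_to_identify(word_freq_boost=0.02, component_freq_boost=0.05)

# Same component word, higher-frequency whole word.
high_word = steps_to_identify(word_freq_boost=0.06, component_freq_boost=0.01)
```

In this toy, a stronger component-word boost keeps the competitor active for longer and delays the point at which the whole-word node reaches threshold, while a stronger whole-word boost shortens it. These directions match the inhibitory component-word frequency effect and the facilitative whole-word frequency effect described above.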
Furthermore, the effect size of whole-word frequency is larger than that of component-word frequency in both studies. Although their frequency ranges were different, these variables were standardized in the analysis of Study 1 and measured under the same conditions in Experiment 1 of Study 2. The finding of a larger whole-word frequency effect aligns with CRM, which predicts that the whole word usually wins the competition because the whole-word node is supported by bottom-up activation from more character nodes than its component words (i.e., one-character words). Therefore, the component words are inhibited by the whole compound word soon after being activated at the beginning of processing, while the whole compound word is long-lived. This possibly makes the frequency effects of components either nonsignificant (as in previous studies; see Li et al., 2014; Ma et al., 2015; Rayner et al., 2007) or trivial compared to the whole-word frequency effects (as in the present study) and makes processing holistic-like in practice (Bai et al., 2008; Shen & Li, 2012; Shen et al., 2018; Yang et al., 2012; Zang et al., 2013; J. Zhou & Li, 2021).
It is necessary to clarify that the competition-based view differs from the dual-route model, in which lexical access of component words and whole words takes place via different routes (Caramazza et al., 1988; Pollatsek et al., 2000). In the dual-route model, words are accessed through whichever route, holistic or decompositional, is faster, and component effects are considered evidence for decomposition followed by composition. In contrast, our current view posits that lexical processing of component words and whole words occurs simultaneously at the same level, predicting an inhibitory effect of component-word frequency because of competition. In short, we do not view compound word identification dichotomously but as an interactive activation-based competition among all possible words.
The findings of the present study might provide one solution to the discrepant findings in the literature regarding how character frequency affects word identification in Chinese reading. Some previous studies found a facilitative effect of character frequency on compound word processing (e.g., Peng et al., 1999; Wang & Peng, 1999; Yan et al., 2006), others found inhibitory effects (e.g., Tsang et al., 2018; Xiong et al., 2023; Yu et al., 2021), and still others found null effects (e.g., Cui et al., 2017; Li et al., 2014; Ma et al., 2015). As we argued in the Introduction section, components of compound words may produce two opposite effects on Chinese compound word processing: a facilitative effect at the character level (Taft & Zhu, 1997) and an inhibitory effect at the word level (Li & Pollatsek, 2020). Consistent with these predictions, inhibitory component-word frequency effects of the first component were observed in both studies, while facilitative character frequency effects were observed in Study 1. The balance of these two effects can explain the mixed findings from previous studies, which included only character frequencies as variables without considering component-word frequencies (e.g., H. C. Chen et al., 2003; Tsang et al., 2018; Tse & Yap, 2018). Based on the results of the new analyses of the corpus data in Study 1 and the two experiments in Study 2, we argue that the key to resolving this puzzling picture in the literature is to consider the effects of component words when theorizing about Chinese compound word processing. Possibly, if target words differ greatly in character frequency but not in component-word frequency, a facilitative effect of character frequency on word recognition might be observed. However, if the components are of high word frequency in the high character frequency condition, an inhibitory effect might override the facilitative one. That said, this explanation is only one possible account of the mixed character frequency results in previous studies, and it does not exclude other possibilities.
Additionally, in Study 1, the interactions between whole-word frequency and component-word frequency were significant, suggesting that whole-word frequency is an essential determinant of component-word frequency effects and that these effects are stronger when whole-word frequency is higher. However, the interaction was not replicated in Experiment 1 of Study 2, in which whole-word frequency and first component-word frequency were manipulated as categorical variables. Instead, first component-word frequency showed inhibitory effects on lexical decision RTs independent of compound word frequency. One probable reason for the absence of an interaction is that the range of whole-word frequency was limited. It remains to be seen whether an experimentally manipulated component-word frequency effect would be smaller or nonexistent for compound words with high whole-word frequency. Note that although high-frequency words were not used, the frequency range we selected in Experiment 1 covers 65% of all two-character words, suggesting that the competition pattern we observed applies to most Chinese compound words.
The results of Study 1 also showed that the effects of the first character and the second character differed to some degree. The frequency effects of the first component are stronger and more robust than those of the second component, which is consistent with previous studies showing similar patterns in Chinese compound word processing even when words were presented in isolation (Peng et al., 1994; Tan & Perfetti, 1999). Differences between the two characters of a word might be caused by reading direction. Because Chinese readers usually read from left to right, with their eyes moving in the same direction, the first character of a word may have some advantage over the second character during reading (Ma et al., 2015). However, considering that the frequency effects of the second component were not consistent in Study 1 (significant in the analyses of CLP-Tse but not in those of MELD-SCH), more empirical studies are needed to verify the frequency effects of the second component on Chinese compound word processing. Moreover, CLP-Tse is a data set of traditional Chinese, while MELD-SCH is based on simplified Chinese, so it is also possible that lexical identification differs somewhat between these two visually different scripts.
Inhibitory effects of component-word frequency on compound word processing have also been observed in some alphabetic languages such as Basque and Vietnamese (Pham & Baayen, 2015; Vergara-Martínez et al., 2009). Most studies of English observed facilitative effects of morpheme frequency (Inhoff et al., 2008; Schmidtke et al., 2021). However, the effect is not always robust. For example, in an LDT, when the second component was a high-frequency word, the frequency effect of the first component was not significant (Juhasz et al., 2003). Moreover, in eye movement studies, Juhasz et al. (2003) also did not find significant first-lexeme effects. Although studies of English compound words have not consistently observed component-word frequency effects, none has reported inhibitory effects. Apparently, there are some cross-language differences regarding how component frequency affects compound word processing. The exact reasons for these differences are currently unclear, and further research is required to understand them.
This raises the question of whether the mechanism of compound word processing proposed in the present study is specific to Chinese or applies universally across writing systems. The unique properties of Chinese might affect compound word processing in the following ways. First, Chinese words are short, allowing readers to process a word within a single fixation. In contrast, longer compound words in alphabetic languages might need more fixations, preventing holistic processing. Second, because there are no explicit marks to demarcate words in Chinese, readers need to decide which word each character belongs to. This may encourage competition between whole words and the components. In contrast, for English compounds, the absence of whitespace may suggest that the embedded word is not to be identified separately, potentially reducing inhibitory effects. Finally, morphemes are salient in Chinese and likely to be activated early during processing. This might not happen as quickly in alphabetic languages if morpheme boundaries are not apparent. These differences suggest that compound word processing in Chinese might have unique properties compared to alphabetic writing systems. Linguistic experience could affect how readers process words (Traficante et al., 2018). Therefore, how well CRM explains word processing in alphabetic languages is an interesting open question.
One further question is whether the mechanism for processing compound words in an LDT also applies to natural sentence reading. On the one hand, multiple words are presented simultaneously without obvious word boundaries during natural reading. It is likely that compound word processing would be affected by word segmentation during sentence reading. Zang et al. (2016) manipulated the lexical probability (i.e., the likelihood of a character being a single-character word vs. part of a two-character word) of the first component and the preview of the second component in a sentence reading study, with character frequency matched. They found that when the first component was more likely to be a single-character word, the preview effects on the whole words were reduced, indicating that Chinese readers can use lexical probability cues for word segmentation during sentence reading. On the other hand, given that words are presented in context and readers might rely more on top-down information during reading, the influence of character frequency might be relatively weak (Cui et al., 2013, 2021; Li et al., 2014; Ma et al., 2015; Yan et al., 2006; Yu et al., 2021). Accordingly, words in a sentence might be processed essentially as psychological units and possibly induce little or no difficulty in segmentation for Chinese readers (Bai et al., 2008). In sum, future research is needed to determine the extent to which character frequency and component-word frequency serve as distinct factors in the mental lexicon of Chinese readers, as well as to assess the generalizability of compound word processing mechanisms across tasks.
Similar to the results of the studies presenting words in isolation, previous sentence-reading studies tended to observe robust whole-word frequency effects and mixed character frequency effects. Recent studies have found inhibitory effects of character frequency on compound word processing during sentence reading (Cui et al., 2021; Xiong et al., 2023; Yu et al., 2021). Cui et al. (2021) explained the inhibitory first-character effect under the constraint hypothesis (Hyönä et al., 2004) based on the observation that morphological family size (the number of words the character appears in) and first character frequency were strongly correlated. It was hypothesized that the fewer the morphological family members associated with the first character, the stronger the constraint the first character places on the possible compound words. The constraint might be particularly useful when the whole compound word is low frequency. Yu et al. (2021), however, pointed out that family size effects are mostly found to be facilitative in alphabetic languages (e.g., Dutch: Kuperman et al., 2009; English: Juhasz & Berkowitz, 2011; Finnish: Kuperman et al., 2008), as well as in Chinese (Yao et al., 2022). Furthermore, when they analyzed only a subset of target words to equate family size, the inhibitory effect of first character frequency was still present. They therefore rejected the constraint hypothesis. In the current research, when family size was included in the analysis of Study 1, its effect on word identification was significant only in the analysis of CLP-Tse, in a facilitative direction, and absent in the analysis of MELD-SCH (more details in Table A3). Notably, even when family size was included, there were still facilitative character frequency effects and inhibitory component-word frequency effects, consistent with the initial findings. Instead, Yu et al. (2021) argued that the inhibitory character frequency effect reflects the heuristics Chinese readers use to perform word segmentation when reading multiple consecutive characters in a sentence, whereby the unfamiliarity of a low-frequency first character induces an inference of a one-character word and a short fixation. However, our current lexical decision results imply that the inhibitory effect of the component does not necessarily arise from the need for segmentation, because the targets were presented in isolation (see also Xiong et al., 2023). We leave the question of generalization between single-word and sentence-reading paradigms to future studies in which the effect of component-word frequency is explicitly examined. If component-word frequency affects eye-movement measures in the same way when character frequency is controlled, this will extend the applicability of our theory to Chinese reading.
Finally, we acknowledge the limitation that we did not consider semantic processing, although it is an integral part of compound word processing. Peng et al. (1999) found that character frequency effects were moderated by the semantic transparency of the whole word. To interpret this, their model (not computationally implemented) divided compound words into semantically transparent and opaque words, with different types of connections between morpheme and word nodes depending on transparency. In addition, based solely on RTs in LDTs, it is difficult to discriminate the time courses or processing stages of different frequency effects on compound word identification. Because the tasks in both studies were lexical decisions, it is uncertain whether the results generalize to other tasks. Additionally, explaining word processing in other languages will inevitably raise problems, because the competition-based word processing mechanism in CRM was designed around specific properties of Chinese. Further studies are needed to investigate these questions. For the present, we focus mainly on the effects at the word level in Chinese compound word identification.
Conclusion
By analyzing two existing lexical decision databases and conducting experiments using LDTs, the present study showed that both whole-word properties and component properties affect word processing during Chinese reading. Specifically, facilitative whole-word frequency effects and inhibitory component-word frequency effects were observed in the analyses of the existing databases as well as in the factorial experiments. These findings support a novel view of how compound words are processed in Chinese reading. According to this approach, both the whole compound word and the words formed by components are activated, and these words compete for a winner. Because compound words are supported by more character units than any component word, the whole word almost always wins the competition, resulting in the compound word being processed as a unit. Meanwhile, because the activated component words compete with the whole word, their properties also influence the time needed to identify the whole compound word. This new approach might explain the previous inconsistent findings about the effects of component frequency, and it highlights the importance of component words.
Footnotes
<anchor name="fn1"></anchor><sups> 1 </sups> Simplified Chinese characters are used mainly in mainland China and have fewer strokes. Traditional Chinese characters, used in regions such as Taiwan, Hong Kong, and Macau, are more complex and retain historical forms. The two systems differ in character complexity and appearance.
<anchor name="fn3"></anchor><sups> 3 </sups> Models using raw RTs showed patterns of significance similar to those of the models using log-transformed data; therefore, only the results for log-transformed RTs are reported.
<anchor name="fn4"></anchor><sups> 4 </sups> Words with accuracy lower than 0.67 may not be processed as words by readers, although their word frequencies did not differ significantly from those of other words. The two words are “协约” and “支流”. Models based on all words showed patterns of significance similar to those of the models based on the trimmed data; therefore, only the results for the trimmed data are reported.
<anchor name="fn5"></anchor><sups> 5 </sups> For ease of understanding, the effect size calculations here are consistent with those in Table 1. Negative values indicate facilitative effects and positive values indicate inhibitory effects, and larger absolute values indicate stronger effects.
<anchor name="fn6"></anchor><sups> 6 </sups> We thank Sachiko Kinoshita and the editor for pointing out this problem with Experiment 1.
<anchor name="fn2"></anchor><sups> 2 </sups> We constructed the linear mixed-effects model to fit the data of CLP-Tse, and therefore,
References
<anchor name="c1"></anchor>Andrews, S. (1986). Morphological influences on lexical access: Lexical or nonlexical effects?
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times.
Baayen, R. H., & Schreuder, R. (2000). Towards a psycholinguistic computational model for morphological parsing.
Bai, X., Yan, G., Liversedge, S. P., Zang, C., & Rayner, K. (2008). Reading spaced and unspaced Chinese text: Evidence from eye-movements.
Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4.
Beijing Language Institute. (1986).
Bertram, R., & Hyönä, J. (2003). The length of a complex word modifies the role of morphological structure: Evidence from eye-movements when reading short and long Finnish compounds.
Bien, H., Levelt, W. J., & Baayen, R. H. (2005). Frequency effects in compound production.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial.
Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.
Caramazza, A., Laudanna, A., & Romani, C. (1988). Lexical access and inflectional morphology.
Chen, H. C., Song, H., Lau, W. Y., Wong, K. F. E., & Tang, S. L. (2003). Developmental characteristics of eye-movements in reading Chinese. In C. McBride-Chang & H. C. Chen (Eds.),
Chen, M., Wang, Y., Zhao, B., Li, X., & Bai, X. (2021). The trade-off between format familiarity and word-segmentation facilitation in Chinese reading.
Cui, L., Häikiö, T., Zhang, W., Zheng, Y., & Hyönä, J. (2017). Reading monomorphemic and compound words in Chinese.
Cui, L., Wang, J., Zhang, Y., Cong, F., Zhang, W., & Hyönä, J. (2021). Compound word frequency modifies the effect of character frequency in reading Chinese.
Cui, L., Yan, G., Bai, X., Hyönä, J., & Liversedge, S. P. (2013). Processing of compound-word characters in reading Chinese: An eye-movement-contingent display change study.
Duñabeitia, J. A., Perea, M., & Carreiras, M. (2007). The role of the frequency of constituents in compound words: Evidence from Basque and Spanish.
Ford, M. A., Davis, M. H., & Marslen-Wilson, W. D. (2010). Derivational morphology and base morpheme frequency.
Gallucci, M. (2019).
Giraudo, H., & Grainger, J. (2000). Effects of prime word frequency and cumulative root frequency in masked morphological priming.
Hasenacker, J., & Schroeder, S. (2019). Compound reading in German: Effects of constituent frequency and whole-word frequency in children and adults.
Hyönä, J., Bertram, R., & Pollatsek, A. (2004). Are long compound words identified serially via their constituents? Evidence from an eye movement-contingent display change study.
Hyönä, J., & Olson, R. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency.
Hyönä, J., & Pollatsek, A. (1998). Reading Finnish compound words: Eye fixations are affected by component morphemes.
Inhoff, A. W., Starr, M. S., Solomon, M., & Placke, L. (2008). Eye movements during the reading of compound words and the influence of lexeme meaning.
Juhasz, B., & Berkowitz, R. (2011). Effects of morphological families on English compound word recognition: A multitask investigation.
Juhasz, B. J., Starr, M. S., Inhoff, A. W., & Placke, L. (2003). The effects of morphology on the processing of compound words: Evidence from naming, lexical decisions and eye fixations.
Kuperman, V., Bertram, R., & Baayen, R. H. (2008). Morphological dynamics in compound processing.
Kuperman, V., Schreuder, R., Bertram, R., & Baayen, R. H. (2009). Reading polymorphemic Dutch compounds: Toward a multiple route model of lexical processing.
Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004).
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs.
Lexicon of Common Words in Contemporary Chinese Research Team. (2008).
Li, X., Bicknell, K., Liu, P., Wei, W., & Rayner, K. (2014). Reading is fundamentally similar across disparate writing systems: A systematic characterization of how words and characters influence eye-movements in Chinese reading.
Li, X., Gu, J., Liu, P., & Rayner, K. (2013). The advantage of word-based processing in Chinese reading: Evidence from eye movements.
Li, X., Huang, L., Yao, P., & Hyönä, J. (2022). Universal and specific reading mechanisms across different writing systems.
Li, X., & Pollatsek, A. (2020). An integrated model of word processing and eye-movement control during Chinese reading.
Li, X., Zhao, W., & Pollatsek, A. (2012). Dividing lines at the word boundary position helps reading in Chinese.
Ma, G., Li, X., & Rayner, K. (2015). Readers extract character frequency information from nonfixated-target word at long pretarget fixations during Chinese reading.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings.
Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations.
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy.
Peng, D. L., Li, Y., & Liu, Z. (1994). Identification of the Chinese two-character word under repetition priming condition.
Peng, D. L., Liu, Y., & Wang, C. (1999). How is access representation organized? The relation of polymorphemic words and their morphemes in Chinese. In J. Wang, A. W. Inhoff, & H.-C. Chen (Eds.),
Pham, H., & Baayen, H. (2015). Vietnamese compounds show an anti-frequency effect in visual lexical decision.
Pollatsek, A., Hyönä, J., & Bertram, R. (2000). The role of morphological constituents in reading Finnish compound words.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research.
Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity.
Rayner, K., Li, X., & Pollatsek, A. (2007). Extending the E-Z Reader model of eye-movement control to Chinese readers.
R Development Core Team. (2020).
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material.
Schmidtke, D., Van Dyke, J. A., & Kuperman, V. (2021). CompLex: An eye-movement database of compound word reading in English.
Schreuder, R., & Baayen, R. H. (1995). Modeling morphological processing. In L. B. Feldman (Ed.),
Sheather, S. J. (2009).
Shen, W., & Li, X. (2012). The uniqueness of word superiority effect in Chinese reading.
Shen, W., Li, X., & Pollatsek, A. (2018). The processing of Chinese compound words with ambiguous morphemes in sentence context.
Sun, C. C., Hendrix, P., Ma, J. Q., & Baayen, R. H. (2018). Chinese Lexical Database (CLD): A large-scale lexical database for simplified Mandarin Chinese.
Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words.
Taft, M., & Forster, K. I. (1976). Lexical storage and retrieval of polymorphemic and polysyllabic words.
Taft, M., Huang, J., & Zhu, X. (1994). The influence of character frequency on word recognition responses in Chinese. In H. W. Chang, J. T. Hung, C. W. Hue, & O. Tzeng (Eds.),
Taft, M., & Zhu, X. (1997). Submorphemic processing in reading Chinese.
Tan, L. H., & Perfetti, C. A. (1999). Phonological activation in visual identification of Chinese two-character words.
Traficante, D., Marelli, M., & Luzzatti, C. (2018). Effects of reading proficiency and of base and whole-word frequency on reading noun- and verb-derived words: An eye-tracking study in Italian primary school children.
Tsang, Y. K., Huang, J., Lui, M., Xue, M., Chan, Y.-W. F., Wang, S., & Chen, H.-C. (2018). MELD-SCH: A megastudy of lexical decision in simplified Chinese.
Tse, C. S., & Yap, M. J. (2018). The role of lexical variables in the visual recognition of two-character Chinese compound words: A megastudy analysis.
Tse, C. S., Yap, M. J., Chan, Y. L., Sze, W. P., Shaoul, C., & Lin, D. (2017). The Chinese lexicon project: A megastudy of lexical decision performance for 25,000+ traditional Chinese two-character compound words.
Vergara-Martínez, M., Duñabeitia, J. A., Laka, I., & Carreiras, M. (2009). ERP correlates of inhibitory and facilitative effects of constituent frequency in compound word reading.
Wang, C., & Peng, D. (1999). The role of surface frequencies, cumulative morpheme frequencies, and semantic transparencies in the processing of compound words.
Wei, W., Li, X., & Pollatsek, A. (2013). Word properties of fixated words affect outgoing saccade length in Chinese reading.
Xiong, J., Yu, L., Veldre, A., Reichle, E. D., & Andrews, S. (2023). A multitask comparison of word- and character-frequency effects in Chinese reading.
Yan, G., Tian, H., Bai, X., & Rayner, K. (2006). The effect of word and character frequency on the eye-movements of Chinese readers.
Yang, J., Staub, A., Li, N., Wang, S., & Rayner, K. (2012). Plausibility effects when reading one- and two-character words in Chinese: Evidence from eye movements.
Yao, P., Staub, A., & Li, X. (2022). Predictability eliminates neighborhood effects during Chinese sentence reading.
Yu, L., Liu, Y., & Reichle, E. D. (2021). A corpus-based versus experimental examination of word- and character-frequency effects in Chinese reading: Theoretical implications for models of reading.
Zang, C., Liang, F., Bai, X., Yan, G., & Liversedge, S. P. (2013). Interword spacing and landing position effects during Chinese reading in children and adults.
Zang, C., Wang, Y., Bai, X., Yan, G., Drieghe, D., & Liversedge, S. P. (2016). The use of probabilistic lexicality cues for word segmentation in Chinese reading.
Zhang, B., & Peng, D. L. (1992). Decomposed storage in the Chinese lexicon. In H.-C. Chen & O. Tzeng (Eds.),
Zhou, J., & Li, X. (2021). On the segmentation of Chinese incremental words.
Zhou, X., & Marslen-Wilson, W. (2000). Lexical representation of compound words: Cross-linguistic evidence.