Title:
Competition between parts and whole: A new approach to Chinese compound word processing.
Authors:
Zhang Q; CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences., Huang KJ; Department of Psychological and Brain Sciences, University of Massachusetts., Li X; CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences.
Source:
Journal of experimental psychology. Human perception and performance [J Exp Psychol Hum Percept Perform] 2024 May; Vol. 50 (5), pp. 479-497. Date of Electronic Publication: 2024 Mar 28.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: American Psychological Assn Country of Publication: United States NLM ID: 7502589 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1939-1277 (Electronic) Linking ISSN: 00961523 NLM ISO Abbreviation: J Exp Psychol Hum Percept Perform Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, American Psychological Assn.
Grant Information:
National Natural Science Foundation of China
Entry Date(s):
Date Created: 20240328 Date Completed: 20240419 Latest Revision: 20251006
Update Code:
20251006
DOI:
10.1037/xhp0001198
PMID:
38546626
Database:
MEDLINE

Further Information

How compound words are processed remains a central question in research on Chinese reading. The Chinese reading model assumes that all possible words sharing characters are activated during word processing and these activated words compete for a winner (Li & Pollatsek, 2020). The present studies aimed to examine whether embedded component words compete with whole compound words in Chinese reading. In Study 1, we analyzed two existing lexical decision databases and revealed inhibitory effects of component-word frequency and facilitative effects of character frequency on the first components. In Study 2, we conducted two factorial experiments to further examine the effects of first component-word frequency, with character frequencies controlled. The results consistently indicated significant inhibitory effects of component-word frequency. Collectively, these findings support the theoretical proposition that both component words and compound words are activated and engage in competition during word processing. This provides a new approach to compound word processing in Chinese reading and a possible solution to mixed results of character frequency effects reported in the literature. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

Competition Between Parts and Whole: A New Approach to Chinese Compound Word Processing

<cn> <bold>By: Qiwei Zhang</bold>
> CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
> Department of Psychology, University of Chinese Academy of Sciences
> <bold>Kuan-Jung Huang</bold>
> Department of Psychological and Brain Sciences, University of Massachusetts
> <bold>Xingshan Li</bold>
> CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
> Department of Psychology, University of Chinese Academy of Sciences </cn>

<bold>Acknowledgement: </bold>This research was supported by a grant from the National Natural Science Foundation of China (NSFC; 32371156). We thank Adrian Staub for his comments on an earlier version of this manuscript. The analysis code for Study 1 can be retrieved from https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c, and the data of the Megastudy of Lexical Decision in Simplified Chinese (Tsang et al., 2018) and the Chinese Lexicon Project-Tse (Tse et al., 2017) are available by contacting the authors of the databases. All the code, materials, and data for Study 2 can be retrieved from https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c. Qiwei Zhang served as lead for data curation, formal analysis, investigation, methodology, project administration, validation, visualization, and writing–original draft. Xingshan Li served as lead for funding acquisition, resources, and supervision, contributed equally to methodology, project administration, and validation, and served in a supporting role for software and writing–original draft. Qiwei Zhang, Kuan-Jung Huang, and Xingshan Li contributed equally to writing–review and editing.

A compound word is a morphologically complex word binding together two or more morphemes (such as snowball from snow and ball); most morphemes of compound words can be used as independent words in sentences. In recent decades, whether compound words are processed via full form or components has been extensively studied in alphabetic writing systems such as English, Finnish, Dutch, Spanish, and Basque (e.g., English: Andrews, 1986; Finnish: Pollatsek et al., 2000; Dutch: Kuperman et al., 2009; Spanish and Basque: Duñabeitia et al., 2007). Meanwhile, compound words account for more than 70% of Chinese vocabulary (Beijing Language Institute, 1986). Therefore, an important research question for Chinese reading is how compound words are processed. Previous studies have shown script-specific mechanisms of compound word processing (Li et al., 2022). However, as we will review below, how Chinese readers process compound words is not fully understood, and some recent findings are mixed (Cui et al., 2021; Tsang et al., 2018; Yu et al., 2021). This study investigates the mechanism of compound word processing in Chinese reading, aiming to address the long-standing debate regarding whether Chinese words are processed in a holistic or decompositional manner.

<h31 id="xhp-50-5-479-d282e139">Compound Word Processing in Alphabetic Writing Systems</h31>

Before turning to compound word processing in Chinese, it is instructive to consider findings and theories in alphabetic writing systems, where lexical decision tasks (LDTs) and natural reading tasks are commonly used (Balota & Chumbley, 1984; Taft & Forster, 1976). In an LDT, participants quickly decide whether a string is a word or a nonword, with response times (RTs) and accuracy rates as the key measures (Meyer & Schvaneveldt, 1971). Natural reading tasks, on the other hand, use eye movements to measure word processing difficulty (Rayner, 1998; Rayner & Duffy, 1986).

Three primary theories (holistic processing, decompositional processing, and dual-route processing) have been proposed to explain how compound words are recognized in alphabetic writing systems. The holistic processing theories argue that compound words are stored and retrieved as single units, supported by whole-word frequency effects in which more frequent whole words are recognized faster (e.g., Giraudo & Grainger, 2000; Hyönä & Olson, 1995; Kuperman et al., 2008). In contrast, the decompositional processing theories argue that compound words are broken down into their components for processing (Taft & Forster, 1975, 1976; Zhang & Peng, 1992). Studies supporting these theories have revealed component frequency effects, in which high-frequency components lead to shorter reading times (e.g., Bien et al., 2005; Hasenacker & Schroeder, 2019; Kuperman et al., 2009). The dual-route models posit that both processes operate in parallel, with the faster route taking precedence (e.g., Baayen & Schreuder, 2000; Caramazza et al., 1988; Schreuder & Baayen, 1995). Factors such as word length can affect the race: shorter words tend to be processed holistically, while longer words are often decomposed (Bertram & Hyönä, 2003; Hyönä & Pollatsek, 1998; Pollatsek et al., 2000). In summary, both whole-word and component frequencies affect compound word processing, and dual-route models offer the most comprehensive explanation for these findings (Caramazza et al., 1988; Pollatsek et al., 2000).

<h31 id="xhp-50-5-479-d282e213">Properties of the Chinese Writing System</h31>

Chinese is a logographic writing system with many unique properties that distinguish it from alphabetic writing systems. One is that Chinese characters primarily convey semantic information, although they also carry phonological information. There are more than 5,000 characters in Chinese, each of which is a writing unit representing a single morpheme and syllable, except in a few multicharacter monomorphemic words such as “蝴蝶” (meaning butterfly), in which two characters together represent a morpheme. Furthermore, there are no spaces to demarcate words within a sentence.

A Chinese word can be composed of one or more characters. Compared to words in alphabetic writing systems, the mean length of Chinese words is shorter, and the variance is smaller. Based on the frequencies of the 56,008 listed words in one lexicon (Lexicon of Common Words in Contemporary Chinese Research Team, 2008), 6% of Chinese words are one character long, 72% are two characters, 12% are three characters, 10% are four characters, and less than 0.3% of the words are longer than four characters. The relationship between characters and words is complex. Most Chinese characters are one-character words; however, they can be combined with other characters to form compound words. For example, the character “人” is a word by itself (meaning people), but it can also constitute multicharacter words with other characters (such as “人群” [meaning a lot of people], “陌生人” [meaning stranger], “出人意料” [meaning unexpected]). There are two types of frequency associated with one character. One is character frequency, calculating every occurrence of the character, whether the character is an individual word or embedded in a longer word. The other is word frequency, referring to the occurrence of the character when it is used alone as an individual word. As a concrete example, in a corpus (Cai & Brysbaert, 2010), the character “人” appears 373,292 times, and the corpus contains 46.8 million characters; thus, the character frequency of “人” is 7,969 occurrences per million. On the other hand, the one-character word “人” appears 194,914 times (far less than the number of times the character “人” appears because “人” also appears as a part of other longer words). The corpus contains 33.5 million words, and thus, the word frequency of the one-character word “人” is 5,810 occurrences per million. In practice, there is a high correlation between the word frequency and the corresponding character frequency (r &gt; .80 in the following analysis). 
In the rest of the paper, we will refer to the word frequency of one-character words as component-word frequency, as we are considering characters embedded in a compound word as its components, that is, the word frequency of the components. For example, the component-word frequency of “人” in “人群” is the word frequency of the single-character word “人”. Notably, the component-word frequency is not specific to the character position in compound words.
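The two frequency measures defined above can be illustrated with a short calculation. The sketch below is only a worked example: the function name `per_million` is hypothetical, and the corpus sizes are the rounded figures quoted in the text (46.8 million characters, 33.5 million words). With these rounded sizes the results come out near 7,976 and 5,818 per million, slightly above the reported 7,969 and 5,810, presumably because the reported values rest on exact corpus totals.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Occurrences per million tokens (characters or words) in a corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * 1_000_000


# Counts for the character/word "人" as quoted in the text.
CHAR_COUNT = 373_292   # every occurrence of the character
WORD_COUNT = 194_914   # occurrences as a stand-alone one-character word

# ASSUMPTION: rounded corpus sizes taken from the text, not exact totals.
CORPUS_CHARACTERS = 46_800_000
CORPUS_WORDS = 33_500_000

character_frequency = per_million(CHAR_COUNT, CORPUS_CHARACTERS)
component_word_frequency = per_million(WORD_COUNT, CORPUS_WORDS)

print(f"character frequency:      {character_frequency:.0f} per million")
print(f"component-word frequency: {component_word_frequency:.0f} per million")
```

The character count is necessarily at least as large as the word count, because every stand-alone use of the character also counts as an occurrence of the character.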

The visual salience of morphemes and words in written Chinese is different from that in alphabetic scripts. In written English, for example, morpheme boundaries in a compound word can hardly be identified simply with visual cues, but a space unambiguously separates two words. In contrast, morphemes are visually salient in written Chinese. This is because in Chinese, one morpheme corresponds to one character most of the time, and each character is visually represented in a uniformly sized box. However, when reading sentences, no apparent cues exist between Chinese words varying in length, and thus, words cannot be segmented simply with visual cues.

These differences between the Chinese writing system and alphabetic writing systems possibly require different models of compound word processing. For example, Chinese compound words are horizontally shorter so that they are more likely to be processed via the full-form route, according to the dual-route model (Caramazza et al., 1988; Pollatsek et al., 2000). Alternatively, because Chinese morphemes are visually salient, decomposition of compound words into individual components could be more likely. The following section reviews some evidence for or against holistic/decompositional processing of Chinese compound words.

<h31 id="xhp-50-5-479-d282e237">Previous Findings of Chinese Compound Word Processing</h31>

<bold>Character Frequency Effects</bold>

As with studies of other languages introduced earlier, whether Chinese compound words are accessed in a holistic or decompositional manner has been investigated by examining the effects of whole-word frequency and character frequency (Cui et al., 2021; Li et al., 2014; Ma et al., 2015; Peng et al., 1999; Sun et al., 2018; Taft et al., 1994; Tsang et al., 2018; Tse & Yap, 2018; Xiong et al., 2023; Yan et al., 2006; Yu et al., 2021; Zhang & Peng, 1992; see Table 1). In LDTs, whole-word frequency effects have been consistently found, while mixed findings of character frequency effects have been reported (Peng et al., 1999; Taft et al., 1994; Xiong et al., 2023; Zhang & Peng, 1992). In Zhang and Peng (1992), facilitative whole-word and character frequency effects were found in separate experiments. RTs were shorter when the whole-word frequency of the target was higher. RTs were also shorter when the frequency of the embedded components of the target word was higher. When both frequency effects were examined within one experiment, interactions between character frequency and compound word frequency were found, although the interaction patterns differed from one study to another (Tse & Yap, 2018; Wang & Peng, 1999). Peng et al. (1999) used a factorial design and found facilitative character frequency effects only for frequent compound words. In contrast, Tse and Yap (2018) conducted a regression analysis which contained 18,983 two-character words, and they found a facilitative character frequency effect that was stronger for words with low whole-word frequency.
>
><anchor name="tbl1"></anchor>xhp_50_5_479_tbl1a.gif

Some other lexical decision studies revealed inhibitory character frequency effects, showing longer RTs for words comprising more frequent characters (Tsang et al., 2018; Sun et al., 2018; Xiong et al., 2023). In a mega lexical-decision study of more than 10,000 simplified Chinese words (Tsang et al., 2018), an inhibitory character frequency effect was found after accounting for the number of words the character can form. Notably, the variable in the above studies was the average character frequency within a multicharacter word instead of the separate character frequency for each component. Sun et al. (2018) conducted a reanalysis of two existing lexical decision databases (Chinese Lexicon Project [CLP], Tse et al., 2017; Megastudy of Lexical Decision in Simplified Chinese [MELD-SCH], Tsang et al., 2018), distinguishing first and second character frequency. Regression analyses initially revealed inhibitory character frequency effects of either component. However, a subsequent post hoc analysis that employed principal components as predictors—instead of using raw variables—revealed facilitative character frequency effects. Sun et al. posited that the initial inhibitory results were artifacts stemming from collinearity in the models, and they concluded that the character frequency effects were facilitative. Nevertheless, in a recent study strictly manipulating whole-word frequency and first character frequency of compound words, Xiong et al. (2023) observed inhibitory effects of first character frequency, but only for low-frequency words. They speculated that the reversed character frequency effects might stem from the influence of neighborhood size and/or frequency.

Eye-tracking studies, like lexical decision research, consistently show facilitative whole-word frequency effects during sentence reading, with high-frequency compound words being read faster than low-frequency compound words (Li et al., 2014; Ma et al., 2015; Sun et al., 2018; Tsang et al., 2018; Yan et al., 2006; Yu et al., 2021). However, findings of character frequency effects are mixed (see Table 1 for summary). Although Yan et al. (2006) found that the fixation durations on compound words were longer when their first character frequency was low, other studies revealed longer times for words containing high-frequency first characters (Cui et al., 2021; Xiong et al., 2023; Yu et al., 2021). Still others did not find significant character frequency effects (Li et al., 2014; Ma et al., 2015).

In summary, given the mixed findings of character frequency effects in previous studies, it is hard to conclude whether Chinese compound words are processed in a holistic or decompositional manner.

<bold>Whole-Word Effects</bold>

While it is unclear whether and how embedded characters affect compound word processing, robust and consistent evidence supports the view that words are generally processed as whole units (e.g., Li et al., 2014; Yu et al., 2021; Xiong et al., 2023). This holds despite the absence of visual cues for word boundaries in Chinese: reading times were longer when spaces or other interference were inserted between the characters within a word, but not when they were inserted between words (Bai et al., 2008; M. Chen et al., 2021; Li et al., 2012, 2013; Zang et al., 2013). These results suggest that Chinese readers do not process texts character by character. Additionally, word superiority effects in Chinese show that characters in words are identified faster and more accurately than characters in nonwords (Reicher, 1969; Shen & Li, 2012). Thus, even without explicit boundaries between words, Chinese text appears to be processed holistically during sentence reading.

<h31 id="xhp-50-5-479-d282e420">Models on Chinese Compound Word Processing</h31>

Some models, based on the interactive activation principle (McClelland & Rumelhart, 1981), aimed to explain compound word processing in Chinese reading. Some of these models predict facilitative effects of components because of excitatory connections between characters and multicharacter words (e.g., Taft & Zhu, 1997; Tan & Perfetti, 1999). Others assume that the effect of characters on compound word processing depends on the properties of the word (Peng et al., 1999; X. Zhou & Marslen-Wilson, 2000). For example, the inter–intra model suggests that if a compound word is semantically transparent, its parts speed up recognition of the whole word; if the compound word is semantically opaque, its parts slow down recognition of the whole word (Peng et al., 1999). Overall, these models predict that character frequency affects Chinese compound word processing.

The Chinese reading model (CRM) proposed by Li and Pollatsek (2020) was designed to explain how Chinese readers recognize words and control eye movements without relying on interword spaces. The model comprises two modules: one for word recognition and another for eye-movement control. In the word recognition module, characters within the perceptual span are activated in parallel at the character level, and then they activate possible words containing these characters. Because each character can only belong to one word, CRM assumes that there are inhibitory lateral links between spatially overlapping word units. By doing so, all activated spatially overlapping words compete for recognition, and the word with the highest activation wins. This mechanism allows the model to simultaneously segment and recognize words in continuous Chinese text.

CRM provides a unique perspective on the mechanism of Chinese compound word processing. Unlike the traditional dichotomous approach, CRM centers on the competition among all activated words, including both single and multicharacter words within the perceptual span. Compound words win most of the time because they receive activation from all constituent characters, and their activation value increases faster than that of the embedded single-character words. Therefore, CRM predicts that compound words are ultimately identified as a whole, aligning with the evidence for holistic processing (e.g., Li et al., 2014; Ma et al., 2015; Yang et al., 2012). Specifically, the model simulated the findings of word frequency effects in Wei et al. (2013), where the two-character strings are recognized as a whole word in 99% of the trials. Moreover, CRM predicts that the frequency of the component words (i.e., the embedded characters as individual words) impacts the competitive process. High-frequency component words may cause more competition, prolonging the time for compound words to settle the competition. Furthermore, lower frequency compound words should be more impacted by the competition from the embedded component words, given that the baseline activation of these low-frequency compound words is lower to begin with. Therefore, a larger component-word frequency effect is expected when identifying low-frequency compound words than when identifying high-frequency compound words.
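The competitive logic described above can be made concrete with a deliberately simplified toy simulation. The sketch below is not CRM and does not reproduce its equations or parameters: it assumes just two word units (a two-character compound and its first component word), constant excitation from each character, and mutual lateral inhibition. All names and parameter values (`steps_to_recognize`, `excitation`, `inhibition`, `threshold`) are hypothetical, and resting activation stands in for word frequency.

```python
def steps_to_recognize(compound_rest, component_rest, excitation=0.05,
                       inhibition=0.1, threshold=1.0, max_steps=1000):
    """Return the number of update steps until the compound unit reaches threshold.

    Toy model with HYPOTHETICAL parameters (not the CRM equations):
    - the compound unit receives excitation from two characters,
    - the component-word unit receives excitation from one character,
    - each unit inhibits the other in proportion to its current activation.
    Returns None if the compound never reaches threshold within max_steps.
    """
    compound, component = compound_rest, component_rest
    for step in range(1, max_steps + 1):
        # Synchronous update: both new values use the previous activations.
        new_compound = compound + 2 * excitation - inhibition * component
        new_component = component + excitation - inhibition * compound
        compound = max(0.0, new_compound)      # activations cannot go negative
        component = max(0.0, new_component)
        if compound >= threshold:
            return step
    return None


# Same compound, embedded component word with low vs. high resting activation
# (resting activation is used here as a stand-in for component-word frequency).
low_component = steps_to_recognize(compound_rest=0.1, component_rest=0.0)
high_component = steps_to_recognize(compound_rest=0.1, component_rest=0.3)

print("steps with low-frequency component word: ", low_component)
print("steps with high-frequency component word:", high_component)
```

In this toy setting the compound still wins, because it is driven by two characters rather than one, but it takes more steps to reach threshold when the competing component word starts from a higher resting level. That is the direction of the inhibitory component-word frequency effect; the sketch makes no claim about effect sizes or about the interaction with whole-word frequency.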

In summary, previous models assume that the components of a compound word affect Chinese compound word processing, although different models make different predictions. Models of decompositional processing predict a facilitative effect at the character level, while CRM assumes that compound words are recognized based on competition and predicts an inhibitory effect of the component at the word level.

<h31 id="xhp-50-5-479-d282e464">The Present Study</h31>

The present study aimed to investigate the mechanism of Chinese compound word processing. Specifically, we tested one prediction of CRM. According to CRM, the embedded components compete with the whole word at the word level, and this competition results in an inhibitory component-word frequency effect. Previous studies on Chinese compound word processing focused only on the influence of character frequency, ignoring the fact that components in compound words can be used independently as words and can compete with compound words during reading, inducing an inhibitory effect on compound word processing. This may explain the inconsistent findings obtained with different experimental materials, because component-word frequency was seldom controlled previously. Although CRM was initially designed to simulate word processing during sentence reading, it posits that words are the units of sentence reading and contains a word processing module. The LDT differs from sentence reading in that readers need to decide whether the characters make up a word. However, the initial word processing stage may be similar for lexical decision and natural reading, which is why researchers use the LDT to study how words are identified. Therefore, it is reasonable to assume that CRM can simulate the procedure of compound word processing.

In Study 1, we analyzed the corpus data from the MELD-SCH (Tsang et al., 2018) and CLP-Tse (Tse et al., 2017) of traditional characters<anchor name="b-fn1"></anchor><sups>1</sups> for the LDT to investigate how whole-word frequency, character frequency, and component-word frequency jointly affect word processing. According to CRM (Li & Pollatsek, 2020), in addition to whole-word frequency, components are assumed to play inhibitory roles at the word level; according to other frameworks (Tan & Perfetti, 1999; Taft & Zhu, 1997), components are assumed to play facilitative roles at the character level. These predictions were evaluated in Study 1. In Study 2, we conducted two factorial experiments to further examine component-word frequency effects on word identification with character frequencies controlled, which tests the most important prediction of the present study. According to the architecture of CRM, where the component words compete with the whole words at the word level, we expected to observe inhibitory effects of component-word frequencies. By controlling for character frequencies across conditions, Study 2 provides a more direct investigation of how component-word frequency affects word processing.

Study 1


> <h31 id="xhp-50-5-479-d282e491">Method</h31>

<bold>Database of MELD-SCH</bold>

MELD-SCH (Tsang et al., 2018) reported average RTs in an LDT for 12,578 simplified Chinese words, including 10,022 two-character words. Items were divided into 12 lists, and 42 participants were assigned to each list (504 participants in total). The mean error rate was 5.19%, and only correctly responded trials were included when calculating the RTs.

We analyzed RTs of the LDT on compound words to investigate how they were affected by the following seven linguistic properties: whole-word frequency of the compound word, number of strokes, character frequency, and component-word frequency of the first and second components. While whole-word frequency and number of strokes have been shown to robustly influence lexical decision latencies, the effects of character frequency have been mixed, and the effects of component-word frequency have not been examined. Frequency data were obtained from the SUBTLEX-CH frequency corpus based on simplified Chinese subtitles (Cai & Brysbaert, 2010).

Because the present study focused on distinguishing the effects of character frequency and component-word frequency, we only included those two-character compound words in which the individual components are also words by themselves (9,565 words). Moreover, we excluded items with a mean error rate above 0.33 (283 words). Following the guidelines of Baayen and Milin (2010), items with scaled absolute residual values over three were omitted (totaling 52 words), ensuring the residuals approximated a normal distribution (see Appendix A). The pruning of the statistical model did not change the pattern of statistical effects. Ultimately, 9,230 two-character words were included in the analyses. Finally, as the distributions of frequencies and RTs were highly positively skewed, we applied log transformation with a base of 10 to these values in the subsequent analysis. However, for ease of interpretation, Table 2 presents descriptive statistics of raw frequency values.
>
><anchor name="tbl2"></anchor>xhp_50_5_479_tbl2a.gif

<bold>Database of CLP-Tse</bold>

CLP-Tse (Tse et al., 2017) reported average RTs in an LDT for 25,286 traditional Chinese two-character compound words. Items were divided into 18 lists, and 33 participants were assigned to each list (594 participants in total). The mean error rate for words was 11.67%, and only correctly responded trials were included when calculating the RTs.

Although the words in CLP-Tse were written in traditional Chinese, which is visually more complex than simplified Chinese, it has been verified that simplified-character-based frequency measures explain slightly more variance in lexical decision RT than traditional character-based frequency measures (Tse et al., 2017). As a result, when analyzing CLP-Tse, the number of strokes was counted based on the form of traditional Chinese, and all other frequency measures were obtained from the SUBTLEX-CH frequency corpus (Cai & Brysbaert, 2010).

The analysis of CLP-Tse is trial-based, and there are 1,668,876 trials in the raw data set containing 25,286 different two-character words. First, we preprocessed the data based on items. Similar to the preprocessing of MELD-SCH, we only included those two-character compound words in which the individual components are also words by themselves (18,533 different words). Moreover, we excluded words with a mean error rate above 0.33 (816 words). Then, trials with RTs longer than 2,500 ms or shorter than 200 ms were excluded (7,189 trials). Since the distribution of RTs was positively skewed, log transformation was applied to reduce skewness. Next, as recommended by Baayen and Milin (2010), we removed 725 trials whose scaled absolute residual values were over three to make the residuals approximately normally distributed (see Appendix A). The pruning of the statistical model did not change the pattern of statistical effects. Ultimately, 586,742 trials that contained 17,717 two-character words were included in the analyses. Finally, as the distributions of frequencies and RTs were highly positively skewed, we applied log transformation with base 10 to these values in the subsequent analysis. However, for ease of interpretation, Table 2 presents descriptive statistics of raw frequency values.
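The residual-based trimming applied to both databases can be sketched in a few lines. This is a minimal illustration, not the authors' script (which was written in R and used the full set of predictors): it assumes a single predictor, ordinary least squares, and made-up data, and the names `fit_line` and `trim_by_scaled_residuals` are hypothetical. Residuals are scaled by their sample standard deviation, and observations whose absolute scaled residual exceeds the cutoff of three are dropped before the model is refit.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trim_by_scaled_residuals(x, y, cutoff=3.0):
    """Return indices of observations whose |scaled residual| is within cutoff."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # OLS residuals with an intercept have mean zero, so scaling only
    # requires dividing by their standard deviation.
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r / sd) <= cutoff]


# MADE-UP data: log10 frequency and log10 RT for 20 items, one aberrant item.
log_freq = [i * 0.2 for i in range(20)]
log_rt = [3.0 - 0.05 * f + (0.01 if i % 2 else -0.01)
          for i, f in enumerate(log_freq)]
log_rt[10] += 1.0  # aberrant observation

kept = trim_by_scaled_residuals(log_freq, log_rt)

# Refit on the retained observations only.
refit_intercept, refit_slope = fit_line([log_freq[i] for i in kept],
                                        [log_rt[i] for i in kept])
print("dropped:", sorted(set(range(20)) - set(kept)))
print("refit slope:", round(refit_slope, 3))
```

Comparing the coefficients before and after trimming is one way to check, as reported above, that pruning does not change the pattern of effects.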

<bold>Analyses</bold>

The available data of MELD-SCH were based on items instead of including every response of each participant, so we fit linear regression models to the item-based average RTs in MELD-SCH. Meanwhile, we fit linear mixed-effect models (LMMs) to the trial-based RTs in CLP-Tse using the lme4 package for R 3.6.3 (Bates et al., 2015; R Development Core Team, 2020), with subject and word as random factors. Although the model was initially specified with a maximal random-effects structure, convergence issues necessitated the removal of all random slopes. Consequently, the final model retained only random intercepts. The whole-word frequency of the compound word, number of strokes, character frequencies, and component-word frequencies of each component were included as predictors in multiple linear regression models fitted for data sets of MELD-SCH, and they were included as fixed factors in LMMs fitted for data sets of CLP-Tse in initial analyses. Models were constructed in which all predictors (whole-word frequency, numbers of strokes, character frequencies, and component-word frequencies) were entered simultaneously. The intercorrelations and variance inflation factors (VIFs) are shown in Appendix A. VIF is a measure of the severity of the multicollinearity problem in multiple linear regression models. Generally, if VIF is greater than 10, then multicollinearity is high (Kutner et al., 2004), and a cutoff of five is also commonly used (Sheather, 2009). In the current study, all VIFs were smaller than 5 in the model fitted for MELD-SCH, and all VIFs were smaller than 4 in the model fitted for CLP-Tse. Q–Q plots for the dependent variables and the residuals and residual plots of predicted values against residuals indicated that the assumptions of normal distribution and homoskedasticity were approximately satisfied (see Appendix A).
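For readers unfamiliar with VIFs, the following sketch shows one standard way to compute them: the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which for two predictors reduces to 1 / (1 − r²). The code is a plain-Python illustration with made-up data and hypothetical function names; it is not the authors' R code.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long predictor columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])   # centered, unit-length column
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# MADE-UP predictors that correlate at r = .80.
character_freq = [1.0, 2.0, 3.0, 4.0, 5.0]
component_word_freq = [2.0, 1.0, 4.0, 3.0, 5.0]

vifs = variance_inflation_factors([character_freq, component_word_freq])
print([round(v, 2) for v in vifs])
```

With r = .80 between two predictors, the VIF is 1 / (1 − .64), or roughly 2.78, which is below the cutoffs of five and ten mentioned above.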
Furthermore, the interaction terms between whole-word frequency and character frequency as well as whole-word frequency and component-word frequency were included in the second step. We included the interaction terms because some previous studies have shown interactive effects between whole-word frequency and character frequency (Cui et al., 2021; Tse & Yap, 2018; Peng et al., 1999; Wang & Peng, 1999; Yan et al., 2006). All independent variables were mean-centered and standardized (Ford et al., 2010). When the interaction term was significant, a simple slope analysis was conducted using GAMLj for jamovi 1.8 (Gallucci, 2019).
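The logic of a simple slope analysis can be stated compactly. In a model of the form RT = b0 + b1·X + b2·M + b3·X·M, the slope of the predictor X at a given value m of the moderator M is b1 + b3·m. The sketch below evaluates this expression at three moderator levels. All numbers are hypothetical placeholders (the coefficients, and the standardized moderator values chosen for the low, medium, and high levels), and the function name `simple_slope` is invented for illustration; it does not reproduce the GAMLj output.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the predictor at a fixed moderator value: b1 + b3 * m."""
    return b_predictor + b_interaction * moderator_value


# HYPOTHETICAL standardized coefficients, for illustration only.
B_COMPONENT_WORD_FREQ = 0.012   # b1: effect of component-word frequency
B_INTERACTION = 0.004           # b3: component-word x whole-word frequency

# HYPOTHETICAL standardized whole-word frequency values for the three levels.
MODERATOR_LEVELS = {"low": -0.67, "medium": 0.0, "high": 0.67}

slopes = {level: simple_slope(B_COMPONENT_WORD_FREQ, B_INTERACTION, m)
          for level, m in MODERATOR_LEVELS.items()}

for level, slope in slopes.items():
    print(f"{level:>6}: {slope:.4f}")
```

A positive interaction coefficient means that the slope of the predictor grows as the moderator increases, so with these placeholder values the slope is smallest at the low level and largest at the high level.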

<bold>Transparency and Openness</bold>

The analysis code can be retrieved from <a href="https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c" target="_blank">https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c</a>, and the data sets of MELD-SCH (Tsang et al., 2018) and CLP-Tse (Tse et al., 2017) are available by contacting the authors of the databases.

<h31 id="xhp-50-5-479-d282e601">Results</h31>

The model accounted for 39.73% of the variance in the data of MELD-SCH.<anchor name="b-fn2"></anchor><sups>2</sups> As shown in Table 3, some classic effects of linguistic properties were found in the models. For both MELD-SCH and CLP-Tse, the regression coefficient of whole-word frequency was negative, indicating that RTs for high-frequency words were shorter than those for low-frequency words (for MELD-SCH, β = −.075, t = −70.30, p &lt; .001; for CLP-Tse, β = −.053, t = −94.58, p &lt; .001). Notably, the effect size of whole-word frequency on RTs is much larger than that of any component property, as indicated by regression coefficients. The regression coefficients of the number of strokes for both characters were positive, indicating that RTs for words with visually complex characters (with more strokes) were longer than those for words with simple characters (for MELD-SCH, βs &gt; .005, ts &gt; 4.95, ps &lt; .001; for CLP-Tse, βs = .004, ts &gt; 6.62, ps &lt; .001).
<anchor name="tbl3"></anchor>xhp_50_5_479_tbl3a.gif

Most interestingly, the component-word frequency and character frequency of the component showed opposite effects. Specifically, in both models, the regression coefficients of first component-word frequency were positive, indicating that compound words containing a high-frequency first component word were identified more slowly than those with a low-frequency first component word (for MELD-SCH, β = .012, t = 6.13, p &lt; .001; for CLP-Tse, β = .009, t = 8.66, p &lt; .001). In contrast, the regression coefficients of the first character frequency were negative, suggesting shorter RTs in lexical decisions to compound words with higher first character frequency (for MELD-SCH, β = −.008, t = −3.89, p &lt; .001; for CLP-Tse, β = −.011, t = −10.44, p &lt; .001). However, the frequency effects of the second component were less robust. In the model fitted for MELD-SCH, no significant effect was found for the component-word frequency or character frequency of the second component (for second component-word frequency, β = .006, t = 1.20, p = .231; for second character frequency, β = .001, t = 0.46, p = .647). In CLP-Tse, both the inhibitory component-word frequency effect and the facilitative character frequency effect of the second component were significant (for second component-word frequency, β = .006, t = 6.64, p &lt; .001; for second character frequency, β = −.008, t = −7.73, p &lt; .001); that is, RTs were longer with increasing component-word frequency or decreasing character frequency of the second component.

To further investigate whether whole-word frequency would moderate component frequency effects, including character frequency and component-word frequency, we constructed new models with interactions. In the model fitted for the data in MELD-SCH, both first and second component-word frequencies interacted with the whole-word frequency significantly (first component: β = .004, t = 2.23, p = .026; second component: β = .004, t = 2.14, p = .033). Subsequent simple effect analyses were conducted setting component-word frequency as a simple effects variable and whole-word frequency as a moderator. Moderators were set to three levels, namely, the 25th, 50th, and 75th percentiles, representing low-, medium-, and high-frequency levels, respectively; the corresponding values are shown in Table 2. In this way, after controlling the effects of other variables, the simple slopes of component-word frequency (the effect of component-word frequency) computed for low-, medium-, and high-whole-word frequency were obtained and are shown in Figure 1. For first component-word frequency, an inhibitory effect was observed for compound words of all frequency levels, and this effect increased with whole-word frequency (low-frequency: t = 3.01, p = .003; medium-frequency: t = 5.03, p &lt; .001; high-frequency: t = 5.69, p &lt; .001). For second component-word frequency, the effects were not significant regardless of the level of whole-word frequency (low-frequency: t = −0.95, p = .341; medium-frequency: t = 0.11, p = .911; high-frequency: t = 1.33, p = .182). Moreover, the interaction between first character frequency and whole-word frequency was not significant (β = .001, t = 0.63, p = .527). In contrast, the interaction for second character frequency was significant (β = .004, t = 2.01, p = .044). 
With second character frequency as the simple effects variable, the results showed that the effect was not significant for low- and medium-frequency compound words but inhibitory for high-frequency compound words (low-frequency: t = 0.06, p = .950; medium-frequency: t = 1.30, p = .194; high-frequency: t = 2.13, p = .033).
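As a rough illustration of the simple slope logic used here, the slope of a focal predictor at a given moderator value is the focal coefficient plus the interaction coefficient times that moderator value. The sketch below assumes standardized predictors; the coefficient values are illustrative (of the same order as those reported for MELD-SCH), and the moderator sample is hypothetical because the Table 2 percentiles are not reproduced in this text. It is not the GAMLj implementation.

```python
from statistics import quantiles

def simple_slope(b_focal, b_interaction, moderator_value):
    """Slope of the focal predictor when the moderator is held at a given value."""
    return b_focal + b_interaction * moderator_value

# Hypothetical standardized whole-word frequencies (the moderator).
z_whole_freq = [-1.4, -0.9, -0.5, -0.1, 0.0, 0.2, 0.6, 1.1, 1.7]

# quantiles(..., n=4) returns the 25th, 50th, and 75th percentile cut points.
low, medium, high = quantiles(z_whole_freq, n=4)

# Illustrative coefficients: focal effect and its interaction with the moderator.
B_COMPONENT, B_INTERACTION = 0.012, 0.004
slopes = {
    "low": simple_slope(B_COMPONENT, B_INTERACTION, low),
    "medium": simple_slope(B_COMPONENT, B_INTERACTION, medium),
    "high": simple_slope(B_COMPONENT, B_INTERACTION, high),
}
```

With a positive interaction coefficient, the computed slope grows from the low to the high moderator level, which is the pattern described for first component-word frequency.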
<anchor name="fig1"></anchor>xhp_50_5_479_fig1a.gif

In the model fitted for CLP-Tse, first component-word frequency interacted with whole-word frequency (β = .003, t = 3.06, p = .002). Simple effect analysis showed that the effect of first component-word frequency was significant and inhibitory at all whole-word frequency levels (low-frequency: z = 4.57, p &lt; .001; medium-frequency: z = 7.36, p &lt; .001; high-frequency: z = 8.09, p &lt; .001). Moreover, the interaction between whole-word frequency and character frequency was significant for the second component (β = .004, t = 4.05, p &lt; .001). Facilitative character frequency effects were observed for words of all frequency levels, although they decreased with whole-word frequency (low-frequency: z = −8.27, p &lt; .001; medium-frequency: z = −7.04, p &lt; .001; high-frequency: z = −3.52, p &lt; .001). Other interactive effects were not significant in the model (see Table A2 for more details).

<h31 id="xhp-50-5-479-d282e769">Discussion</h31>

To investigate whether the frequency of components influences whole compound word processing, two data sets of Chinese lexical decisions were analyzed in Study 1. Several findings emerged from the analyses of both data sets. First, an inhibitory component-word frequency effect was observed, with RTs in the LDT increasing with component-word frequency regardless of whole-word frequency. Second, we observed a facilitative character frequency effect, with RTs in the LDT decreasing as character frequency increased. The effect was significant only for the first character in MELD-SCH, but it was significant for both the first and second characters in CLP-Tse. Third, a whole-word frequency effect was observed, with RTs in the LDT decreasing as whole-word frequency increased. Interestingly, whole-word frequency had a larger effect on lexical decision RTs than any component property, as reflected by the regression coefficients. Finally, the interaction between component-word frequency and whole-word frequency showed a consistent pattern across the two data sets, indicating increased competition at the word level when processing high-frequency compound words.

In summary, when the statistical model considered both character frequency and component-word frequency simultaneously, they had effects in different directions. Moreover, the frequency effects of the first component were more stable than those of the second component, which might result from the left-to-right reading direction. Meanwhile, the effects found in CLP-Tse were more stable than those in MELD-SCH. This is possibly because there are more words in CLP-Tse, and this data set provides trial-based information, which makes the consideration of variance between subjects possible.

Study 2



Study 1 found that character frequency and component-word frequency affect lexical decision RTs in opposite directions when the two variables are considered simultaneously. The inhibitory effect of component-word frequency is consistent with the prediction of CRM that component words compete with the whole compound word. One problem with examining the two effects in uncontrolled corpus data sets, however, is that character frequency and component-word frequency are highly correlated (for the first component, r = .87 in MELD-SCH and .85 in CLP-Tse; for the second component, r = .83 in both MELD-SCH and CLP-Tse). This multicollinearity makes it difficult to estimate the two effects separately, so the component-word frequency effects found in Study 1 should be interpreted with caution. To bolster the finding of component-word frequency effects in Study 1, we conducted two factorial-design experiments using the LDT in Study 2.
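To give a sense of how severe this collinearity is, the correlations reported above can be converted into variance inflation factors (VIFs). The sketch below uses the two-predictor simplification VIF = 1 / (1 − r²), which is an assumption made here for illustration; the regression models in Study 1 contain additional predictors, so their actual VIFs may differ.

```python
def vif_two_predictors(r):
    """Variance inflation factor when a predictor correlates r with one other predictor."""
    return 1.0 / (1.0 - r ** 2)

# Correlations between character frequency and component-word frequency (from the text).
correlations = {
    "MELD-SCH, first component": 0.87,
    "CLP-Tse, first component": 0.85,
    "second component (both data sets)": 0.83,
}

for label, r in correlations.items():
    print(f"{label}: r = {r:.2f}, VIF = {vif_two_predictors(r):.2f}")
```

Under this simplification the VIFs fall between roughly 3.2 and 4.1, meaning that the sampling variance of each frequency coefficient is inflated by about that factor relative to uncorrelated predictors.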

<h31 id="xhp-50-5-479-d282e783">Experiment 1</h31>

<bold>Method</bold>

Participants



Seventy-eight native Chinese-speaking participants (57 females) from Mainland China with normal or corrected-to-normal vision were recruited online to participate in the experiment. Their ages ranged from 18 to 29 years. Given the number of words in each condition, there were 1,716 observations per condition, which is comparable to the recommendation of Brysbaert and Stevens (2018). The study was approved by the ethics committee of the Institute of Psychology, Chinese Academy of Sciences, and the participants received a small monetary compensation for their participation.

Stimuli



Whole-word frequency (medium vs. low) and first component-word frequency (high vs. low) were orthogonally manipulated to form four conditions. The whole-word frequency of the compound word was either medium (M = 28, range from 9 to 69 occurrences per million) or low (M = 0.6, range from 0.1 to 2 occurrences per million). Similarly, the first component-word frequency (i.e., the word frequency of the first component when it is used as a single-character word) was either high (M = 80, range from 40 to 210 occurrences per million) or low (M = 1.9, range from 0.4 to 3 occurrences per million). For example, in the medium-frequency compound word conditions, the first component-word frequency of “蓝色” (meaning blue color) is high (“蓝” [meaning blue]), and the first component-word frequency of “危机” (meaning crisis) is low (“危” [meaning danger]). Similarly, in the low-frequency compound word conditions, the first component-word frequency of “赌债” (meaning gambling debts) is high (“赌” [meaning gamble]), and the first component-word frequency of “汽船” (meaning steamship) is low (“汽” [meaning steam]). The entire list of stimuli can be found in Table B1. Apart from these two factors, other character properties, including the number of strokes, character frequency, and family size, were controlled across conditions (see Table 4). The frequency data were obtained from SUBTLEX-CH (Cai & Brysbaert, 2010). A total of 88 two-character compound words were selected for the four conditions, yielding 22 different words per condition.
<anchor name="tbl4"></anchor>xhp_50_5_479_tbl4a.gif

There were 88 two-character nonwords, which were created by randomly recombining the first characters of the real words in the experiment with their second characters. This ensured that character-level properties were matched between words and nonwords. All nonwords were manually checked to ensure that they did not correspond to an existing word orthographically or phonologically.
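The recombination procedure can be sketched as follows. This is only an illustration under stated assumptions: the function name, the fixed random seed, and the mini-lexicon are hypothetical, and the automatic lexicon lookup stands in for the manual orthographic and phonological check described above. The four words are the example stimuli quoted in the text.

```python
import random

def make_nonwords(words, lexicon, seed=0, max_tries=1000):
    """Reshuffle second characters across words until no pairing is a known word."""
    rng = random.Random(seed)
    firsts = [w[0] for w in words]
    seconds = [w[1] for w in words]
    for _ in range(max_tries):
        rng.shuffle(seconds)
        candidates = [f + s for f, s in zip(firsts, seconds)]
        # Reject the whole shuffle if any candidate is an existing word.
        if all(c not in lexicon for c in candidates):
            return candidates
    raise RuntimeError("no valid recombination found")

words = ["蓝色", "危机", "赌债", "汽船"]
# Hypothetical mini-lexicon: the target words plus two real recombinations to reject.
lexicon = set(words) | {"赌船", "汽机"}

nonwords = make_nonwords(words, lexicon)
```

Because every first and second character is reused exactly once, the word and nonword lists contain the same characters in the same positions; with such a tiny lexicon, the output would still need the manual check before being treated as genuine nonwords.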

Apparatus



This study was conducted online on Pavlovia, and PsychoPy (Peirce et al., 2019) was used to program and run the experiment and to record RTs and accuracy rates. All participants were asked to complete the experiment in a quiet room using their own computers, with a resolution of 1,920 × 1,080 pixels and a refresh rate of 60 Hz. Stimuli were presented one at a time in black Song font (size 26) on a gray background in the center of the display screen.

Procedure



Before the formal experiment, eight words and eight nonwords were presented to help participants familiarize themselves with the task. Each trial started with a 500-ms fixation cross in the center of the screen, followed by a stimulus that was displayed until the participant responded or for a maximum of 2,500 ms. Participants decided whether the two-character string presented on the screen was a word by pressing a key as quickly and as accurately as possible; they pressed “J” for “yes” and “F” for “no.” A correct response was followed by a 300-ms blank screen and an incorrect response by 300-ms feedback; after another 200-ms blank screen, a new trial started.

Transparency and Openness



The materials, raw data, and R analysis code are publicly available at the Open Science Framework website (<a href="https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c" target="_blank">https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c</a>).

<bold>Results</bold>

Only responses to words in the experimental trials were analyzed, including accuracy rates and RTs. Generalized linear mixed-effects models (GLMMs) were fitted using the lme4 package (Bates et al., 2015) in R 4.2 to analyze accuracy rates, and LMMs were used to analyze RTs. Because the RTs were positively skewed, the data were log-transformed to meet the distributional assumption of LMMs.<anchor name="b-fn3"></anchor><sups>3</sups> In all models, whole-word frequency (medium was coded as −0.5 and low was coded as 0.5) and component-word frequency (high was coded as −0.5 and low was coded as 0.5) were entered as contrast-coded fixed factors, with participants and items specified as crossed random factors. All models were initially constructed with a maximal random-effects structure. If the maximal model did not converge, a simpler model was tested, with the random component that generated the smallest variance removed (Barr et al., 2013). We report regression coefficients (bs), SEs, t values (for RTs) or z values (for accuracy rates), and corresponding p values of the optimal model.
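The contrast coding and log transformation described above can be sketched as follows. The dictionary and function names are hypothetical, and the use of the natural logarithm is an assumption, since the base of the transformation is not stated in this section; the models themselves were fitted with lme4 in R, which this sketch does not reproduce.

```python
import math

# Contrast codes as reported in the text.
WHOLE_WORD_CODE = {"medium": -0.5, "low": 0.5}
COMPONENT_WORD_CODE = {"high": -0.5, "low": 0.5}

def code_trial(whole_level, component_level, rt_ms):
    """Return the fixed-effect predictors and log RT for a single trial."""
    whole = WHOLE_WORD_CODE[whole_level]
    component = COMPONENT_WORD_CODE[component_level]
    return {
        "whole": whole,
        "component": component,
        "interaction": whole * component,  # product of the two contrast codes
        "log_rt": math.log(rt_ms),         # natural log (assumed base)
    }

trial = code_trial("medium", "high", 600)
```

With ±0.5 coding, each main-effect coefficient estimates the difference between the two levels of that factor averaged over the other factor, so a positive coefficient means larger values in the condition coded 0.5.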

Accuracy Rates



The mean accuracy of the lexical decisions for all words was 94.7%, and accuracy was higher than 80% for all participants. Because the mean accuracy of two words was less than 67%, their data were excluded from the following analyses.<anchor name="b-fn4"></anchor><sups>4</sups> Both were low-frequency compound words; one belonged to the high first component-word frequency condition, and the other belonged to the low first component-word frequency condition. The mean accuracy for the remaining words was 95.4%. The descriptive statistics and fixed-effect estimates from the GLMM are shown in Tables 5 and 6. The final model included random intercepts and slopes (i.e., whole-word frequency and the interaction) for subjects and random intercepts for items. The main effect of whole-word frequency was significant, with higher accuracy for medium-frequency compound words than for low-frequency compound words (z = −6.30, p &lt; .001). Neither the main effect of first component-word frequency nor the interaction was significant (p = .966 and .608, respectively).
<anchor name="tbl5"></anchor>xhp_50_5_479_tbl5a.gif
<anchor name="tbl6"></anchor>xhp_50_5_479_tbl6a.gif

RTs



Trials with incorrect responses were excluded first (4.6%), followed by RTs longer than 2,000 ms or shorter than 200 ms (0.1%). Finally, RTs beyond 3 SDs were excluded for each condition of each participant (3.1%). In total, this data exclusion procedure resulted in a loss of 7.8% of the data. The final model included random intercepts and slopes (i.e., whole-word frequency and component-word frequency) for subjects and random intercepts for items. The LMM results showed significant main effects of whole-word frequency (t = 9.54, p &lt; .001) and first component-word frequency (t = −2.11, p = .038) on RTs. The classic facilitative whole-word frequency effect in Chinese compound word recognition was replicated: participants identified higher-frequency compound words more rapidly than lower-frequency compound words. In contrast, the effect of first component-word frequency was in the reverse direction: the higher the word frequency of the first component, the slower the recognition of the whole compound word. Furthermore, the interaction between whole-word frequency and first component-word frequency was not significant (t = 0.07, p = .946; see Table 6). Finally, we calculated Cohen’s drm to compare the effect sizes of whole-word and component-word frequency using the method recommended by Lakens (2013) for repeated-measure mean difference effect size estimation.<anchor name="b-fn5"></anchor><sups>5</sups> The results revealed a stronger effect of whole-word frequency than of first component-word frequency (drm = −1.06 and 0.24, respectively).
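The three exclusion steps can be sketched as a small trimming routine. The trial format, the function name, and the sample data are hypothetical; the sketch also assumes that the 3-SD criterion is applied around the mean of each participant-by-condition cell after the first two steps, which the text implies but does not spell out.

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_rts(trials, fastest=200, slowest=2000, n_sd=3):
    """Apply the three exclusion steps to a list of trial dicts."""
    # Steps 1 and 2: keep correct responses within the absolute RT window.
    kept = [t for t in trials
            if t["correct"] and fastest <= t["rt"] <= slowest]

    # Step 3: group by participant and condition, then drop RTs beyond n_sd SDs.
    cells = defaultdict(list)
    for t in kept:
        cells[(t["subject"], t["condition"])].append(t)

    trimmed = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:  # SD is undefined for a single trial
            trimmed.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)
        trimmed.extend(t for t in cell if abs(t["rt"] - m) <= n_sd * sd)
    return trimmed

# Hypothetical trials for one participant in one condition.
trials = [{"subject": 1, "condition": "low-high", "rt": 600, "correct": True}
          for _ in range(19)]
trials += [
    {"subject": 1, "condition": "low-high", "rt": 1900, "correct": True},  # beyond 3 SDs
    {"subject": 1, "condition": "low-high", "rt": 150, "correct": True},   # too fast
    {"subject": 1, "condition": "low-high", "rt": 700, "correct": False},  # error
]
clean = trim_rts(trials)
```

In this toy data set, the error trial and the 150-ms trial are removed by the first two steps, and the 1,900-ms trial is removed by the SD criterion, leaving the 19 trials at 600 ms.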

<h31 id="xhp-50-5-479-d282e918">Experiment 2</h31>

Nonwords in Experiment 1 were created by randomly recombining the second characters of target words, which means that the same character occurred once in a word context and once in a nonword context. This has the unintended consequence of priming the second occurrence of the same character, with unpredictable consequences for the lexical decision latency.<anchor name="b-fn6"></anchor><sups>6</sups> Experiment 2 was designed to exclude this possibility. In Experiment 2, none of the characters in the nonwords appeared in the target words. Given that our focus was on the effects of component-word frequency on word identification, we chose to manipulate only first component-word frequency in a broader sample of compound words for Experiment 2.

<bold>Method</bold>

Participants



In Experiment 2, 35 native Chinese-speaking participants (21 females) from Mainland China with normal or corrected-to-normal vision were recruited online to participate in the experiment. Their ages ranged from 19 to 26 years. The number of observations per condition in this experiment was 1,750, closely matching the 1,716 observations per condition in Experiment 1. As stated previously, these numbers are comparable to the recommendations made by Brysbaert and Stevens (2018).
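The observation counts quoted for the two experiments follow directly from the designs: the number of participants multiplied by the number of words per condition. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the counts refer to the design before any item or trial exclusions.

```python
def observations_per_condition(n_participants, n_words_per_condition):
    """Planned observations per condition in a fully crossed participant-by-item design."""
    return n_participants * n_words_per_condition

exp1 = observations_per_condition(78, 22)  # Experiment 1: 78 participants, 22 words
exp2 = observations_per_condition(35, 50)  # Experiment 2: 35 participants, 50 words
print(exp1, exp2)  # 1716 1750
```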

Stimuli



First component-word frequency was manipulated and was either high (M = 73, range from 50 to 126 occurrences per million) or low (M = 1.8, range from 0.4 to 3 occurrences per million). The entire list of stimuli can be found in Table B2. Whole-word frequency and other character properties were controlled across conditions (see Table 7). A total of 100 two-character compound words were selected, with 50 different words per condition. Another 100 two-character nonwords were used as fillers; none of their characters appeared in the target words.
<anchor name="tbl7"></anchor>xhp_50_5_479_tbl7a.gif

Apparatus



The same apparatus was used as in Experiment 1.

Procedure



The same procedure was used as in Experiment 1.

Transparency and Openness



The materials, raw data, and R analysis code are publicly available at the Open Science Framework website (<a href="https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c" target="_blank">https://osf.io/cs9qv/?view_only=d3a690a906024821a6a22bb5374be10c</a>).

<bold>Results</bold>

The same analysis procedures were used as in Experiment 1. In all models, component-word frequency was entered as a contrast-coded fixed factor (high was coded as −0.5 and low was coded as 0.5).

Accuracy Rates



The mean accuracy was 93.7%. The descriptive statistics and results of the GLMM are shown in Table 8. The final model included random intercepts for subjects and items. The component-word frequency effect was significant (b = 0.53, SE = 0.20, z = 2.69, p = .007): compound words whose first component had a low word frequency were identified more accurately than those whose first component had a high word frequency.
<anchor name="tbl8"></anchor>xhp_50_5_479_tbl8a.gif

RTs



Approximately 8.0% of the trials were excluded using the same criteria as in Experiment 1. The final model included random intercepts and slopes (i.e., component-word frequency) for subjects and random intercepts for items. The results in Table 8 showed a significant first component-word frequency effect (b = −0.03, SE = 0.01, t = −2.28, p = .025), indicating that readers responded more rapidly to compound words whose first component had a lower word frequency. The effect size, estimated in the same way as in Experiment 1, indicated a small effect of component-word frequency, Cohen’s drm = 0.24. The results replicated the inhibitory component-word frequency effect found in Experiment 1 and excluded the possibility that the effect was driven by the priming of repeated characters between words and nonwords.
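For readers unfamiliar with Cohen’s drm, the sketch below implements the repeated-measures formula given by Lakens (2013): the mean difference divided by the SD of the difference scores (expressed through the two condition SDs and their correlation r), multiplied by √(2(1 − r)). The input format (one mean RT per participant per condition) and the sample values are assumptions made for illustration; the exact computation used in the article is described in its footnote and is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cohens_d_rm(cond_a, cond_b):
    """Cohen's d_rm for paired condition means (cond_b minus cond_a)."""
    r = pearson_r(cond_a, cond_b)
    sd_a, sd_b = stdev(cond_a), stdev(cond_b)
    sd_diff = sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b)
    return (mean(cond_b) - mean(cond_a)) / sd_diff * sqrt(2 * (1 - r))

# Hypothetical per-participant mean RTs (ms) in two conditions.
low_component = [500, 550, 600, 650]
high_component = [570, 520, 670, 620]
d_rm = cohens_d_rm(low_component, high_component)
```

When the two conditions have equal SDs, as in this toy example, the formula reduces to the mean difference divided by that common SD, which provides a quick sanity check on the implementation.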

<h31 id="xhp-50-5-479-d282e1004">Discussion</h31>

The results of the two factorial experiments generally replicated the major finding of Study 1. It took longer to identify compound words containing high-word-frequency components than those containing low-word-frequency components. The interaction between whole-word frequency and component-word frequency was not significant. These findings support the argument that the component words of a compound word compete with the whole word during word processing.

General Discussion



The present study examined how Chinese compound words are processed by analyzing two large-scale databases and conducting two lexical decision experiments. In contrast to previous studies, we distinguished component-word frequency and character frequency when investigating how component properties affect compound word processing.

In the present studies, we found two main effects. The first is the classical whole-word frequency effect, with shorter lexical decision latencies for high-frequency compound words. The second is the component-word frequency effect, with longer lexical decision latencies for compound words containing high-frequency component words. The two frequency effects confirm a prediction of CRM. When the model processes a compound word, both the whole word and the component words are activated and compete for a winner. The whole compound word wins most of the time because it receives more support from the visual and character levels than any component word, so it will be identified as a word. CRM assumes that a high-frequency compound word takes less time to win than a low-frequency word, which appears as the whole-word frequency effect in the experiments. Meanwhile, the activation of embedded component words might cause some interference in the competition. CRM predicts that the activation of high-frequency component words is higher than that of low-frequency component words; thus, they compete more strongly with the whole compound word. This stronger competition slows down word identification and results in longer processing times. The finding of an inhibitory component-word frequency effect is consistent with this prediction.
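The qualitative logic of this account can be illustrated with a deliberately simplified toy simulation. It is not the CRM implementation: the update rule, the parameter values, and the function name are all assumptions chosen only to show how mutual inhibition between a whole-word node (supported by two characters) and a component-word node (supported by one character) can produce a facilitative whole-word frequency effect and an inhibitory component-word frequency effect.

```python
def cycles_to_identify(whole_freq, component_freq, threshold=1.0, max_cycles=1000):
    """Toy competition: count cycles until the whole-word node reaches threshold.

    Frequencies are scaled to the range 0-1. All parameters are arbitrary.
    """
    char_support = 0.05   # bottom-up input contributed by each character
    freq_gain = 0.02      # extra input per unit of word frequency
    inhibition = 0.10     # strength of mutual inhibition between word nodes

    whole_input = 2 * char_support + freq_gain * whole_freq      # two characters
    component_input = 1 * char_support + freq_gain * component_freq  # one character

    whole = component = 0.0
    for cycle in range(1, max_cycles + 1):
        # Each node gains its input and is inhibited by the other node's activation.
        new_whole = max(0.0, whole + whole_input - inhibition * component)
        new_component = max(0.0, component + component_input - inhibition * whole)
        whole, component = new_whole, new_component
        if whole >= threshold:
            return cycle
    return max_cycles

# Inhibitory component-word frequency effect (whole-word frequency held constant).
slow = cycles_to_identify(whole_freq=0.5, component_freq=1.0)
fast = cycles_to_identify(whole_freq=0.5, component_freq=0.0)

# Facilitative whole-word frequency effect (component-word frequency held constant).
high_whole = cycles_to_identify(whole_freq=1.0, component_freq=0.0)
low_whole = cycles_to_identify(whole_freq=0.0, component_freq=0.0)
```

In this sketch the whole-word node always wins because it receives input from two characters, but it needs more cycles when the competing component word is of higher frequency and fewer cycles when the whole word itself is of higher frequency.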

Furthermore, the effect size of whole-word frequency is larger than that of component-word frequency in both studies. Although their frequency ranges were different, these variables were standardized in the analysis of Study 1 and measured in the same situation in Experiment 1 of Study 2. The finding of a larger whole-word frequency effect aligns with CRM, which predicts that the whole word usually wins the competition because the whole-word node is supported by bottom-up activation from more character nodes than its component words (i.e., one-character words). Therefore, the component words are inhibited by the whole compound word soon after being activated at the beginning of processing, while the whole compound word is long lived. This possibly makes the frequency effects of components either nonsignificant (as in previous studies, see Li et al., 2014; Ma et al., 2015; Rayner et al., 2007) or trivial compared to the whole-word frequency effects (as in the present study) and makes processing holistic-like in practice (Bai et al., 2008; Shen & Li, 2012; Shen et al., 2018; Yang et al., 2012; Zang et al., 2013; J. Zhou & Li, 2021).

It is necessary to clarify that the competition-based view is different from the dual-route model, in which lexical access of component words and whole words takes place in different routes (Caramazza et al., 1988; Pollatsek et al., 2000). In the dual-route model, words are accessed through whichever of the holistic and decomposition routes is faster, and component effects are considered evidence for decomposition-then-composition. However, our current view posits that lexical processing of component words and whole words is simultaneous and takes place at the same level, predicting an inhibitory effect of component-word frequency because of competition. In short, we do not view compound word identification dichotomously but view it as an interactive activation-based competition among all possible words.

The findings of the present study might provide one solution to the discrepant findings in the literature regarding how character frequency affects word identification in Chinese reading. Some previous studies found a facilitative effect of character frequency on compound word processing (e.g., Peng et al., 1999; Wang & Peng, 1999; Yan et al., 2006), others found inhibitory effects (e.g., Tsang et al., 2018; Xiong et al., 2023; Yu et al., 2021), and still others found null effects (e.g., Cui et al., 2017; Li et al., 2014; Ma et al., 2015). As we argued in the Introduction section, components of compound words may produce two opposite effects on Chinese compound word processing: a facilitative effect at the character level (Taft & Zhu, 1997) and an inhibitory effect at the word level (Li & Pollatsek, 2020). Consistent with these predictions, inhibitory component-word frequency effects of the first component were observed in both studies, while facilitative character frequency effects were observed in Study 1. The balance of these two effects can explain the mixed findings from previous studies, which only included character frequencies as variables without considering component-word frequencies (e.g., H. C. Chen et al., 2003; Tsang et al., 2018; Tse & Yap, 2018). Based on the results from the new analysis of the corpus data in Study 1 and the two experiments in Study 2, we argue that the key to solving this puzzling picture in the literature is to consider the effects of component words when theorizing about Chinese compound word processing. Possibly, if target words differ greatly in character frequency but not in component-word frequency, a facilitative effect of character frequency on word recognition might be observed. However, if the components are of high word frequency in the high character frequency condition, an inhibitory effect might override the facilitative one. This explanation is only one possible account of the mixed character frequency results in previous studies, and it does not exclude other possibilities.

Additionally, in Study 1, the interactions between whole-word frequency and component-word frequency were significant, suggesting that whole-word frequency is an essential determinant of component-word frequency effects and that component-word frequency effects are stronger when the whole-word frequency is higher. However, the interaction was not replicated in Experiment 1 of Study 2, in which whole-word frequency and first component-word frequency were manipulated as categorical variables. Instead, first component-word frequency showed inhibitory effects on lexical decision RTs independent of compound word frequency. One probable reason for the absence of an interaction is that the range of whole-word frequency was limited. It remains to be seen whether an experimentally manipulated component-word frequency effect would be smaller or nonexistent for compound words with high whole-word frequency. Note that although high-frequency words were not used, the frequency range we selected in Experiment 1 covers 65% of all two-character words, suggesting that the competition pattern we observed occurs for most Chinese compound words.

The results of Study 1 also showed that the effects of the first character and the second character were different to some degree. The frequency effects of the first component are more robust and stronger than those of the second component, which is consistent with previous studies showing similar patterns in Chinese compound word processing even when words were presented in isolation (Peng et al., 1994; Tan & Perfetti, 1999). Differences between the two characters of a word might be caused by reading direction. Because Chinese readers usually read from left to right, the first character of a word may have some advantages over the second character during reading (Ma et al., 2015). However, considering that the frequency effects of the second component are not consistent in Study 1 (significant in the analyses of CLP-Tse but not in those of MELD-SCH), more empirical studies are needed to verify the frequency effects of the second component on Chinese compound word processing. Meanwhile, CLP-Tse is a data set of traditional Chinese, while MELD-SCH is based on simplified Chinese, so it is also possible that lexical identification differs between these two visually different scripts.

Inhibitory effects of component-word frequency on compound word processing have also been observed in some alphabetic languages such as Basque and Vietnamese (Pham & Baayen, 2015; Vergara-Martínez et al., 2009). Most studies of English observed facilitative effects of morpheme frequency (Inhoff et al., 2008; Schmidtke et al., 2021). However, the effect is not always robust. For example, in an LDT, when the second component was a high-frequency word, the frequency effect of the first component was not significant (Juhasz et al., 2003). Moreover, in eye movement studies, Juhasz et al. (2003) also did not find significant first lexeme effects. Although studies of English compound words have not consistently observed component-word frequency effects, none has reported inhibitory effects. Apparently, there are some cross-language differences regarding how component frequency affects compound word processing. The exact reasons for these differences are currently unclear, and further research is required to understand them.

This raises the question of whether the mechanism of compound word processing proposed in the present study is specific to Chinese or applies universally across writing systems. The unique properties of Chinese might affect compound word processing in the following ways. First, Chinese words are short, allowing readers to process a word within a single fixation. In contrast, longer compound words in alphabetic languages might need more fixations, preventing holistic processing. Second, because there are no explicit marks to demarcate words in Chinese, readers need to decide which word each character belongs to. This may encourage competition between whole words and their components. In contrast, for English compounds, the absence of whitespace may suggest that the embedded word is not to be identified separately, potentially reducing inhibitory effects. Finally, morphemes are salient in Chinese and likely to be activated early during processing. This might not happen as quickly in alphabetic languages if morpheme boundaries are not apparent. These differences suggest that compound word processing in Chinese might have unique properties compared to alphabetic writing systems. Linguistic experience could affect how readers process words (Traficante et al., 2018). Therefore, how well CRM explains word processing in alphabetic languages is an interesting open question.

One further question is whether the mechanism for processing compound words in an LDT could be applied to natural sentence reading. On the one hand, multiple words are presented simultaneously without obvious word boundaries during natural reading. It is likely that the mechanism of compound word processing would be affected by the procedure of word segmentation during sentence reading. Zang et al. (2016) manipulated the lexical probability (i.e., the likelihood of a character being a single-character word vs. part of a two-character word) of the first component and the preview of the second component in a sentence reading study, with character frequency matched. They found that when the first component was more likely to be a single-character word, the preview effects on the whole words were reduced, indicating that Chinese readers could use lexical probability cues for word segmentation during sentence reading. On the other hand, given that words are presented in context and readers might rely more on top-down information during reading, the influence of character frequency might be relatively weak (Cui et al., 2013, 2021; Li et al., 2014; Ma et al., 2015; Yan et al., 2006; Yu et al., 2021). Accordingly, words in a sentence might be processed essentially as psychological units and possibly induce little or no difficulty in segmentation for Chinese readers (Bai et al., 2008). In sum, future research is needed to determine the extent to which character frequency and component-word frequency serve as distinct factors in the mental lexicon of Chinese readers, as well as to assess the generalizability of compound word processing mechanisms across tasks.

Similar to the results of the studies presenting words in isolation, previous sentence-reading studies tended to observe robust whole-word frequency effects and mixed character frequency effects. Recent studies have found inhibitory effects of character frequency on compound word processing during sentence reading (Cui et al., 2021; Xiong et al., 2023; Yu et al., 2021). Cui et al. (2021) explained the inhibitory first-character effect under the constraint hypothesis (Hyönä et al., 2004) based on the observation that morphological family size (the number of words in which the character appears) and first-character frequency were strongly correlated. It was hypothesized that the fewer the morphological family members associated with the first character, the stronger the constraint the first character places on the possible compound words. The constraint might be particularly useful when the whole compound word is low frequency. Yu et al. (2021), however, pointed out that family size effects are mostly found to be facilitative in alphabetic languages (e.g., Dutch: Kuperman et al., 2009; English: Juhasz & Berkowitz, 2011; Finnish: Kuperman et al., 2008), as well as in Chinese (Yao et al., 2022). Furthermore, when they analyzed only a subset of target words to equate family size, the inhibitory effect of first-character frequency was still present. They therefore rejected the constraint hypothesis. In the current research, when family size was included in the analysis of Study 1, its effect on word identification was significant only in the analysis of CLP-Tse, in the direction of facilitation, and was absent in the analysis of MELD-SCH (more details in Table A3). Notably, even with family size included, there were still facilitative character frequency effects and inhibitory component-word frequency effects, consistent with the initial findings. Instead of the constraint hypothesis, Yu et al. (2021) argued that the inhibitory character frequency effect reflects the heuristics Chinese readers use to perform word segmentation when reading multiple consecutive characters in a sentence, whereby the unfamiliarity of a low-frequency first character induces an inference of a one-character word and a short fixation. However, our current lexical-decision results imply that the inhibitory effect of the component does not necessarily arise from the need for segmentation, because the targets were presented in isolation (also see Xiong et al., 2023). We leave the question of generalization between single-word and sentence-reading paradigms to future studies in which the effect of component-word frequency is explicitly examined. If component-word frequency affects eye-movement measures in the same way when character frequency is controlled, it would strengthen the applicability of our theory to Chinese reading.

Finally, we acknowledge the limitation that we did not consider semantic processing, although this is an integral part of compound word processing. Peng et al. (1999) found that character frequency effects were moderated by the semantic transparency of the whole word. To interpret this, their model (not computationally implemented) divided compound words into semantically transparent and opaque words, with different types of connections between morpheme and word nodes depending on transparency. Based only on RT measurements in LDTs, it is also difficult to discriminate the time courses or processing stages of different frequency effects on compound word identification. Because both studies used lexical decision tasks, it is uncertain whether the results generalize to other tasks. Additionally, there are inevitably problems to be solved in explaining word processing in other languages, because the competition-based word processing mechanism in CRM was targeted at specific properties of Chinese. Further studies are needed to investigate these questions. For the present, we focus mainly on the effects at the word level in Chinese compound word identification.

Conclusion



By analyzing two existing lexical decision databases and conducting empirical research using LDTs, the present study showed that both whole-word properties and component properties affect word processing during Chinese reading. Specifically, facilitative whole-word frequency effects and inhibitory component-word frequency effects were observed in the analyses of the existing databases as well as in the experiments with factorial designs. These findings support a novel view of how compound words are processed in Chinese reading. According to this approach, both the whole compound word and the words formed by its components are activated, and these words compete for a winner. Because compound words are supported by more character units than any component word, the whole word almost always wins the competition, resulting in the compound word being processed as a unit. Meanwhile, because the activated component words compete with the whole word, their properties also influence the time needed to identify the whole compound word. This new approach might explain the previously inconsistent findings about the effects of component frequency and highlights the importance of component words.
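
The competition account summarized above can be illustrated with a toy simulation. The sketch below is not the CRM and does not use its equations or parameters: the function names, the parameter values, and the assumption that frequency sets a word node's starting activation are all hypothetical choices made for illustration. It models only two word nodes, a two-character compound supported by two character units and a first-component word supported by one, which inhibit each other until the compound reaches an identification threshold.

```python
import math

# Illustrative parameters only; none of these values come from the paper or from CRM.
EXCITATION = 0.1   # bottom-up input contributed by each supporting character unit
INHIBITION = 0.2   # strength of lateral inhibition between the two word nodes
REST_SCALE = 0.05  # maps log frequency onto a starting activation (assumption)
STEP = 0.01        # size of one update cycle
THRESHOLD = 0.9    # activation at which the compound counts as identified


def start_level(frequency):
    """Starting activation, assumed here to grow with log frequency."""
    return REST_SCALE * math.log10(frequency + 1)


def clamp(value):
    """Keep an activation inside the range [0, 1]."""
    return min(1.0, max(0.0, value))


def simulate(whole_freq, component_freq, max_cycles=5000):
    """Return (cycles, whole activation, component activation) at the point
    where the compound reaches THRESHOLD, or after max_cycles if it never does."""
    whole = start_level(whole_freq)
    component = start_level(component_freq)
    for cycle in range(1, max_cycles + 1):
        # The compound is supported by two character units, the component word by one.
        net_whole = 2 * EXCITATION - INHIBITION * component
        net_component = 1 * EXCITATION - INHIBITION * whole
        # Both nodes are updated from the activations of the previous cycle.
        whole = clamp(whole + STEP * net_whole)
        component = clamp(component + STEP * net_component)
        if whole >= THRESHOLD:
            return cycle, whole, component
    return max_cycles, whole, component


if __name__ == "__main__":
    for whole_freq, component_freq in [(10, 10), (10, 1000), (1000, 10)]:
        cycles, whole, component = simulate(whole_freq, component_freq)
        print(whole_freq, component_freq, cycles, round(whole, 3), round(component, 3))
```

Under these assumptions, raising the component-word frequency increases the number of cycles the compound needs to reach threshold (an inhibitory effect), raising the whole-word frequency decreases it (a facilitative effect), and the compound still ends with the higher activation because it receives input from two character units rather than one.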

Footnotes

<anchor name="fn1"></anchor>

<sups> 1 </sups> Simplified Chinese characters are used mainly in mainland China and have fewer strokes. Traditional Chinese characters, used in regions such as Taiwan, Hong Kong, and Macau, are more complex and retain historical forms. The two systems differ in character complexity and appearance.

<anchor name="fn3"></anchor>

<sups> 3 </sups> Models fitted to raw RTs showed patterns of significance similar to those fitted to log-transformed RTs; therefore, only the results for log-transformed RTs are reported.

<anchor name="fn4"></anchor>

<sups> 4 </sups> Words with accuracy lower than 0.67 may not have been processed as words by readers, although their word frequencies did not differ significantly from those of the other words. The two words are “协约” and “支流”. Models based on all words showed patterns of significance similar to those of the models fitted to the trimmed data; therefore, only the results for the trimmed data are reported.

<anchor name="fn5"></anchor>

<sups> 5 </sups> For ease of understanding, the effect size calculations here are consistent with those in Table 1. Negative values indicate facilitative effects and positive values indicate inhibitory effects, and larger absolute values indicate stronger effects.

<anchor name="fn6"></anchor>

<sups> 6 </sups> We thank Sachiko Kinoshita and the editor for pointing out this problem of Experiment 1.

<anchor name="fn2"></anchor>

<sups> 2 </sups> Because we fitted a linear mixed-effects model to the CLP-Tse data, R² is not available for that analysis. When RTs were averaged across participants for each word, a linear regression model accounted for 38.1% of the variance in the CLP-Tse data.

References

<anchor name="c1"></anchor>

Andrews, S. (1986). Morphological influences on lexical access: Lexical or nonlexical effects? Journal of Memory and Language, 25(6), 726–740. 10.1016/0749-596X(86)90046-X

<anchor name="c2"></anchor>

Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28. 10.21500/20112084.807

<anchor name="c3"></anchor>

Baayen, R. H., & Schreuder, R. (2000). Towards a psycholinguistic computational model for morphological parsing. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769), 1281–1293. 10.1098/rsta.2000.0586

<anchor name="c4"></anchor>

Bai, X., Yan, G., Liversedge, S. P., Zang, C., & Rayner, K. (2008). Reading spaced and unspaced Chinese text: Evidence from eye-movements. Journal of Experimental Psychology: Human Perception and Performance, 34(5), 1277–1287. 10.1037/0096-1523.34.5.1277

<anchor name="c5"></anchor>

Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10(3), 340–357. 10.1037/0096-1523.10.3.340

<anchor name="c6"></anchor>

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. 10.1016/j.jml.2012.11.001

<anchor name="c7"></anchor>

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. 10.18637/jss.v067.i01

<anchor name="c8"></anchor>

Beijing Language Institute. (1986). Modern Chinese frequency dictionary (Xian Dai Han Yu Ci Pin Ci Dian). Beijing Language Institute Press.

<anchor name="c9"></anchor>

Bertram, R., & Hyönä, J. (2003). The length of a complex word modifies the role of morphological structure: Evidence from eye-movements when reading short and long Finnish compounds. Journal of Memory and Language, 48(3), 615–634. 10.1016/S0749-596X(02)00539-9

<anchor name="c10"></anchor>

Bien, H., Levelt, W. J., & Baayen, R. H. (2005). Frequency effects in compound production. Proceedings of the National Academy of Sciences of the United States of America, 102(49), 17876–17881. 10.1073/pnas.0508431102

<anchor name="c11"></anchor>

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), Article 9. 10.5334/joc.10

<anchor name="c12"></anchor>

Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLOS ONE, 5(6), Article e10729. 10.1371/journal.pone.0010729

<anchor name="c13"></anchor>

Caramazza, A., Laudanna, A., & Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28(3), 297–332. 10.1016/0010-0277(88)90017-0

<anchor name="c14"></anchor>

Chen, H. C., Song, H., Lau, W. Y., Wong, K. F. E., & Tang, S. L. (2003). Developmental characteristics of eye-movements in reading Chinese. In C. McBride-Chang & H. C. Chen (Eds.), Reading development in Chinese children (pp. 157–169). Praeger.

<anchor name="c15"></anchor>

Chen, M., Wang, Y., Zhao, B., Li, X., & Bai, X. (2021). The trade-off between format familiarity and word-segmentation facilitation in Chinese reading. Frontiers in Psychology, 12, Article 602931. 10.3389/fpsyg.2021.602931

<anchor name="c16"></anchor>

Cui, L., Häikiö, T., Zhang, W., Zheng, Y., & Hyönä, J. (2017). Reading monomorphemic and compound words in Chinese. The Mental Lexicon, 12(1), 1–20. 10.1075/ml.12.1.01cui

<anchor name="c17"></anchor>

Cui, L., Wang, J., Zhang, Y., Cong, F., Zhang, W., & Hyönä, J. (2021). Compound word frequency modifies the effect of character frequency in reading Chinese. Quarterly Journal of Experimental Psychology, 74(4), 610–633. 10.1177/1747021820973661

<anchor name="c18"></anchor>

Cui, L., Yan, G., Bai, X., Hyönä, J., & Liversedge, S. P. (2013). Processing of compound-word characters in reading Chinese: An eye-movement-contingent display change study. Quarterly Journal of Experimental Psychology, 66(3), 527–547. 10.1080/17470218.2012.667423

<anchor name="c19"></anchor>

Duñabeitia, J. A., Perea, M., & Carreiras, M. (2007). The role of the frequency of constituents in compound words: Evidence from Basque and Spanish. Psychonomic Bulletin & Review, 14(6), 1171–1176. 10.3758/BF03193108

<anchor name="c20"></anchor>

Ford, M. A., Davis, M. H., & Marslen-Wilson, W. D. (2010). Derivational morphology and base morpheme frequency. Journal of Memory and Language, 63(1), 117–130. 10.1016/j.jml.2009.01.003

<anchor name="c21"></anchor>

Gallucci, M. (2019). GAMLj: General analyses for linear models [Jamovi module]. <a href="https://gamlj.github.io/" target="_blank">https://gamlj.github.io/</a>

<anchor name="c22"></anchor>

Giraudo, H., & Grainger, J. (2000). Effects of prime word frequency and cumulative root frequency in masked morphological priming. Language and Cognitive Processes, 15(4–5), 421–444. 10.1080/01690960050119652

<anchor name="c23"></anchor>

Hasenacker, J., & Schroeder, S. (2019). Compound reading in German: Effects of constituent frequency and whole-word frequency in children and adults. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(5), 920–933. 10.1037/xlm0000623

<anchor name="c24"></anchor>

Hyönä, J., Bertram, R., & Pollatsek, A. (2004). Are long compound words identified serially via their constituents? Evidence from an eye movement-contingent display change study. Memory & Cognition, 32(4), 523–532. 10.3758/BF03195844

<anchor name="c25"></anchor>

Hyönä, J., & Olson, R. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(6), 1430–1440. 10.1037/0278-7393.21.6.1430

<anchor name="c26"></anchor>

Hyönä, J., & Pollatsek, A. (1998). Reading Finnish compound words: Eye fixations are affected by component morphemes. Journal of Experimental Psychology: Human Perception and Performance, 24(6), 1612–1627. 10.1037/0096-1523.24.6.1612

<anchor name="c27"></anchor>

Inhoff, A. W., Starr, M. S., Solomon, M., & Placke, L. (2008). Eye movements during the reading of compound words and the influence of lexeme meaning. Memory & Cognition, 36(3), 675–687. 10.3758/MC.36.3.675

<anchor name="c28"></anchor>

Juhasz, B., & Berkowitz, R. (2011). Effects of morphological families on English compound word recognition: A multitask investigation. Language and Cognitive Processes, 26(4/5/6), 653–682. 10.1080/01690965.2010.498668

<anchor name="c29"></anchor>

Juhasz, B. J., Starr, M. S., Inhoff, A. W., & Placke, L. (2003). The effects of morphology on the processing of compound words: Evidence from naming, lexical decisions and eye fixations. British Journal of Psychology, 94(2), 223–244. 10.1348/000712603321661903

<anchor name="c30"></anchor>

Kuperman, V., Bertram, R., & Baayen, R. H. (2008). Morphological dynamics in compound processing. Language and Cognitive Processes, 23(7–8), 1089–1132. 10.1080/01690960802193688

<anchor name="c31"></anchor>

Kuperman, V., Schreuder, R., Bertram, R., & Baayen, R. H. (2009). Reading polymorphemic Dutch compounds: Toward a multiple route model of lexical processing. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 876–895. 10.1037/a0013484

<anchor name="c32"></anchor>

Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004). Applied linear regression models (4th ed.). McGraw-Hill Irwin.

<anchor name="c33"></anchor>

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, Article 863. 10.3389/fpsyg.2013.00863

<anchor name="c34"></anchor>

Lexicon of Common Words in Contemporary Chinese Research Team. (2008). Lexicon of common words in contemporary Chinese. Commercial Press.

<anchor name="c35"></anchor>

Li, X., Bicknell, K., Liu, P., Wei, W., & Rayner, K. (2014). Reading is fundamentally similar across disparate writing systems: A systematic characterization of how words and characters influence eye-movements in Chinese reading. Journal of Experimental Psychology: General, 143(2), 895–913. 10.1037/a0033580

<anchor name="c36"></anchor>

Li, X., Gu, J., Liu, P., & Rayner, K. (2013). The advantage of word-based processing in Chinese reading: Evidence from eye movements. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 879–889. 10.1037/a0030337

<anchor name="c37"></anchor>

Li, X., Huang, L., Yao, P., & Hyönä, J. (2022). Universal and specific reading mechanisms across different writing systems. Nature Reviews Psychology, 1(3), 133–144. 10.1038/s44159-022-00022-6

<anchor name="c38"></anchor>

Li, X., & Pollatsek, A. (2020). An integrated model of word processing and eye-movement control during Chinese reading. Psychological Review, 127(6), 1139–1162. 10.1037/rev0000248

<anchor name="c39"></anchor>

Li, X., Zhao, W., & Pollatsek, A. (2012). Dividing lines at the word boundary position helps reading in Chinese. Psychonomic Bulletin & Review, 19(5), 929–934. 10.3758/s13423-012-0270-6

<anchor name="c40"></anchor>

Ma, G., Li, X., & Rayner, K. (2015). Readers extract character frequency information from nonfixated-target word at long pretarget fixations during Chinese reading. Journal of Experimental Psychology: Human Perception and Performance, 41(5), 1409–1419. 10.1037/xhp0000072

<anchor name="c41"></anchor>

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88(5), 375–407. 10.1037/0033-295X.88.5.375

<anchor name="c42"></anchor>

Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2), 227–234. 10.1037/h0031564

<anchor name="c43"></anchor>

Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. 10.3758/s13428-018-01193-y

<anchor name="c44"></anchor>

Peng, D. L., Li, Y., & Liu, Z. (1994). Identification of the Chinese two-character word under repetition priming condition. Acta Psychologica Sinica, 26(4), 393–400.

<anchor name="c45"></anchor>

Peng, D. L., Liu, Y., & Wang, C. (1999). How is access representation organized? The relation of polymorphemic words and their morphemes in Chinese. In J. Wang, A. W. Inhoff, & H.-C. Chen (Eds.), Reading Chinese script: A cognitive analysis (pp. 65–89). Lawrence Erlbaum.

<anchor name="c46"></anchor>

Pham, H., & Baayen, H. (2015). Vietnamese compounds show an anti-frequency effect in visual lexical decision. Language, Cognition and Neuroscience, 30(9), 1077–1095. 10.1080/23273798.2015.1054844

<anchor name="c47"></anchor>

Pollatsek, A., Hyönä, J., & Bertram, R. (2000). The role of morphological constituents in reading Finnish compound words. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 820–833. 10.1037/0096-1523.26.2.820

<anchor name="c48"></anchor>

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422. 10.1037/0033-2909.124.3.372

<anchor name="c49"></anchor>

Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14(3), 191–201. 10.3758/BF03197692

<anchor name="c50"></anchor>

Rayner, K., Li, X., & Pollatsek, A. (2007). Extending the E–Z reader model of eye-movement control to Chinese readers. Cognitive Science, 31(6), 1021–1033. 10.1080/03640210701703824

<anchor name="c51"></anchor>

R Development Core Team. (2020). R: A language and environment for statistical computing (Version 4.0.0) [Computer software]. R Foundation for Statistical Computing. <a href="https://www.R-project.org" target="_blank">https://www.R-project.org</a>

<anchor name="c52"></anchor>

Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81(2), 275–280. 10.1037/h0027768

<anchor name="c53"></anchor>

Schmidtke, D., Van Dyke, J. A., & Kuperman, V. (2021). CompLex: An eye-movement database of compound word reading in English. Behavior Research Methods, 53(1), 59–77. 10.3758/s13428-020-01397-1

<anchor name="c54"></anchor>

Schreuder, R., & Baayen, R. H. (1995). Modeling morphological processing. In L. B. Feldman (Ed.), Morphological aspects of language processing (pp. 131–154). Lawrence Erlbaum Associates.

<anchor name="c55"></anchor>

Sheather, S. J. (2009). A modern approach to regression with R. Springer.

<anchor name="c56"></anchor>

Shen, W., & Li, X. (2012). The uniqueness of word superiority effect in Chinese reading. Chinese Science Bulletin, 57(35), 3414–3420. 10.1360/972012-666

<anchor name="c80"></anchor>

Shen, W., Li, X., & Pollatsek, A. (2018). The processing of Chinese compound words with ambiguous morphemes in sentence context. Quarterly Journal of Experimental Psychology, 71(1), 131–139. 10.1080/17470218.2016.1270975

<anchor name="c57"></anchor>

Sun, C. C., Hendrix, P., Ma, J. Q., & Baayen, R. H. (2018). Chinese Lexical Database (CLD): A large-scale lexical database for simplified Mandarin Chinese. Behavior Research Methods, 50(6), 2606–2629. 10.3758/s13428-018-1038-3

<anchor name="c58"></anchor>

Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14(6), 638–647. 10.1016/S0022-5371(75)80051-X

<anchor name="c59"></anchor>

Taft, M., & Forster, K. I. (1976). Lexical storage and retrieval of polymorphemic and polysyllabic words. Journal of Verbal Learning and Verbal Behavior, 15(6), 607–620. 10.1016/0022-5371(76)90054-2

<anchor name="c60"></anchor>

Taft, M., Huang, J., & Zhu, X. (1994). The influence of character frequency on word recognition responses in Chinese. In H. W. Chang, J. T. Hung, C. W. Hue, & O. Tzeng (Eds.), Advances in the study of Chinese language processing (pp. 59–73). National Taiwan University.

<anchor name="c61"></anchor>

Taft, M., & Zhu, X. (1997). Submorphemic processing in reading Chinese. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(3), 761–775. 10.1037/0278-7393.23.3.761

<anchor name="c62"></anchor>

Tan, L. H., & Perfetti, C. A. (1999). Phonological activation in visual identification of Chinese two-character words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25(2), 382–393. 10.1037/0278-7393.25.2.382

<anchor name="c63"></anchor>

Traficante, D., Marelli, M., & Luzzatti, C. (2018). Effects of reading proficiency and of base and whole-word frequency on reading noun- and verb-derived words: An eye-tracking study in Italian primary school children. Frontiers in Psychology, 9, Article 2335. 10.3389/fpsyg.2018.02335

<anchor name="c64"></anchor>

Tsang, Y. K., Huang, J., Lui, M., Xue, M., Chan, Y.-W. F., Wang, S., & Chen, H.-C. (2018). MELD-SCH: A megastudy of lexical decision in simplified Chinese. Behavior Research Methods, 50(5), 1763–1777. 10.3758/s13428-017-0944-0

<anchor name="c65"></anchor>

Tse, C. S., & Yap, M. J. (2018). The role of lexical variables in the visual recognition of two-character Chinese compound words: A megastudy analysis. Quarterly Journal of Experimental Psychology, 71(9), 2022–2038. 10.1177/1747021817738965

<anchor name="c66"></anchor>

Tse, C. S., Yap, M. J., Chan, Y. L., Sze, W. P., Shaoul, C., & Lin, D. (2017). The Chinese lexicon project: A megastudy of lexical decision performance for 25,000+ traditional Chinese two-character compound words. Behavior Research Methods, 49(4), 1503–1519. 10.3758/s13428-016-0810-5

<anchor name="c67"></anchor>

Vergara-Martínez, M., Duñabeitia, J. A., Laka, I., & Carreiras, M. (2009). ERP correlates of inhibitory and facilitative effects of constituent frequency in compound word reading. Brain Research, 1257, 53–64. 10.1016/j.brainres.2008.12.040

<anchor name="c68"></anchor>

Wang, C., & Peng, D. (1999). The role of surface frequencies, cumulative morpheme frequencies, and semantic transparencies in the processing of compound words. Acta Psychologica Sinica, 31(3), 266–273.

<anchor name="c69"></anchor>

Wei, W., Li, X., & Pollatsek, A. (2013). Word properties of fixated words affect outgoing saccade length in Chinese reading. Vision Research, 80, 1–6. 10.1016/j.visres.2012.11.015

<anchor name="c70"></anchor>

Xiong, J., Yu, L., Veldre, A., Reichle, E. D., & Andrews, S. (2023). A multitask comparison of word- and character-frequency effects in Chinese reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 49(5), 793–811. 10.1037/xlm0001192

<anchor name="c71"></anchor>

Yan, G., Tian, H., Bai, X., & Rayner, K. (2006). The effect of word and character frequency on the eye-movements of Chinese readers. British Journal of Psychology, 97(2), 259–268. 10.1348/000712605X70066

<anchor name="c72"></anchor>

Yang, J., Staub, A., Li, N., Wang, S., & Rayner, K. (2012). Plausibility effects when reading one- and two-character words in Chinese: Evidence from eye movements. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(6), 1801–1809. 10.1037/a0028478

<anchor name="c73"></anchor>

Yao, P., Staub, A., & Li, X. (2022). Predictability eliminates neighborhood effects during Chinese sentence reading. Psychonomic Bulletin & Review, 29(1), 243–252. 10.3758/s13423-021-01966-1

<anchor name="c74"></anchor>

Yu, L., Liu, Y., & Reichle, E. D. (2021). A corpus-based versus experimental examination of word- and character-frequency effects in Chinese reading: Theoretical implications for models of reading. Journal of Experimental Psychology: General, 150(8), 1612–1641. 10.1037/xge0001014

<anchor name="c75"></anchor>

Zang, C., Liang, F., Bai, X., Yan, G., & Liversedge, S. P. (2013). Interword spacing and landing position effects during Chinese reading in children and adults. Journal of Experimental Psychology: Human Perception and Performance, 39(3), 720–734. 10.1037/a0030097

<anchor name="c76"></anchor>

Zang, C., Wang, Y., Bai, X., Yan, G., Drieghe, D., & Liversedge, S. P. (2016). The use of probabilistic lexicality cues for word segmentation in Chinese reading. Quarterly Journal of Experimental Psychology, 69(3), 548–560. 10.1080/17470218.2015.1061030

<anchor name="c77"></anchor>

Zhang, B., & Peng, D. L. (1992). Decomposed storage in the Chinese lexicon. In H.-C. Chen & O. Tzeng (Eds.), Language processing in Chinese (pp. 131–149). North-Holland. 10.1016/S0166-4115(08)61890-7

<anchor name="c78"></anchor>

Zhou, J., & Li, X. (2021). On the segmentation of Chinese incremental words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(8), 1353–1368. 10.1037/xlm0000984

<anchor name="c79"></anchor>

Zhou, X., & Marslen-Wilson, W. (2000). Lexical representation of compound words: Cross-linguistic evidence. Psychologia: An International Journal of Psychology in the Orient, 43(1), 47–66.

<h31 id="xhp-50-5-479-d282e4022">APPENDICES</h31> <anchor name="A"></anchor> <h31 id="xhp-50-5-479-d282e4023">APPENDIX A: Supplementary Analyses of the Data Sets in Study 1</h31>

<anchor name="B"></anchor> <h31 id="xhp-50-5-479-d282e4032">APPENDIX B: Stimuli Used in Study 2</h31>

Submitted: April 2, 2023 Revised: January 11, 2024 Accepted: January 14, 2024