4.1 Existing Classification of MWEs

MWEs are heterogeneous, and thus they are often classified into different categories. Classifications have been proposed from different perspectives: pedagogically-oriented (Alexander, 1984; Biber, 2009; Howarth, 1998; Lewis, 1993; Nattinger & DeCarrico, 1992), linguistically-oriented (Cowie, 1988; Granger & Paquot, 2008; Moon, 1998), and NLP-oriented (Baldwin, 2006; Becker, 1975; Sag, Baldwin, Bond, Copestake, & Flickinger, 2002). Some of the categories proposed for English are: (1) polywords that function like individual lexical items, such as by the way; (2) phrasal constraints, such as a__ago, dear___, the___er; (3) sentence builders, which contain slots for parameters or arguments, such as I think that X, That reminds me of X, Have you heard about X?; (4) collocations (noun+adjective, verb+noun, verb+adverb, etc.); (5) institutionalized expressions that have pragmatic functions and stand as separate utterances in distinct social situations, such as How do you do?; (6) discourse devices, such as logical connectors (as a result of, in spite of), temporal connectors (the next is Y), spatial connectors (at the corner), fluency devices (you know), exemplifiers (in other words), summarizers (to sum up), and so on. These categories are very broad, ranging from morphemes to sentences, and do not commit to any linguistic status for the multiword unit phenomenon.

Regarding Chinese MWEs, there are also varied views on their classification. Here are some representative ones. Y. Liu (2004) divided lexical phrases into fixed lexical bundles and fixed frames based on the views of Nattinger and DeCarrico (1992). J. Zhou (2007) classified lexical chunks into three categories: collocations, customary expressions, and connectors. H. Wang (2007) classified them into familiar phrases (set phrases, institutionalized expressions, two-part allegorical sayings, abbreviations, etc.), proper names, connectors, insertions, high frequency collocations, and institutionalized sentences. H. Li (2008) classified them into three groups: phrases, fixed sentences, and frames. Qian (2008) put a wide range of expressions into lexical chunks, including collocations, institutionalized expressions, idioms, proverbs, maxims, sayings, aphorisms, polite conversations, songs, lyrics, and religious texts. Qi (2008) proposed an even wider range of lexical chunks, including fixed collocations, suffixes, and some sentences. He included everything except discourse, from suffixes to sentences, which makes this the broadest classification to date. Wu, He, and Wu (2009) pointed out that chunks are ubiquitous, but that because Chinese and English belong to different language families, their chunks differ significantly in type and characteristics. They subdivided the chunks into fixed phrases, frame chunks, separable word chunks, verb complement chunks, and idiom chunks. They also gave the name “instant chunks” to the chunks that Chinese learners from different linguistic backgrounds create on the spot under different circumstances during the process of learning. J. Zhou (2009) divided chunks into fixed structures, filling structures, and associated structures on the basis of the differences in their external structure. Xue and Shi (2013) attempted to propose a systematic, hierarchical classification system for chunks, which has up to 14 subcategories and up to 5 layers. W. Wang (2013) divided Chinese chunks into five main classes: collocations, frame structures, customary expressions, idioms, and polite formulas. K. Xu (2015) combined automatic extraction with manual intervention to extract colloquial conventionalized expressions from a self-built corpus of Chinese learners’ spoken language. On this basis, oral chunks were classified by form and by function: in the formal classification, chunks are divided into typical chunks and kind chunks; in the functional classification, chunks are divided into three types: referential chunks, interpersonal chunks, and textual chunks.

It can be seen that there are various classification systems for Chinese MWEs, and the terms used are even more varied. These classification systems share the following characteristics. (i) Integration. Most of the classification systems have incorporated existing research on idioms, collocations, phrases, frame structures, etc. (ii) Hierarchy. Most of the classification systems hold that MWEs are hierarchical, though different criteria are adopted for the division of subcategories. (iii) Continuum. For example, Qian’s (2008) division into solidification combinations, limited combinations, and free combinations constitutes a continuum that runs from solidification to freedom. Wu et al. (2009) proposed a concept named “chunking degree”: the weaker the variability of a chunk, the higher its chunking degree, and vice versa. This reveals the prototypical nature of MWEs. Besides, researchers agree that idioms are the most typical type of MWEs, while collocations are considered the least typical. Between idioms and collocations, there are other categories of MWEs with varying degrees of typicality. This research holds that integration, hierarchy, and continuum are inherent characteristics of MWEs, which are heterogeneous communicative units, and that a reasonable MWE classification system needs to reflect all three.

A review of the various classification systems of Chinese MWEs reveals some drawbacks. First, very little research states the criteria on which its classification is based. Second, the properties of a category are commonly described in absolute terms, that is, a category either has or does not have a certain property. In fact, a language has many in-between cases that differ in how typical they are, so a property is better described as a “high” or “low” tendency of a category. Third, Chinese is a classifier language, and this has special pedagogical value in TCSL; however, little research has treated classifier phrases as a category of MWEs. Fourth, owing to different views on the scope of MWEs, some research treats morphemes and sentences as categories of MWEs. The problem is that each of them already stands alone as a linguistic unit, yet neither is investigated further. Such an all-embracing scope cannot display the characteristics of MWEs, so morphemes and sentences are outside the scope of the current research. Fifth, existing research on Chinese MWEs usually only lists some categories with several typical examples, which is of little help to TCSL learners.

4.2 Classification of MWEs in This Research

On the basis of a comprehensive review of previous classifications of MWEs, this research puts forward the following principles for the classification of MWEs. (1) Absorbing the research outcomes of Chinese lexicology, which has studied some MWEs, such as idioms, without treating them as MWEs. These fruitful achievements help to reveal the characteristics of such units, and thus they should be absorbed into the study of MWEs. (2) Taking prefabrication as the core feature when determining the range of MWEs, which matches the common understanding that people have of MWEs. Therefore, it is inappropriate to include discourses, paragraphs, words, and morphemes in the range of MWEs. (3) Reflecting the continuum characteristics of MWEs and classifying them into subcategories according to their common characteristics and their different prefabrication properties. (4) Serving CSL. On the one hand, the classification system should reflect the hierarchical characteristics of MWEs, while the division of the hierarchy should remain simple and general. On the other hand, the terms used for naming the subcategories should be clear and easy, so as to avoid ambiguity and coinage.

Based on the principles above, and after referring to previous studies and analyzing the corpus data, this research puts forward a classification system for Chinese MWEs, as shown in Table 4.1.

Table 4.1 The classification system of MWEs

4.3 Characteristics of Each Category of MWEs

This section introduces the characteristics of each category of MWEs.

4.3.1 Idioms

Idioms, as prefabricated language units with rich cultural implications, are a focus of Chinese lexicology research. The study of idioms has a long history, has achieved rich results, and has contributed to the establishment of a special discipline, the study of idioms. After years of discussion, academia has reached a basic understanding of the nature and composition of idioms: “idiom” is a general name that includes set phrases, institutionalized expressions, proverbs, two-part allegorical sayings, and maxims. Idioms are characterized by rich contents, concise forms, and frequent use. They also show the characteristics of structural regularity, semantic fusion, and functional integrity (Shao, 2007). Another important mark of the rich study of idioms is the publication of a variety of dictionaries, such as A Dictionary of Chinese Idioms (X. Yang et al., 2005), A Large Dictionary of Chinese Set Phrases (Chinese Big Dictionary Compilation Office, 2007), A Dictionary of Chinese Institutionalized Expressions (Binhong Huang, 2009), and Xinhua Dictionary of Set Phrases (Dictionary Research Center of The Commercial Press, 2013).

Based on the existing research and the idioms extracted from our textbook corpus, this study divides idioms into three sub-classes: set phrases, institutionalized expressions, and other types of idioms. Other types of idioms mainly cover two-part allegorical sayings, proverbs, and maxims. Because there are only a small number of them in the corpus, they are grouped together as “Other types of idioms”. Each subcategory is described in detail as follows.

4.3.1.1 Set Phrases

Set phrases are the most typical MWEs in Chinese, which come into being in the form of phrases after long-term use. In terms of structure, a set phrase has a fixed structural form, in which the components cannot be changed. Most set phrases have four characters. There are only a few set phrases with three, five, or even more characters, such as mòxūyǒu ‘groundless; fabricated’, táolǐ mǎn tiānxià ‘(lit. peach trees and plum trees are everywhere) having students all over the world’ and bǎisī bùdé qí jiě ‘have difficulty in understanding even after repeated thinking’. In terms of meaning, a set phrase expresses a fixed meaning and is used as a whole in sentences. The meanings of some set phrases can be understood literally, such as céngchū bùqióng ‘emerge in an endless stream’, fèijìn xīnjī ‘exhaust all mental efforts’, gèqǔ suǒxū ‘each takes what he needs’, and huǎngrán dàwù ‘understand suddenly’. However, the meanings of other set phrases cannot be understood literally, especially those that originate from historical stories, such as mángrén mōxiàng ‘(lit. the blind men feel an elephant) make an overall judgement of sth. on the basis of one-sided viewpoint’ and nányuán běizhé ‘(lit. go south by driving the chariot north) act in a way that defeats one’s purpose’. As for their sources, a large number of set phrases are inherited from ancient times, and thus their choice of words differs from that of modern Chinese. Some morphemes of a set phrase retain the meaning or grammar of ancient Chinese. Some set phrases are quoted directly from ancient texts. There are also some set phrases without a clear source or allusion, which have nevertheless become set phrases after a long period of use. In terms of register, idioms are generally classical in style and are often used in written Chinese.

Because set phrases are a mature part of the study of idioms, a large number of dictionaries of set phrases have been published, which lays the foundation for the identification and interpretation of idioms. This study also refers to them for the extraction of idioms.

4.3.1.2 Institutionalized Expressions

Institutionalized expressions are a kind of MWE that people create and develop into fixed forms in their daily language use. They have the following characteristics. (i) In terms of meaning, the overall meaning of an institutionalized expression is fixed by its metaphorical or extended meaning. For example, gǔn xuěqiú ‘(lit. roll a snowball) (of a business, project, etc.) get bigger and bigger as it proceeds’ is a metaphor for the increasing scale of a certain object. (ii) In terms of structure, the main components of institutionalized expressions are fixed, but their forms can vary to some degree. For instance, pèng dīngzi ‘(lit. bump one’s head against a nail) receive serious rebuff’ can have another form: pèng le yī gè dà dīngzi ‘(lit. bump one’s head against one big nail) receive serious rebuff’. Institutionalized expressions mainly have three characters, and some have more than three characters, such as gǎn yāzi shàngjià ‘(lit. drive a duck onto a perch) make sb. do sth. entirely beyond him’, liǎng tiáo tuǐ zǒulù ‘(lit. walking on two legs) have a balanced development’, and zuòniú zuòmǎ ‘(lit. work like a horse and cattle) slave for sb.’. (iii) Most institutionalized expressions are verb phrases and some are noun phrases, such as báirì mèng ‘daydream’ and bànbiān tiān ‘(lit. half the sky) women of the new society; womenfolk’. (iv) In terms of sources, institutionalized expressions mostly develop from spoken language. Their meanings originate from the accumulation and solidification of people’s experience in daily life. (v) As for register, institutionalized expressions have an obviously colloquial character, and their effect in use is vivid, lively, imaginative, and interesting. As for emotional color, institutionalized expressions are strongly colored, and many of them are derogatory.

4.3.1.3 Other Types of Idioms

Other types of idioms include proverbs, two-part allegorical sayings, maxims, famous dicta, aphorisms, and so on. The number of these types of idioms is relatively small in this research’s corpus, so they are put into one category in this study.

Their structures are more complicated and the meanings they convey are more complex than those of set phrases and institutionalized expressions. They tend to reveal a generally accepted truth through certain things or phenomena, such as shíjiān rú liúshuǐ ‘(lit. time is like flowing water)’, which conveys the truth that ‘time flies quickly’, with the purpose of persuading people to cherish time. In terms of use, such an idiom is used as a whole. It can also stand alone as a complete sentence instead of acting as a syntactic element of a sentence, which is the main feature that distinguishes it from other idioms.

4.3.2 Polite Formulas

People often use some fixed sentences or phrases in different communicative settings. When they meet a stranger for the first time, for example, they will say nínhǎo ‘hello’ as a greeting or they may raise a question: nín guìxìng ‘(a polite way of asking one’s name) what’s your honourable surname?’; when receiving praises from others, they tend to respond with the expression: nǎlǐ, nǎlǐ ‘(lit. where, where) a humble response, meaning thank you’; when disturbing others, they will use the expressions such as dǎrǎo le ‘excuse me’, duìbùqǐ ‘sorry’, and bù hǎo yìsi ‘sorry’; when they are in someone’s birthday party, people tend to say zhù nǐ shēngrì kuàilè ‘happy birthday to you’ to give someone good wishes; when they are spending the Spring Festival, the usual expression is xīnnián kuàilè ‘Happy New Year’; at the end of the official letter, people prefer to write shùnzhì chéngzhì de jìngyì ‘please accept my sincere respect’. All these kinds of expressions are called ‘polite formulas’.

Since polite formulas have the character of integrity for storage and extraction, they are also considered as MWEs (Qi, 2008; Qian, 2008; Xue & Shi, 2013). Polite formulas are important teaching contents especially in communicative language teaching. Almost all textbooks will teach them from the elementary level and require students to grasp them. Their characteristics are summarized as follows. (1) They have clear pragmatic settings and communicative purpose, such as zhù nǐ yīlù shùnfēng ‘bon voyage’ which is often used in the farewell scene to convey blessings. (2) They tend to have association with certain behaviors, such as qǐng zuò ‘please take a seat’, qǐng shāoděng ‘please wait for a moment’, qǐng jièshào yīxià ‘please introduce it to us’, and nín gěi gè jià ‘please give a price’. (3) Although there is no definite restriction about the length, the polite formulas chosen in this study from teaching materials are commonly used short phrases rather than complicated long sentences.

4.3.3 Conventionalized Expressions

Quite a lot of scholars take “conventionalized expressions” as a subclass of MWEs (Y. Liu, 2004; Qi, 2008; W. Wang, 2013; K. Xu, 2015; J. Zhou, 2007). However, their boundaries, characteristics, and nature are still vague, which is a weak part of the study of MWEs.

The study of spoken language shows great concern for conventionalized expressions. Chang (1989) pointed out that a conventionalized expression has a fixed form, a fixed meaning, and a fixed usage context, and that its constituents and word order tend to be fixed. Besides, conventionalized expressions develop from temporary or free combinations into fixed forms in oral communication. A representative work on conventionalized expressions is A Functional Dictionary of Colloquial Conventionalized Expressions (Chang, 1993), which collects 379 main entries and 148 affiliated entries. A main entry is one of a group of meaning-related conventionalized expressions; an affiliated entry is a synonym or antonym of a main entry. The conventionalized expressions collected in the dictionary take two kinds of forms: phrases and frame structures. Examples of phrases include dǐngduō ‘at (the) most; at best’, èrhuà méishuō ‘without another word’, liǎobùdé ‘wonderful; terrific’, méicuòr ‘can’t go wrong’, méishìr ‘have nothing to do / it doesn’t matter’, nǐ kàn ‘You look!’, shuō qǐlái ‘in fact; as a matter of fact’, and shuō shízài de ‘to tell the truth’. Examples of frame structures include ài [A] bù [A] ‘love [A] not [A]’, bù mán [A] shuō ‘not to conceal [A]’, duì (yú)[A] láishuō ‘for [A]’, gēn [A] yībān jiànshì ‘(lower oneself to) the same level as sb.; stoop to [A]’s level’, and yī [A] liǎo zhī ‘one [A] end’. The discussion of conventionalized expressions is summarized in Table 4.2.

Table 4.2 Summary of the study of conventionalized expressions

Based on the analysis and the conventionalized expressions identified in this study, a conventionalized expression is defined as a phrase with a specific pragmatic meaning that can be used as a whole in communication, but is not yet as fixed as set phrases and institutionalized expressions.

4.3.4 Parentheses

The parenthesis has drawn attention since the 1930s. It is an additional component of speech and writing. As a common phenomenon in pragmatics, its presence or absence does not affect the integrity of a sentence or its semantic truth value. The parenthesis is also called chāyǔ ‘inserted element’ (L. Wang, 1944), dúlìyǔ ‘independent language’ (F. Xing, 1991), chuānchāyǔ ‘plug in element’ (W. Chen, 1978), and so on. Although these studies mentioned parentheses, they did not systematically study their forms, meanings, and functions in depth.

With the growing attention of studies on Chinese parentheses, researchers realize that parentheses have special pragmatic functions, including the modality function, the communication function, and the textual function (Bai, 2008). H. Wang (2007) pointed out that the role of parentheses is to complement the meaning of a sentence such as the speaker’s attitude towards speaking and attracting the hearer’s attention. Borong Huang and Liao (2007) pointed out that parentheses can show the emotional attitude of the speaker and strengthen the tone of speech; parentheses can be interpreted and supplemented, speculated and estimated, etc. The mechanism of the pragmatic function formation of parentheses is “subjectivization”. Si (2009) examined the parentheses of the “speaking” type. She found that as parentheses, they do not have the original meaning of speaking and have become subjective. Wei (2010) held that the use of parentheses is a linguistic phenomenon solidified by long-term pragmatic inference and they are subjective emotional expression of speakers.

Since the independent status of parentheses was established from the perspective of pragmatic functions, more attention has been paid to their form and meaning. H. Wang (2007) believed that the form of parentheses is fixed and the grammatical meaning is definite. Bai (2008) pointed out that a fixed structure and semantic integrity are important features of parentheses. Si (2009) classified parentheses as fixed phrases (idioms), because their structures are fixed, although they cannot express meanings independently and can only be used in sentences. Qiu (2010) discussed the characteristics of parentheses from the aspects of form and meaning. He held that in terms of stylistic features, a parenthesis usually has a solidified or semi-solidified form; in terms of semantic features, the meaning of a parenthesis is not the direct sum of its constituents’ meanings, but a solidified integral meaning. Parentheses are a semi-open class. Y. Li (2006) explicitly defined a parenthesis as a chunk that does not serve as a syntactic component, is semantically vague, and has a solidified structure and a special pragmatic function.

Generally speaking, there has been a general consensus that the structure of a parenthesis has the characteristics of solidification or semi-solidification, which has integral meaning and its own unique features in pragmatics (such as subjectivity and non-independence). These properties of parentheses conform to the definition of MWEs and some studies have directly classified them into MWEs (Y. Li, 2006). Therefore, they are included as a subclass of MWEs in this study.

The parenthesis is an important object of vocabulary teaching in CSL. The research of Chinese teaching has investigated the problem of parenthesis teaching and put forward corresponding teaching strategies. Studies have found that parentheses are a major difficulty in learning Chinese, especially those that cannot be understood literally (Bai, 2008). However, the teaching contents of parentheses in textbooks are too simple, which cannot help students use them in communication (N. Xu, 2017). The absence, misrepresentation, and generalization of parentheses are the types of errors frequently made by learners (Jiang, 2010; Y. Li, 2006; Z. Xu, 2006; J. Zhang, 2009). Due to the characteristics of parentheses and problems in teaching and learning, researchers suggest that attention should be paid not only to contextual teaching, but also to integral teaching of them. That is, paying attention to the integral input so as to enable students to remember parentheses as a whole and use them as a whole (N. Xu, 2017).

4.3.5 High Frequency Collocations

A high frequency collocation is a phrase composed of two or more words with high co-occurrence frequency, which is relatively free compared to the fixedness of idioms, polite formulas and conventionalized expressions.

Many studies have taken high frequency collocations as a category of MWEs. For example, the chunk system of H. Wang (2007) covered high frequency collocations, which are reflected in “high frequency of semantic co-occurrence” and “high frequency of grammatical meaning”. “High frequency of semantic co-occurrence” is due to the semantic co-occurrence of the words in a collocation, which attract and predict each other. “High frequency of grammatical meaning” mainly refers to grammatical formats with intrinsic grammatical relations, e.g. zài +chùsuǒcí ‘in/at + a place word’; shì……de ‘is……’; fēi……bùkě ‘must; have to’. J. Zhou (2007) took phrasal collocations as an important chunk type, which can be understood from the perspective of associability. For example, yīyuàn xià le bìngwēi tōngzhī, lǎo Wáng shèn gōngnéng yǐjīng…… ‘The hospital has issued a critical condition notice; Old Wang’s renal function has already……’ In this case, the follow-up word is most likely to be shuāijié ‘failure’; thus the collocation gōngnéng-shuāijié ‘function-failure’ can be seen as a chunk. Qian (2008) discussed three types of combination of lexical chunks, which are free combinations, limited combinations, and solidification combinations. Solidification combinations refer to set phrases and institutionalized expressions, such as mù bù zhuǎnjīng ‘(lit. look with fixed gaze) regard with rapt attention’, wéilì shìtú ‘seek nothing but profits’, xiǎocài yīdié ‘(lit. a small dish) a piece of cake’; they are not in the scope of high frequency collocations. In contrast, limited combinations and free combinations are high frequency collocations. Regarding limited combinations, the main basis for judging the degree of limitation includes the number of words that can be matched with the node words and the directionality of choice (unidirectional and bidirectional). 
Qian (2008)’s analysis about limited combination provides a theoretical basis to judge them, but it lacks the discussion about the nature of free combinations. W. Wang (2013) basically followed Qian (2008)’s analysis, but she especially emphasized the factor of “frequency”. She suggested that the frequency standard can be adopted for judging limited combinations. As for free collocations, most of them belong to common phrases; only those which meet a certain frequency standard have the possibility to be regarded as chunks. Xue and Shi (2013) discussed collocational chunks of the fixed selection type (dìng xuǎn shì) and the pairing up type (pèiwǔ shì), which are equivalent to limited combinations and free combinations of Qian (2008). The difference between these two types mainly depends on the degree of fixedness. The chunks of the fixed selection types usually consist of two parts. They tend to be the fixed and orderly collocations and their functions are similar to fixed phrases, such as duānzhèng tàidù ‘correct one’s attitude’, duànliàn shēntǐ ‘build up one’s body’, and tiānzī cōngyǐng ‘intelligent by natural endowments’. As for the chunks of the pairing up type, although the two parts in one such chunk co-occur, the certainty of the collocation is weaker than that of the fixed type. Examples are like tiǎoqǐ – jiūfēn / huòduān / máodùn ‘provoke – dispute / the source of a disaster / contradiction’, chuàngzào - tiáojiàn / jīhuì ‘create – a condition / an opportunity’, mǎimài - zìyóu / gōngpíng ‘trade - freedom / fair’, jīngjì / mínshì - jiūfēn ‘economic / civil - dispute’, and jiěchú - hòugù zhīyōu / jǐngbào ‘remove - fear of disturbance in the rear / alarm’. 
Besides, to a certain extent, the words in a pairing up type can have various ways of combination, such as máodùn shì yóu duìfāng tiǎoqǐ de ‘the contradiction is provoked by the other side’, tiǎoqǐ le yī gè xīn de huòduān ‘provoke a new source of the disaster’, gōngpíng mǎimài ‘fair trade’, and jǐngbào jiěchú ‘the alarm is removed’.

To sum up, the main features of high frequency collocations as a category of MWEs are reflected in the following aspects. (1) Semantic relevance, restriction and associability. Collocations such as gǎn xìngqù ‘be interested in’, gōngnéng shuāijié ‘functional failure’ and zhēng yǎnjīng ‘open one’s eyes’ have semantic relevance and restriction among their components. The semantic meaning of the morpheme zhēng ‘open’ itself has already indicated that its collocational object is yǎnjīng ‘eyes’.

(2) Characteristics that result from the high frequency co-occurrence of the words in a collocation. The frequent combination of two words makes each predictable from the other, and the combination thus becomes an MWE, such as chéngjiù shìyè ‘achieve a career’, chǎngkāi xīnfēi ‘open oneself up’, dǎ tàijíquán ‘play taiji’, and fúlì dàiyù ‘welfare and treatment’. It should be noted that “high frequency co-occurrence” (gāopín gòngxiàn) is different from the “stable reproduction” (wěndìng fùxiàn) found in conventionalized expressions. “High frequency co-occurrence” indicates that the components in a collocation have high absolute frequency within a certain range, while “stable reproduction” means that if component A appears, then component B is very likely to appear too, but the combination of these two components does not necessarily have high co-occurrence frequency.
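The contrast between the two notions can be illustrated with a small sketch. It assumes, as one possible operationalization that the cited studies do not state, that “high frequency co-occurrence” is measured by the raw count of a word pair and “stable reproduction” by the share of occurrences of A that are followed by B. The function name, the placeholder words, and all counts are hypothetical.

```python
from collections import Counter


def cooccurrence_stats(pairs, a, b):
    """Return (raw count of the pair a-b, share of a's pairs that continue with b)."""
    pair_counts = Counter(pairs)
    first_counts = Counter(first for first, _ in pairs)
    count_ab = pair_counts[(a, b)]
    count_a = first_counts[a]
    # Guard against words that never occur as the first element.
    share = count_ab / count_a if count_a else 0.0
    return count_ab, share


# Hypothetical toy data: adjacent word pairs observed in a corpus.
pairs = (
    [("A", "B")] * 2      # rare pair, but A is always followed by B
    + [("C", "D")] * 6    # frequent pair ...
    + [("C", "E")] * 6    # ... although C has another equally frequent partner
)

stable = cooccurrence_stats(pairs, "A", "B")
frequent = cooccurrence_stats(pairs, "C", "D")
print(stable)    # (2, 1.0): low absolute frequency, fully predictable
print(frequent)  # (6, 0.5): higher absolute frequency, less predictable
```

Under these assumptions, the pair A-B shows stable reproduction without high frequency, while C-D shows higher frequency with weaker predictability, which is the distinction drawn in the paragraph above.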

High frequency collocations have a variety of internal members. First, they can express a wide range of concepts. For example, there are noun phrases which express general concepts, such as huācǎo shùmù ‘(lit. flowers, grass, trees and woods) plants and trees’, fúlì dàiyù ‘welfare and treatment’, and xiàrì lièshǔ ‘a scorching summer’. There are also noun phrases related with some professional fields, e.g. hànyǔ cèyàn ‘Chinese test’, huánjìng bǎohù ‘environmental protection’, nóngyào cányú ‘pesticide residue’, and rénkǒu liúdòng ‘floating population’.

Second, there are collocations with high degree of restrictions, such as bǐng zhù hūxī ‘hold one’s breath’, chěng wēifēng ‘behave in an aggressively arrogant way’, chū luànzi ‘cause trouble’, dù mìyuè ‘have honeymoon’, dòng gǎnqíng ‘be carried away by one’s emotions’, gǎn xìngqù ‘be interested in’, gǔ qǐ yǒngqì ‘pluck up one’s courage’, and hào miànzi ‘be concerned about face-saving’. There are also collocations with low degree of restrictions, such as kàn shū ‘read books’, kàn xiǎoshuō ‘reading novels’, gōngzuò fánmáng ‘work busy’, hùxiāng bāngzhù ‘help each other’, and jiāoliú xìnxī ‘exchange information’. Third, they have a wide range of grammatical structures, such as the subject-predicate structure (e.g. yīshān zhěngjié ‘clothes tidy and clean’, zhǎngshēng rèliè ‘applause warm’), verb-object structure (e.g. shìyìng huánjìng ‘adapt to the environment’, tígāo xiūyǎng ‘improve cultivation’, yánjiàn bīnkè ‘meet guests’), joint structure (e.g. xiōngdì jiěmèi ‘brothers and sisters’, xuànlì duōcǎi ‘bright and colorful’), and modifier-head structure (e.g. nóngyù qìxī ‘strong flavor’, qǐmǎ yīnggāi ‘should at least’).

4.3.6 Frame Structures

The frame structure is an important category of MWEs. Although different terms are used to refer to it, most of the systems of Chinese MWEs contain it (Y. Liu, 2004; Qi, 2008; W. Wang, 2013; Wu et al., 2009; Xue & Shi, 2013; J. Zhou, 2007; J. Zhou, 2009). Frame structures in these studies can be grouped into three levels. (1) The phrase level. Their forms are like A shàng A xià ‘A up A down’ (e.g. tiào shàng tiào xià ‘jump up and down’, pá shàng pá xià ‘climb up and down’), bù A yěbà (e.g. bù tí yěbà, bù xiě yěbà), …nián rú yī rì [e.g. shí nián rú yī rì ‘(lit. ten years as one day) maintain a habit or work attitude for a long time’], which are all formed with frames. (2) The word level. For example, Qi (2008) regarded the word-building affixes [e.g. jiā ‘a person engaged in a certain trade’, zhě ‘one or those who; the thing or things which; -er’, lǎo ‘(a prefix)’] as chunks. (3) The sentence level. Some research puts conjunctions that are used in the sentence at this level (W. Wang, 2013; Xue & Shi, 2013; J. Zhou, 2007; J. Zhou, 2009), such as wúlùn……dōu…… ‘no matter how…’, yīnwèi……suǒyǐ…… ‘because…so...’, and bùjǐn…érqiě… ‘not only…but also…’. The parentheses (the inserted elements) in a sentence are also put at this level, such as zǒng’éryánzhī ‘generally speaking’, nándào shuō + a clause ‘Could it be that + a clause’, hěn nánshuō + a clause ‘it is hard to say that + a clause’ (Y. Liu, 2004; Qi, 2008; Xue & Shi, 2013).

Regarding the research mentioned above, this study holds that frame structures at the phrase level are MWEs; frame structures at the word level are not MWEs; and some frame structures at the sentence level are MWEs, while others are not. The reasons are as follows. First, as for word-level frame structures, the affixes that form words are open-ended: numerous words can be derived from the same affix, so such patterns lack the integrity needed for storage and retrieval as wholes. A word that is derived from an affix and has become a fixed form is better treated as a word.

Second, as for sentence-level frame structures, this study holds that parentheses are MWEs, while conjunctions are not. This is because the frames that conjunctions form express only logical relations and have no close connection with a specific meaning. What is more, a conjunction and its filling content can form an infinite number of sentences. Therefore, there is little value in treating them as MWEs, especially in MWE teaching. Instead, it is better to deal with conjunctions in the teaching of complex sentences.

It should be noted that although this study treats parentheses as MWEs, they are considered an independent category rather than a subclass of frame structures. This is because they do not themselves form “frames” to be filled. They are only loosely linked with the rest of the sentence, and they are characterized by strong stability and integral meaning.

Based on a summary of the existing studies and an analysis of the textbook data, this study finds two major types of frame structures. (1) Four-character structures, e.g. AB jiāojiā (chóu hèn jiāojiā ‘worry accompanied by hate’, léidiàn jiāojiā ‘lightning accompanied by thunder’), A lái A qù (fēilái fēiqù ‘fly round and round’, yóu lái yóu qù ‘swim around’), A lái B wǎng (chēlái chēwǎng ‘cars coming and going’, dōnglái xīwǎng ‘coming and going’, rénlái rénwǎng ‘people coming and going’), AB yǔfǒu ‘AB or not’ (chéngrèn yǔfǒu ‘admit or not’, zhēnchéng yǔfǒu ‘sincere or not’), and háobù AB (háobù kuāzhāng ‘no exaggeration’, háobù lìnsè ‘be unstintingly generous’, háobù yóuyù ‘without hesitation’). (2) Phrasal frames, e.g. ……yīdài (dōngnán yánhǎi yīdài ‘the southeast coastal area’), bāokuò……zàinèi ‘including…’ (bāokuò hǎiwài liúxuéshēng zàinèi ‘including overseas students’), chúle……yǐwài ‘except for…’ (chúle bùnéng dài zǒu de yǐwài ‘except for the things that cannot be taken away’), and dāng……de shíhòu ‘when…’ (dāng tiānhēi de shíhòu ‘when darkness comes’).

4.3.7 Classifier Phrases

The classifier phrase is a unique category in the MWE classification system of this study, one that most relevant studies do not cover. The main basis for regarding classifier phrases as MWEs is the selectional relation between the classifier and the noun or verb, which is also a core issue in the study of Chinese classifiers. It is generally agreed that the use and choice of classifiers are determined by nouns (S. Zhou, 2006). In the 1980s, under the influence of descriptivism, classifier collocation lists and dictionaries emerged, such as the appendix A Matching Table of Chinese Nouns and Classifiers in Modern Chinese Eight Hundred Words (S. Lv, 1980), A Handbook of Modern Chinese Classifiers (Guo, 1987), and A Dictionary of Collocations of Nouns and Classifiers in Modern Chinese (X. Liu & Deng, 1989). Moreover, the relationship between nouns and classifiers is considered to be conventional. Zhu (1982) pointed out that although nouns sometimes have a certain semantic connection with their corresponding classifiers, this holds in only a few cases. In general, which classifiers are used with which nouns is established through long social practice. Since the 1990s, researchers have been working on explaining the selection between classifiers and nouns or verbs, through semantic analysis (Shao, 1996), cognitive analysis (Shi, 2001), and historical development analysis (Jin & Chen, 2002). Regardless of the stage that the study of classifiers has reached and the research methods used, there is a basic consensus that classifiers are restricted in their selection of nouns and verbs.

In TCSL, the teaching of classifiers is a difficult point. Considering the conventional selection restriction between classifiers and the nouns or verbs they modify, treating the classifier phrase as a whole is important. H. Wang (2007) pointed out that Chinese classifiers are not only numerous but also complicated in usage. Some rules about the use of classifiers are flexible and some are fixed. Although a theoretical basis can be found for some fixed uses, there is no need to spend a great deal of time explaining them in detail in TCSL. In fact, it is enough to treat classifier phrases as chunks to be learned and memorized, such as yī chǎng bǐsài ‘a match’, yī mén kè ‘a lesson’, and yī tiáo mǎlù ‘a road’. This teaching approach is exactly the same as that for other MWEs.

4.3.8 Summary of the Characteristics of Different Categories of MWEs

Drawing on the research on various categories of MWEs and on the MWEs identified in this study, we summarize the characteristics of MWEs in five respects, namely form, grammatical function, semantics, pragmatics, and source, as shown in Table 4.3.

Table 4.3 Characteristics of MWEs

The analysis of the characteristics of each category of MWEs not only helps us understand them in depth, but also guides the classification and extraction of MWEs. On the one hand, the members of each category of MWEs exhibit prototypicality. Just as MWEs themselves vary in prototypicality, showing different “chunking degrees” (Wu et al., 2009), MWEs within the same category also range from typical to atypical, which means that there is a grey area among the members of each category. On the other hand, the prototypicality of MWEs requires that a comparative classification method be adopted when assigning a unit to a category. When an MWE is difficult to classify, it is compared with each of its possible categories. For example, when judging whether a unit is a set phrase, we should pay attention to how it differs from institutionalized expressions, conventionalized expressions, and high frequency collocations. Compared with institutionalized expressions and conventionalized expressions, set phrases are mainly composed of four characters and reflect features of classical Chinese, a strongly written style, and complex and profound meaning. Compared with high frequency collocations, the meaning of a set phrase is often not simply the sum of the literal meanings of its morphemes, and its components are fixed and unchangeable. As another example, when judging whether a unit is an institutionalized expression, we should focus on how it differs from conventionalized expressions. Compared with conventionalized expressions, institutionalized expressions usually have metaphorical or extended meanings, and many of them have a verb-object structure.

4.4 Categories of MWEs in the Textbooks

4.4.1 Distribution of Different Categories of MWEs in Four Sets of Textbooks

This section analyzes the MWEs in the four sets of textbooks by category. The total number, percentage, number of types, and token/type ratio of each category are shown in Table 4.4.

Table 4.4 Distribution of different categories of MWEs in four sets of textbooks

According to the number of MWEs, the nine categories of MWEs can be divided into three echelons. The first echelon contains high frequency collocations, set phrases and classifier phrases. The total number of high frequency collocations reaches 1460, accounting for 27.07% of all MWEs, which makes them the largest category. The totals for set phrases and classifier phrases are 1154 and 957 respectively, accounting for 21.40% and 17.75%. Together, the three categories take up 66.22% of all MWEs and form the main body of MWEs. These three categories have relatively low token/type ratios, indicating low recurrence.

The second echelon contains conventionalized expressions, frame structures, parentheses, and polite formulas. Each of these four categories contains between 350 and 500 MWEs, accounting for 6.66–8.55% each. Together they take up 29.58% of all MWEs. The token/type ratios of these four categories are high; the ratio for polite formulas in particular is as high as 5.13, indicating that they recur more often.

The third echelon contains institutionalized expressions and other types of idioms, whose totals are 131 and 96 respectively, accounting for 2.43% and 1.78%. Together the two categories take up 4.21% of all MWEs, making them “minorities” among MWEs. Their token/type ratios are also low, indicating low recurrence. In general, the first echelon is characterized by “large quantity and low recurrence”, the second echelon by “medium quantity and relatively high recurrence”, and the third echelon by “small quantity and low recurrence”.
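The measures used above can be made concrete with a minimal sketch. It assumes that “total number” counts tokens (every occurrence of an MWE) and that “types” counts distinct MWEs, so that the token/type ratio is the average number of times each distinct MWE recurs. The annotated occurrences in the sketch are hypothetical and are not taken from the textbook data; only the echelon percentages at the end are the figures reported above.

```python
from collections import Counter, defaultdict

# Hypothetical annotated occurrences: (category, expression).
# Illustrative only; these rows are NOT taken from the textbook corpus.
occurrences = [
    ("polite formula", "nǐhǎo"),
    ("polite formula", "nǐhǎo"),
    ("polite formula", "xièxie"),
    ("classifier phrase", "yī jù huà"),
    ("classifier phrase", "yī zhǎn dēng"),
    ("high frequency collocation", "kàn shū"),
]


def category_stats(rows):
    """Return {category: (tokens, types, token/type ratio, % of all tokens)}."""
    by_category = defaultdict(Counter)
    for category, expression in rows:
        by_category[category][expression] += 1
    total_tokens = len(rows)
    stats = {}
    for category, counts in by_category.items():
        tokens = sum(counts.values())   # every occurrence
        types = len(counts)             # distinct expressions
        stats[category] = (
            tokens,
            types,
            round(tokens / types, 2),
            round(100 * tokens / total_tokens, 2),
        )
    return stats


stats = category_stats(occurrences)
for category, row in stats.items():
    print(category, row)

# Consistency check on the echelon shares reported in the text.
first_echelon = round(27.07 + 21.40 + 17.75, 2)       # reported as 66.22
third_echelon = round(2.43 + 1.78, 2)                 # reported as 4.21
all_echelons = round(first_echelon + 29.58 + third_echelon, 2)
print(first_echelon, third_echelon, all_echelons)
```

In this toy sample the polite formulas have the highest token/type ratio because one expression is repeated, which is the sense in which a high ratio indicates more recurrence. The three echelon shares add up to 100.01 rather than 100, which is consistent with each percentage having been rounded to two decimal places.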

The total number of MWEs, the density of MWEs, and the token/type ratio reflect only the general situation. In order to observe the distribution of MWEs in each set of textbooks, we need to examine it by category. The distribution of different categories of MWEs in each set of textbooks is shown in Table 4.5.

Table 4.5 Category distribution of MWEs in each set of textbooks

As shown in Table 4.5, the MWEs in the four sets of textbooks consist mainly of set phrases, high frequency collocations, and classifier phrases. These three categories account for 66.22%. Among them, the number of high frequency collocations is the largest, accounting for 27.07%. The category distribution of Boya Chinese and Developing Chinese is closer to the overall category distribution of all MWEs, because their top three categories are the main categories of all MWEs (the first echelon): high frequency collocations, set phrases, and classifier phrases. However, the number of set phrases in Boya Chinese and Developing Chinese is higher than that of high frequency collocations, which is inconsistent with the overall distribution of MWEs. In contrast, the distribution of high frequency collocations in New Practical Chinese Reader and Chinese Made Easy accords with the overall distribution of MWEs.

4.4.2 MWEs’ Categories in Different Levels of Textbooks

Table 4.6 shows the percentage of different categories of MWEs at different levels or in different volumes. The top three categories at each level or in each volume are high frequency collocations, classifier phrases and set phrases. (i) High frequency collocations are the most common category in the following books: the intermediate level of Developing Chinese, volumes 2, 3, 4 of New Practical Chinese Reader, and volumes 3, 4, 5, 6 of Chinese Made Easy. (ii) Classifier phrases are the most common category in the following books: the elementary level and quasi-intermediate level of Boya Chinese, the elementary level of Developing Chinese, volume 5 of New Practical Chinese Reader, and volumes 1, 2 of Chinese Made Easy. (iii) Set phrases are the most common category in the following books: the intermediate level and the advanced level of Boya Chinese, the advanced level of Developing Chinese, and volume 6 of New Practical Chinese Reader. (iv) Unlike in the other books, the most common category in volume 1 of New Practical Chinese Reader is polite formulas.

Table 4.6 The number and percentage of different categories of MWEs in different levels or volumes

4.4.3 High Frequency MWEs in Textbooks

This section further analyzes the MWEs with high frequency. There are 50 MWEs that occur 10 times or more. Their categories and frequencies, as well as the number of sets of textbooks and the number of volumes in which each MWE occurs, are shown in Table 4.7.

Table 4.7 The distribution of the high frequency MWEs in textbooks

There are several characteristics of the distribution of the high frequency MWEs. (1) As frequency decreases, the scope of MWEs’ distribution among textbooks tends to narrow. Among the high frequency MWEs, the frame structure yuèláiyuè…… ‘more and more…’ has the highest frequency. It appears in each of the four sets of textbooks and is distributed across the 20 volumes. Apart from it, the number of volumes in which an MWE appears tends to decrease as its frequency decreases. MWEs with a frequency of 15 or below are distributed in no more than 9 volumes. (2) Some MWEs are concentrated at certain levels. For example, the polite formula nǐhǎo ‘hello’ is distributed in the elementary-level volumes of different sets of textbooks, such as the elementary level (I) and quasi-intermediate level (I) of Boya Chinese, Elementary Comprehensive Course (I) (II) of Developing Chinese, volumes 1–3 of Chinese Made Easy and elementary level (I) of New Practical Chinese Reader. It can be seen that nǐhǎo ‘hello’ shows up only at the elementary stage. Obviously, subsequent learning does not need to focus on this most basic greeting. (3) Some MWEs are concentrated in certain textbooks. For example, yǒu yītiān ‘one day’, yuè……yuè…… ‘the more (comparative adjective) the more (comparative adjective)’ and gè ( ) gè ( ) ‘every ( ) every ( )’ are mainly distributed in Boya Chinese and Developing Chinese; shíjì shang ‘in fact’ is mainly used in Boya Chinese; yě jiùshì shuō ‘that is to say’ is mainly used in Developing Chinese; chúcǐ zhīwài ‘other than this’ is mainly used in Chinese Made Easy. (4) Some MWEs have a highly concentrated distribution, which is constrained by the teaching materials, as shown in Table 4.8. Both yī zhǎn dēng ‘a lamp’ and bǐfāng shuō ‘for example’ appear in only one book of Boya Chinese and one book of Developing Chinese. Among them, yī zhǎn dēng ‘a lamp’ appears as many as 15 times in Intermediate level (II) of Boya Chinese alone, and bǐfāng shuō ‘for example’ appears as many as 9 times in Intermediate level (I) of Boya Chinese. Similarly, guò shēngrì ‘celebrate the birthday’ appears in three books of two sets of textbooks. This shows that, owing to the influence of the content selected for the textbooks, some MWEs have a very concentrated distribution.

Table 4.8 Examples of textbooks in which MWEs appear

(5) Many of the high frequency MWEs are classifier phrases. In Table 4.7, 16 out of the 50 MWEs are classifier phrases (shown in Table 4.9), accounting for 32%, which indicates that classifier phrases recur more often than other categories across all the textbooks. However, the distribution of these classifier phrases in the textbooks is relatively concentrated. Except for yī jù huà ‘a sentence’, which is distributed in 13 volumes, all others are distributed in fewer than 10 volumes. yī zhǎn dēng ‘a lamp’ even appears in only 2 volumes. yī jié kè ‘a lesson’ and yī zhī gǒu ‘a dog’ each appear in only 4 volumes.

Table 4.9 16 classifier phrases (chosen from Table 4.7)
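The frequency and dispersion figures discussed in this section (frequency, number of sets, number of volumes) can be derived from occurrence records in the way sketched below. This is only an illustration under stated assumptions: the record layout, the set names “Set A” and “Set B”, and all counts are hypothetical, and the threshold is lowered to 3 to keep the sample small, whereas the study uses a threshold of 10.

```python
from collections import defaultdict

# Hypothetical occurrence records: (expression, textbook set, volume).
# Set names and counts are invented for illustration only.
records = [
    ("yuèláiyuè", "Set A", 1),
    ("yuèláiyuè", "Set A", 2),
    ("yuèláiyuè", "Set B", 1),
    ("yuèláiyuè", "Set B", 1),
    ("yī zhǎn dēng", "Set A", 2),
    ("yī zhǎn dēng", "Set A", 2),
    ("yī zhǎn dēng", "Set A", 2),
    ("nǐhǎo", "Set B", 1),
]


def dispersion(rows, min_frequency):
    """For each expression at or above min_frequency, report its frequency
    and the number of distinct sets and volumes in which it occurs."""
    frequency = defaultdict(int)
    sets_seen = defaultdict(set)
    volumes_seen = defaultdict(set)
    for expression, textbook_set, volume in rows:
        frequency[expression] += 1
        sets_seen[expression].add(textbook_set)
        # A volume is identified by its set and its volume number.
        volumes_seen[expression].add((textbook_set, volume))
    return {
        expression: {
            "frequency": frequency[expression],
            "sets": len(sets_seen[expression]),
            "volumes": len(volumes_seen[expression]),
        }
        for expression in frequency
        if frequency[expression] >= min_frequency
    }


table = dispersion(records, min_frequency=3)  # the study itself uses 10
for expression, row in table.items():
    print(expression, row)

# Share of classifier phrases among the high frequency MWEs, as reported:
classifier_share = 100 * 16 / 50
print(classifier_share)
```

In the toy sample, yī zhǎn dēng passes the frequency threshold but occurs in a single volume, which is the pattern of concentrated distribution described above: an expression can be frequent overall and yet be narrowly dispersed.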

4.4.4 Implications for Teaching from the Distribution of MWEs

The distribution of MWEs has the following implications for teaching. (1) The main categories of MWEs are high frequency collocations, set phrases and classifier phrases, which need to be sufficiently presented in textbooks. In teaching, students can be consciously encouraged to expand their learning of these categories in order to improve their Chinese language proficiency. (2) MWEs need to be studied step by step. The proportion of polite formulas and classifier phrases generally decreases from the elementary level to the advanced level in each set of textbooks, while the proportion of high frequency collocations and set phrases increases gradually. The proportion of other categories of MWEs does not show a significant upward or downward trend. Therefore, elementary-level textbooks focus more on polite formulas and classifier phrases; as proficiency improves, the focus of study should gradually shift to high frequency collocations and set phrases. (3) The distribution of high frequency MWEs in textbooks is concentrated. Such concentration is greatly affected by the content of particular lessons, which is likely to make some MWEs cluster in the same lessons. This requires textbook editors to pay attention to adjusting the distribution and repetition of MWEs, so as to ensure that learners receive sufficient and balanced input.