Recent studies have repeatedly investigated the influence of semantic transparency on language processing (e.g., McCormick, Rastle, & Davis, 2009; Rastle, Davis, Marslen-Wilson, & Tyler, 2000), and on the lexical representation of complex words, in particular. If the meaning of a complex word like understand is semantically opaque and cannot be composed from the meaning of its parts, will it be lexically represented via its base stand or via the whole form? Is the lexical representation of a complex verb with respect to its base different from that of two verbs with different bases? We refer to the former relation—between verbs with the same base—as semantic transparency, and to the latter—the relation between verbs with different bases—as semantic relatedness.

So far, lexicon-based theories on derivations agree that the lexical processing and representation in Indo-European languages depends on semantic transparency (e.g., Diependaele, Sandra, & Grainger, 2005; Feldman & Soltano, 1999; Feldman, Soltano, Pastizzo, & Francis, 2004; Longtin, Segui, & Hallé, 2003; Marslen-Wilson, Bozic, & Randall, 2008; Marslen-Wilson, Tyler, Waksler, & Older, 1994; Meunier & Longtin, 2007; Rastle et al., 2000; Taft & Nguyen-Hoan, 2010). These models assume a fundamental difference in the storage and processing of morphologically related words that are semantically transparent (i.e., whose meaning can be derived from the meanings of their parts), like success-ful and re-fill, and opaque words whose meaning cannot be derived from the meanings of their parts, like success-or and re-hearse. Under experimental conditions that tap into lexical processing and representation (i.e., cross-modal priming or visual priming at long exposure durations of 230–250 ms; see Smolka, Preller, & Eulitz, 2014, for a review), abundant findings in English and French have shown that semantically transparent words like successful and refill facilitate the recognition of their base (success, fill), whereas semantically opaque words like successor and rehearse do not. These findings have been taken to indicate that the former possess a lexical entry that corresponds to the base. A word like successful will thus be represented as the base success and the suffix -ful. By contrast, semantically opaque words like successor and rehearse, whose meanings cannot be derived from the meanings of the parts, must be represented in their full form (e.g., Diependaele et al., 2005; Marslen-Wilson et al., 2008; Meunier & Longtin, 2007; Rastle et al., 2000; Taft & Nguyen-Hoan, 2010).

Although on different grounds, distributed connectionist approaches also assume that the magnitude of the facilitation between a complex word and its base depends on their meaning relation (e.g., Gonnerman, Seidenberg, & Andersen, 2007). Graded effects are interpreted to indicate that word processing depends on the form and meaning overlap between words: strong facilitation by strongly phonologically and semantically related word pairs (preheat–heat), intermediate effects by moderately semantically related word pairs (midstream–stream), no effects by low semantically related word pairs (rehearse–hearse), and inhibition by purely form-related word pairs (coffee–fee).

The effects of semantic transparency or semantic relatedness between morphologically related words (e.g., successful–success vs. successor–success) have usually been tested against the priming between purely semantically related words with different bases, such as black–white and destiny–fate (see Gonnerman et al., 2007; Kielar & Joanisse, 2011; Marslen-Wilson & Tyler, 1998; Napps, 1989, Exp. 3; Rastle et al., 2000). These purely semantically related words differ in many respects from the words used in the morphologically relevant conditions, such as word category and morphological complexity.

Contrary to the findings in English and French, though, recent behavioral and electrophysiological studies in German have shown that German complex verbs prime their base regardless of semantic transparency (Smolka, Libben, & Dressler, 2018; Smolka, Gondan, & Rösler, 2015; Smolka, Komlósi, & Rösler, 2009; Smolka et al., 2014). Semantically opaque verbs like entwerfen (“design”) produced the same amount of priming of their base werfen (“throw”) as did semantically transparent verbs like bewerfen (“throw something at someone”). Moreover, such morphological priming (by verbs sharing the same base) was significantly stronger than the priming by purely semantically related verbs, like wegschmeissen (“throw away”), or by verbs with a purely form-related base, like bewerben (“apply”). We take these findings to indicate that German complex verbs like verstehen (“understand”), unlike the English understand, are lexically processed and represented via their base. What’s more, these findings stress the importance of cross-language comparisons.

Providing valid evidence requires strict control of the stimulus materials for usage-based variables such as lemma frequencies, on the one hand, and for the assessment of semantic transparency or semantic relatedness, on the other hand. Measures of meaning relatedness and semantic transparency are thus important means to define the experimental conditions. Therefore, studies on the lexical representation of complex words need to define the semantic transparency of their complex words or the semantic relatedness of the words being tested. In the present work, we aim to provide a number of useful stimulus measurements for this purpose—among others, the semantic association ratings collected in six tests with 334 participants and their corresponding lexical paraphrases for German verbs and the semantic similarity vectors between verbs, as well as such additional measures as the age of acquisition and age of reading, verb family size, verb regularity, and lexical frequency of these verbs.

German complex verbs

Morphologically complex verbs possess meanings that range from completely transparent to completely opaque with respect to the meaning of their base verb. Interestingly, all German complex verbs are by definition real (i.e., etymological) morphological derivations of their base verb. That is, even completely opaque German complex verbs of the verstehen–stehen type (“understand–stand”) correspond to the English cryptic–crypt type, which are diachronically related, although the native speaker typically is not aware of this relation.

In German, the word formation of verbs is dominated by prefixation and is very productive and analogous to word formation by suffixation (Eisenberg, 2004). For example, the prefix ent- not only derives verbs from verbs (laufen–entlaufen, “run–run away”), but also verbs from nouns (Gleis–entgleisen, “track–be derailed”).

The standard linguistic literature (Eisenberg, 2004; Fleischer & Barz, 1992; Olsen, 1996) distinguishes two types of word formation in German: prefix verbs and particle verbs. Both consist of a verbal root and either a verbal prefix or a particle. Verbal prefixes are inseparable from the base in finite forms (Sie befindet sich in X “she resides in X”), whereas particles are free morphemes and are separated from the verb base in finite forms (Sie findet sich mit X ab “she accepts X”; see Footnote 1). A few particles, including durch, um, unter, and über, may function as either separable particles or inseparable prefixes. For example, umfahren takes two different meanings (“knock down” and “drive around something”) that can be differentiated by stress (UMfahren vs. umFAHREN, respectively); in finite forms, by their separable or inseparable behavior, as in “Sie fährt das Straßenschild um” (“she knocks down the traffic sign”) versus “Sie umfährt das Straßenschild” (“she drives around the traffic sign”); and by their participle formation, as in “Sie hat das Straßenschild umgefahren” (“she knocked down the traffic sign”) versus “Sie hat das Straßenschild umfahren” (“she drove around the traffic sign”). In the present study, we examined prefix and particle verbs and did not include compound verbs that operate via copredication, such as sauber + kehren (“clean” + “sweep,” “sweep clean”) or schwarz + fahren (“black” + “ride,” “dodge the fare”).

Given that German prefix and particle verbs are very productive, a single base verb may yield families of up to 150 prefixed verb derivations, all with different meanings ranging from truly transparent to truly opaque. Prefix and particle verbs are frequently used in standard German and are thus a particularly useful means by which to study the effects of semantic transparency to the same base verb. For example, the particle verbs auffinden (“find, locate”) and abfinden (“compensate, accept”) are morphologically derived from the base finden (“find”). Thus, they both share their (morphological) form with finden (“find”), though only auffinden (“find, locate”) also relates to its meaning.

Both prefix and particle verbs may vary with respect to their semantic transparency to the base. This means that whether or not a particular prefix or particle verb is semantically transparent with respect to its base, and to what degree, is completely arbitrary. For example, the particle an (“at”) only slightly alters the meaning of the base führen (“guide”) in the derivation anführen (“lead”), but it radically alters meaning with respect to the base schicken (“send”) in the opaque derivation anschicken (“get ready”). Similarly, the prefix ver- produces the transparent derivation verschicken (“mail”) as well as the opaque derivation verführen (“seduce”). However, such semantic information is not yet part of such lexical databases as CELEX (Baayen, Piepenbrock, & van Rijn, 1993), dlexDB (Heister et al., 2011), DeWaC (Baroni, Bernardini, Ferraresi, & Zanchetta, 2009), or TIGER (Brants et al., 2004).

Several databases in German provide plentiful distributional variables, such as various frequency measures on word units or sublexical units (Baayen et al., 1993; Geyken, 2009; Heister et al., 2011; Hofmann, Stenneken, Conrad, & Jacobs, 2007; wortschatz.uni-leipzig.de). However, complex verbs are significantly underestimated in lexical databases: Because German is a verb-second language with a subject–object–verb word order (e.g., Haider, 1985), complex verbs in German are decomposed whenever they occur in finite forms (i.e., in all main clauses in the present, preterite, and imperative). In contrast to phrasal verbs in English, for which the particle may or may not be separated by a noun phrase (e.g., “The professor broke the project down into three separate parts, following a long discussion among the participants of the seminar”), in German the particle must appear sentence-final (“Die Professorin teilte das Projekt nach einer langen Diskussion der Seminarteilnehmer in drei Teile auf.”). An almost infinite amount of material may be inserted between the finite verb and its particle, ranging from complex noun phrases to relative clauses. Hence, the particle, which is required to complement the meaning of the whole complex verb, is presented only several words after the base (i.e., the base verb). Until now, such German lexical databases as CELEX (Baayen et al., 1993), dlexDB (Heister et al., 2011), DeWaC (Baroni et al., 2009), and TIGER (Brants et al., 2004) have not counted particle verbs if the particle was separated and occurred at the end of the sentence. Take the examples of the inflected particle verbs in “Sie bringt ein Geschenk mit” (“She brings a present with her/along.”) and “Sie bringt den Käfer um” (“She kills the beetle.”). 
The verb bringen (“bring”) requires an object, which must be inserted between the finite verb and the particle, so that the particular word combinations bringt mit and bringt um will not occur in syntactically correct German, as the above examples demonstrate. This has the following consequences for databases: CELEX returns the possible inflections bringt um and bringt mit, but counts their occurrence as zero, whereas dlexDB does not return anything when the particle is separated. DeWaC (which has to be parsed and lemmatized by a provided tool) does not differentiate between complex and simple verbs. It indicates the separated particle as a verbal affix, but not to which verb the particle belongs. In TIGER, the separated particle is annotated as such and can be traced to its verb (by means of additional programming). Nevertheless, the verb lemma does not encompass the particle when it is separated, nor does TIGER differentiate between complex and simple verbs once the particle is separated.

Semantic similarity information can be estimated on the basis of distributional semantic models (e.g., Marelli & Baroni, 2015). Vector semantic models such as latent semantic analysis (LSA; Landauer & Dumais, 1997) are now available for German, as well (Günther, Dudschig, & Kaup, 2015). The goodness of fit between these computational models and semantic relatedness norms measured in behavioral studies may be tested by correlating the estimates, such as the cosine similarities between semantic vectors with human ratings. The present study provides measures for such an enterprise.
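The comparison between model estimates and human ratings described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional vectors and the ratings are invented toy values (not taken from any LSA space or from the present database), the helper names `cosine` and `pearson` are hypothetical, and Pearson's r is chosen for simplicity, although a rank correlation may be more appropriate for rating data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy semantic vectors: invented values, for illustration only.
toy_vectors = {
    "werfen": [0.9, 0.1, 0.2],
    "bewerfen": [0.8, 0.2, 0.1],
    "entwerfen": [0.2, 0.9, 0.3],
    "bewerben": [0.1, 0.3, 0.9],
}
# Toy mean ratings on a 7-point scale: also invented.
toy_ratings = {"bewerfen": 6.2, "entwerfen": 2.1, "bewerben": 1.4}

# Cosine of each verb with the base "werfen", then the correlation with ratings.
cosines = [cosine(toy_vectors[verb], toy_vectors["werfen"]) for verb in toy_ratings]
ratings = [toy_ratings[verb] for verb in toy_ratings]
r = pearson(cosines, ratings)
print(round(r, 2))
```

With real data, the toy dictionaries would be replaced by vectors from a German semantic space and by the mean ratings per verb pair.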

The main objective of this study was to provide measures of semantic transparency and semantic relatedness by human raters. In addition, we present the verb regularity of all verbs, two counts of verb family size for 184 base verbs, as well as age of acquisition and age of reading for a subgroup of 200 verbs. The measures for each verb are supplied in the online supplementary materials. The operational definitions of each of these variables are described in the following paragraphs.

Even though psycholinguistic experiments would profit immensely from information on the semantic transparency of complex verbs or the semantic relatedness between verbs, lexical databases usually do not provide such information, since it is very laborious to collect (see the Method section below). We have taken the first step, and hereby provide the information we have previously collected on German complex verbs and their base verbs (i.e., on verb pairs with the same base), as well as on verb pairs with different bases (see Smolka, 2012; Smolka & Eulitz, 2011; Smolka, Khader, Wiese, Zwitserlood, & Rösler, 2013; Smolka et al., 2009; Smolka et al., 2014; Smolka, Tema, & Eulitz, 2010).

Semantic transparency of complex verbs

One measure that we provide in the present study refers to the semantic transparency of the complex verb itself. Is the meaning of the complex verb reflected in the meaning of its base?

Semantic association test

To measure the semantic transparency of a complex verb relative to its base, participants rated in a semantic association test whether the meaning of the complex verb reflected the meaning of the base verb. In particular, we asked how strongly the meaning of the complex verb is related to the meaning of its base. These ratings thus measure the “whole-word transparency” of a complex word, as defined by Marelli and Baroni (2015). For example, how strongly is the meaning of anführen (“lead”) related to führen (“guide”)? Or, how strongly is the meaning of verführen (“seduce”) related to führen?

In the following sections, we merged the data from six different semantic association tests (collected from 334 participants) that tested the meaning relatedness of about 1,185 complex verbs relative to their base verbs; see also Table 1.

Table 1 Examples of possible relations between (complex or simple) verbs and base verbs

Lexical paraphrase

Another way to measure the semantic transparency of a complex word is to refer to its lexical paraphrase (e.g., Hay, 2001): If the lexical paraphrase of the complex verb refers to the base verb, it is considered as being “semantically transparent”; otherwise, it is considered as being “semantically opaque.” Given that we search for the compositional parts of a complex verb, the lexical paraphrases used here measure the “compositional transparency” as defined by Marelli and Baroni (2015). For example, the dictionary definition in the DUDEN (Dudenredaktion, 2009) paraphrases the complex verb anführen (from führen “guide,” “lead”) as “preceding a group in a leading way.” Because the base “lead” appears in the definition, the complex verb is considered as being transparent. By contrast, since the definition of verführen (“seduce”)—to get someone to do something unwise and wrong against his will—does not refer to the base führen, the complex verb verführen is considered as being opaque.

Semantic relatedness between verbs

The semantic transparency between morphologically related words with the same base, such as departure–depart or department–depart, has usually been measured in comparison to the priming between semantically related words with different bases, mostly using semantic associations between noun–noun pairs such as chair–stool or adjective–adjective pairs such as awkward–clumsy (e.g., Rastle et al., 2000). So far, there are no semantic relatedness ratings for verb–verb pairs, at least not in German. This was a further purpose of the present study.

Semantic association test

Another measure we thus collected was the meaning relatedness between two verbs with different bases. In particular, we asked how strongly the meanings of the two verbs are related and examined the meaning relatedness between either a complex and a simple verb, such as zuschnüren (“lace up”) and binden (“bind”), or between two simple verbs, such as schnüren (“lace”) and binden (“bind”).

We tested the semantic relatedness between (a) purely semantically related verbs, such as zuschnüren–binden (“lace up–bind”) or schnüren–binden (“lace–bind”); (b) purely form-related verbs, such as abbilden–binden (“picture–bind”) or bilden–binden (“build–bind”); or (c) completely unrelated verbs, such as abholzen–binden (“fell–bind”) or hupen–binden (“honk–bind”); see also Table 1. In this article, we report the semantic relatedness between 775 verb pairs with different bases.

Lexical paraphrase

In addition to the semantic association test, we also applied lexical paraphrases to measure whether two verbs are semantically related with each other: If the lexical paraphrase of one verb referred to the base of the other verb (or vice versa), we considered them as being “semantically related,” or otherwise, as semantically unrelated.

For example, the dictionary definition from the DUDEN (Dudenredaktion, 2009) paraphrases the complex verb anleiten (based on leiten “guide,” “lead”) as “instruct, guide someone with something, lead.” Because the base “lead” appears in the definition, anleiten is considered as being semantically related with the verb führen (“lead”). By contrast, the definition of the complex verb befühlen (based on fühlen “feel something”) as “touching and feeling in a testing way; to stroke something with the fingers, the hand” does not refer to the base “lead”, so the two verbs befühlen and führen are considered as being form-related but not semantically related. We applied the same principle to pairs of simple verbs. For example, beißen (“bite”) is defined as “chew, to grind food with the teeth.” This definition uses the word “chew” so that the verbs beißen and kauen (“chew”) are considered as being semantically related with each other.

Age of acquisition and age of reading

Age of acquisition (AoA) has become an important factor in word recognition. It has been shown to affect lexical processing, such as lexical decision or naming latencies, above and beyond the effects of word frequency (e.g., Brysbaert & Cortese, 2011). Even though Brysbaert and Cortese (2011, p. 558) concluded that “the practical impact of AoA in word recognition, while still important for theoretical approaches, may have been overemphasized in the past,” AoA remains an important factor when selecting the stimulus materials for experiments with children. For example, Behrens (1998) reported a sharp delay in the acquisition of prefix verbs as compared to the acquisition of separable particle verbs in German, but not in English or Dutch. In German, separable particle verbs are acquired early on in development and represent a respectable proportion of children’s overall verb use, whereas the acquisition of prefix verbs is delayed. No such developmental difference between prefix and particle verbs is evident in English or Dutch.

In addition to the differential acquisition of simple, particle, and prefix verbs, AoA may also correlate with the semantic transparency of complex verbs. That is, do children acquire semantically transparent verbs earlier than semantically opaque ones?

In addition to AoA, we defined a new measure that we consider important for experimenting with children by means of written stimulus materials: age of reading (AoR), or the age at which children encounter a word in text. Even though we hypothesized that the two measures AoA and AoR would be correlated, the knowledge that a child knows a certain word may not be sufficient in experiments on visual word recognition, if the child does not know how to read this word or has not encountered the word before in text.

Family size

Family size is known to be an important factor in morphological processing. Words with larger families are processed faster than those with smaller families. However, this effect is assumed to be of morphological as well as semantic nature, since only morphologically related words that are also semantically related contribute to the effect (Moscoso del Prado Martín et al., 2004; Bertram, Baayen, & Schreuder, 2000; Schreuder & Baayen, 1997; for a review of the effects of family size, see Marelli & Baroni, 2015). It is thus important to consider the family size of a verb when dealing with its semantic transparency.

As we have mentioned above, because the word formation for German verbs occurs by means of prefixation, the derivation of prefixed verbs is very productive in German. A single base verb may yield families of up to 150 prefixed verb derivations. For example, the German base stehen (“stand”) has more than 100 prefixed derivations—many more than the same base stand in English, which possesses the prefixed derivations understand and withstand, as well as about 20 phrasal verbs (cf. McCarthy-Morrogh, 2006). In fact, there are hardly any base verbs that do not have a family of derived prefix or particle verbs. Depending on the base verb, the number of complex verb derivations ranges from relatively small to very large. For example, the base verb herrschen (“govern”), with its three family members beherrschen (“rule”), anherrschen (“bark at someone”), and vorherrschen (“predominate”), possesses a relatively small family size. By contrast, the base verb gehen (“go”) has the considerable family size of 141 complex verb derivations. Between these two extremes, many verbs have a medium family size, such as the base verb kehren (“sweep”) with its 19 family members: abkehren (“sweep off,” “turn away”), aufkehren (“sweep and collect on a shovel”), auskehren (“sweep”), bekehren (“convert”), einkehren (“stop for a bite to eat”), heimkehren (“return home”), hervorkehren (“disclose”), umkehren (“turn around”), verkehren (“consort”), vorkehren (“precaution”), wegkehren (“sweep away”), wiederkehren (“recur”), zukehren (“turn to”), zusammenkehren (“sweep”), herauskehren (“display”), rückkehren (“turn back”), überkehren (“turn upside down”), zurückkehren (“return”), and sauberkehren (“sweep clean”).

To obtain an exhaustive count of all possible verb derivations, we included all possible prefixed variants of a base verb, including prefixes, inseparable particles, separable particles, and copredicates (e.g., sauber + kehren, “sweep clean”). Since the data for the present study comprise verbs only, the count of the verb family size includes only the word category of verbs. In this respect, our count of verb family size differs from the original count of family size, which includes all word categories (e.g., De Jong, Schreuder, & Baayen, 2000).
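As a rough illustration of such a verb-only family count, the following Python sketch counts lexicon entries that consist of some prefixed material plus the base. It is not the counting procedure used for the database: the function name `verb_family` and the small lexicon are hypothetical, and plain string matching is an assumption that can overcount, because a verb may end in the same letters as another verb without being derived from it (werben ends in erben, for instance). A real count would rely on a morphologically annotated lexicon.

```python
def verb_family(base, lexicon):
    """Return the lexicon verbs made up of prefixed material plus the base.

    Naive assumption: a family member is any verb that ends in the base
    string and is longer than the base itself.
    """
    return sorted(
        verb for verb in set(lexicon)
        if verb.endswith(base) and len(verb) > len(base)
    )

# The 19 family members of kehren listed in the text above.
kehren_members = [
    "abkehren", "aufkehren", "auskehren", "bekehren", "einkehren",
    "heimkehren", "hervorkehren", "umkehren", "verkehren", "vorkehren",
    "wegkehren", "wiederkehren", "zukehren", "zusammenkehren",
    "herauskehren", "rückkehren", "überkehren", "zurückkehren",
    "sauberkehren",
]
# Hypothetical mini-lexicon: the family plus a few other verbs from the text.
lexicon = kehren_members + ["kehren", "binden", "zubinden", "herrschen", "beherrschen"]

family = verb_family("kehren", lexicon)
print(len(family), family[:3])
```

Note that the copredicate sauberkehren is counted as well, in line with the exhaustive count described above.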

Verb regularity

Past tense and participle formation in several languages has often been used to investigate the organization of lexical representations. Similar to the assumption that the semantic transparency of a complex word determines whether it is represented via its base or the whole word, inflectional regularity has often been postulated to determine lexical representation. For example, dual-mechanism models (e.g., Clahsen, 1999; Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995; Pinker & Ullman, 2002; Prasada & Pinker, 1993) assume that regularly inflected words such as walked are represented via their base walk, whereas irregularly inflected words such as thrown are represented as whole words. These accounts further assume that all regular inflections share one lexical entry, so that the priming by regularly inflected forms like walked–walk takes place via the base in a lexical network. By contrast, irregularly inflected forms are assumed to have their own representations in associative memory, so that the priming between such irregular forms as thrown–throw occurs due to spreading activation, resembling that by semantically associated entries.

However, priming studies in Italian (Orsolini & Marslen-Wilson, 1997), French (Meunier & Marslen-Wilson, 2004), and German (Smolka et al., 2013; Smolka et al., 2018; Smolka, Zwitserlood, & Rösler, 2007) assume similar storage and processing of regularly and irregularly inflected verbs, due to the similar priming effects of regular and irregular verbs. For example, Smolka and colleagues (Smolka, 2005; Smolka et al., 2013; Smolka & Eulitz, 2015; Smolka et al., 2007) have demonstrated that both German regular and irregular verbs are lexically represented via their base. A similar proposal has been formulated by Marantz and colleagues (Embick & Marantz, 2005; Fruchter, Stockall, & Marantz, 2013; Stockall & Marantz, 2006), assuming that all (English) verb forms, regular and irregular alike, are parsed on the basis of stochastic, phonologically driven rules (see Footnote 2). Regardless of the different assumptions concerning the types of storage, there is recent evidence that regular and irregular verbs differ with respect to their semantic richness (Baayen, Feldman, & Schreuder, 2006; Baayen & Moscoso del Prado Martín, 2005; Basnight-Brown, Chen, Hua, Kostić, & Feldman, 2007; Davis, Meunier, & Marslen-Wilson, 2004; Feldman, Kostić, Basnight-Brown, Đurđević, & Pastizzo, 2010; Ramscar, 2002). For example, Baayen and Moscoso del Prado Martín examined the attributes of about 1,500 regular and 150 irregular English monomorphemic verbs and found that regular verbs tend to have fewer and weaker semantic interconnections with other words than irregular forms do.

Since semantic properties such as semantic richness have been shown to influence word recognition, the verb regularity variable may be an important covariate of semantic verb properties, and is thus provided for all verbs in the present database.

Method

Materials

One-hundred eighty-four base verbs were selected from the CELEX German lexical database (Baayen et al., 1993). According to CELEX, 168 were monomorphemic, five were complex, and 11 were conversions (e.g., bürsten, “brush” from the noun Bürste, “brush”; spielen, “play” from the noun Spiel). Even though the CELEX classifications by and large converge with those of the online dictionary www.canoo.net, the latter is more precise in that it classifies eight of the monomorphemic verbs in CELEX as conversions (e.g., handeln “trade” from the noun Handel “trade,” or kämpfen “fight” from the noun Kampf “fight”). None of the base verbs had a prefix. The supplementary materials provide the morphological complexity definitions of the base verbs in CELEX and canoo.net.

Each base verb was combined with at least two, and up to eight, other verbs. These could be simple or complex (i.e., prefix or particle) verbs that differed in their relation with the base verb. Specifically, these verb relations were defined by the factors of morphological, semantic, and form relatedness with the base verb. Morphologically related verbs have the same stem as the base and are defined by their semantic transparency, that is, as being transparent or opaque with respect to their base (note that morphologically related verbs are always also form-related, because they have the same stem). Transparent derivations were synonyms or associates of the base, such as zubinden–binden (“tie up–bind”). Synonyms were selected by means of the online synonym dictionaries www.canoo.net and http://synonyme.woxikon.de. Opaque derivations were defined as not being related with the meaning of the base, as when the meanings of the complex verb and its stem refer to different semantic fields, such as umkommen–kommen (“die–come”), and when the verbs or verb bases do not co-occur in collocations or compounds.

Morphologically unrelated verbs have different stems and were defined by their semantic or form relatedness with another base verb. Semantically related (S) verbs were synonyms or associates, such as zuschnüren–binden (“lace up–bind”) or schnüren–binden (“lace–bind”); synonyms were selected by means of the online synonym dictionaries www.canoo.net and http://synonyme.woxikon.de. Form-related (F) verbs were not semantically related but possessed the same onset or first syllable and differed in the rime by a single grapheme (one or two letters) or phoneme, such as abbilden–binden (“picture–bind”) or bilden–binden (“build–bind”). Unrelated verbs (U) were neither morphologically nor semantically nor form related, such as abholzen–binden (“fell–bind”) or hupen–binden (“honk–bind”). Table 1 depicts the characteristics of the possible verb relations, together with the examples from the database provided in the Appendix.

Measures relating to the verbs and verb pairs

In the following section, we provide the measures for these 184 base verbs and the corresponding verb of the verb pair. Table 2 provides an overview of the different measures, including (a) the semantic transparency ratings between complex verbs and their bases, (b) the semantic relatedness ratings between verb pairs having different bases, (c) lexical paraphrases referring to semantic transparency, (d) lexical paraphrases referring to semantic relatedness, (e) similarity computations LSA and HAL, (f) age of acquisition and age of reading, (g) verb family size, and (h) definitions of verb regularity. We now describe the data acquisition for each of these measures in more detail.

Table 2 Summary of the different measures provided for German verbs

Semantic association test

Participants (raters)

Altogether, 334 German native speakers from different areas in Germany and Austria participated in the tests. Most of them were students or staff at the Philipps University Marburg and at the University of Konstanz, and thus had a higher educational level (both female and male, age range between 20 and 40). To also reach people outside the university, tests were further distributed to acquaintances of the researchers, who were all native speakers of German but whose gender, age, and education were not systematically documented.

Two measures of interrater agreement are provided. The first consists of the descriptive measures for the ratings of each verb pair (mean, median, standard deviation, minimal and maximal ratings, and number of raters; see the Appendix). The second is Krippendorff’s alpha (Hayes & Krippendorff, 2007), which indexes the overall agreement among the 334 raters who rated subsets of the verb pairs (see Table 3 and the Results section).
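Both kinds of agreement measure can be illustrated with a short Python sketch. Several points are assumptions rather than details given in the text: the ratings are invented toy values, the function names are hypothetical, the standard deviation is computed as the sample standard deviation, and alpha is computed with the interval difference metric (squared differences), which may not be the metric used for Table 3. The alpha function follows the general form for data with missing ratings, in which each verb pair may be rated by a different subset of raters.

```python
from statistics import mean, median, stdev

def describe_ratings(ratings):
    """Descriptive measures for the ratings that one verb pair received."""
    return {
        "mean": mean(ratings),
        "median": median(ratings),
        # Sample SD (an assumption); undefined for a single rating.
        "sd": stdev(ratings) if len(ratings) > 1 else 0.0,
        "min": min(ratings),
        "max": max(ratings),
        "n": len(ratings),
    }

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (assumed here).

    units: one list of ratings per verb pair; lists may differ in length.
    """
    # Only verb pairs with at least two ratings are pairable.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one verb pair with two ratings")
    # Observed disagreement: squared differences within each verb pair.
    d_o = sum(
        sum((a - b) ** 2 for a in u for b in u) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: squared differences across all pooled ratings.
    d_e = sum((a - b) ** 2 for a in values for b in values) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("no variation in the ratings; alpha is undefined")
    return 1 - d_o / d_e

# Invented ratings on the 7-point scale for three hypothetical verb pairs.
toy_units = [[6, 7, 6, 5], [2, 1, 2], [4, 3, 5, 4, 4]]
for ratings in toy_units:
    print(describe_ratings(ratings))
print(round(krippendorff_alpha_interval(toy_units), 3))
```

Alpha equals 1 when all raters agree on every verb pair and falls toward 0 (or below) as the disagreement within verb pairs approaches the disagreement expected by chance.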

Table 3 Interrater agreement for six association tests (334 raters for subsets of verb pairs covering up to 1,185 verb pairs)

Design and procedure

In total, 1,186 verb pairs were tested, of which 432 were morphologically related; 310 were purely semantically related (S), and of these, 119 were simple and 191 were complex verbs; 224 were form-related (F), of which 38 were simple and 186 complex verbs; and 220 were unrelated (U), of which 136 were simple and 84 were complex verbs. Altogether, in the latter three conditions (S, F, U), 293 of the verbs were simple and 461 were complex. Altogether, there were 893 complex verbs and 293 simple verbs.

Verbs relating to the same base verb were rotated over lists (between two and eight) according to a Latin square design, with only one combination with a base verb in a list. Participants received only one experimental list, and therefore saw each base verb only once. Depending on the test, a single list consisted of between 36 and 132 verb pairs. All verbs were presented in citation form (root + -en). When the critical verb pair comprised a complex and a simple verb, the complex verb always preceded the simple verb. Participants rated the meaning relatedness between the verbs of each pair on a 7-point scale, from completely unrelated (1) to highly related (7). The instructions included two examples, one representing a highly related verb pair, the other representing a pair low in meaning relatedness. Verb pairs were presented in booklets that were distributed in person or via e-mail.
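The rotation of verb pairs over lists can be sketched as follows. This is a minimal illustration of a Latin square assignment, not the procedure actually used to build the booklets: the function name and the input layout are hypothetical, and it assumes that every base verb has exactly as many companion verbs as there are lists.

```python
def rotate_over_lists(companions_by_base):
    """Distribute (companion, base) pairs so each base occurs once per list.

    companions_by_base maps a base verb to its companion verbs, given in a
    fixed condition order. Companion number `col` of base number `row` goes
    to list (row + col) modulo the number of lists.
    """
    if not companions_by_base:
        return []
    n_lists = len(next(iter(companions_by_base.values())))
    lists = [[] for _ in range(n_lists)]
    for row, base in enumerate(sorted(companions_by_base)):
        companions = companions_by_base[base]
        if len(companions) != n_lists:
            raise ValueError(f"{base}: expected {n_lists} companion verbs")
        for col, companion in enumerate(companions):
            # The companion verb precedes the base verb within the pair.
            lists[(row + col) % n_lists].append((companion, base))
    return lists


# Illustrative grouping only, not the actual experimental lists.
example = {
    "werfen": ["bewerfen", "entwerfen"],
    "trinken": ["austrinken", "betrinken"],
}
for number, verb_list in enumerate(rotate_over_lists(example), start=1):
    print(number, verb_list)
```

Because each companion of a given base lands in a different list, a participant who receives a single list sees every base verb exactly once.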

Lexical paraphrase

Two German dictionaries were consulted for the lexical paraphrases—Wahrig (2007) and the Duden Deutsches Universalwörterbuch (Dudenredaktion, 2009)—and were independently cross-checked by two native speakers. Given that the lexical paraphrases of the two dictionaries often differed for a specific verb or verb pair, we provide both paraphrases in the supplementary materials. In addition, we provide a combined paraphrase definition that is semantically transparent (T) or semantically related (S+), if one of the dictionaries provided a T or S+ definition.
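The rule for the combined paraphrase can be written down compactly. The sketch below is hypothetical in its names and in one detail that the text does not specify: when only one dictionary lists a verb, that dictionary's label is used, and "NA" is returned only when both are missing.

```python
def combine_labels(wahrig, duden, positive, negative):
    """Combine two dictionary-based labels into one.

    Returns `positive` if at least one dictionary gave the positive label,
    `negative` if no available label is positive, and "NA" if the verb is
    missing from both dictionaries (missing entries are passed as None).
    """
    labels = [label for label in (wahrig, duden) if label is not None]
    if not labels:
        return "NA"
    return positive if positive in labels else negative


# Semantic transparency: T (transparent) versus O (opaque).
print(combine_labels("T", "O", positive="T", negative="O"))
# Semantic relatedness: S+ (related) versus S- (unrelated).
print(combine_labels("S-", "S-", positive="S+", negative="S-"))
```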

Semantic transparency

For 893 complex verbs, we gathered the lexical paraphrases with respect to their own base. Was the meaning of each complex verb defined via its base? The lexical paraphrase was defined as being semantically transparent (T) if the lexical definition of the complex verb referred to its base or to another complex word with the same base. Otherwise, it was defined as being semantically opaque (O); see the supplementary materials. For example, the dictionary entry of the complex verb bewerfen (“throw at”; see Footnote 3) repeatedly refers to the verb werfen (“throw”) and is thus considered as being semantically transparent (T). By contrast, the definition of the complex verb entwerfen (“design”; see Footnote 4) does not refer to its base werfen, and is thus considered as being semantically opaque (O).

Semantic relatedness

For 754 verb pairs (in the S, F, and U conditions) with different bases, we examined the lexical paraphrases referring to the other verb’s base. Of these verb pairs, 293 consisted of simple verbs, such as kochen–backen (“cook–bake”), and 461 consisted of a complex and a simple verb, such as zuschnüren–binden (“lace up–bind”). Were the meanings of the two verbs related? If the base of one verb was used in the lexical definition of the other, the verb pair was marked as being semantically related (S+). If neither of the verbs was described using the other or the other’s base, the pair was defined as being semantically unrelated (S−). For example, one definition of the complex verb zuschnüren (“lace up one’s shoes”) is “bind together (shoe).” Since it explicitly refers to the verb binden (“bind”), it is considered as being semantically related (S+) to the base binden. Also, the lexical paraphrase of the simple verb schnüren (“lace”) uses the base binden several times in its definitions, and is thus considered as semantically related (S+) to binden. By contrast, the dictionary definition of the complex verb abbilden (“depict”) is to “depict persons or objects” (see Footnote 5). Since this definition does not refer to the base “bind,” abbilden is considered as semantically unrelated (S−) and purely form-related to the base.

Age of acquisition (AoA) and age of reading (AoR)

In this study, nine kindergarten and primary school teachers judged the age of acquisition (AoA) of 40 base verbs and 160 complex verbs. In addition, the new measurement age of reading (AoR)—the age at which children encounter a particular verb in a text—was introduced and collected from the same teachers for the same verbs.

Verb family size

We conducted two counts of verb family size for the 184 base verbs, using the CELEX lexical database (Baayen et al., 1993) and a dictionary of German verbs that provides an exhaustive account of complex verbs (Mater, 1966). To obtain an extensive count of all possible verb derivations, we included all possible prefixed variants of a base verb, including prefixes (e.g., ent-, be-), inseparable particles (e.g., um, unter, über), separable particles (e.g., an, auf, mit, um, unter, zu), and copredicates or compound verbs (e.g., sauber + kehren, “sweep clean”), without counting the base itself.
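As a rough illustration of such a count, the sketch below tallies the verbs in a word list that consist of a known prefix or particle plus the base. It is not the procedure applied to CELEX or Mater (1966): the affix inventory is a small, incomplete sample chosen for illustration, the function name is hypothetical, and compound verbs such as sauber + kehren are not handled.

```python
# Illustrative, incomplete inventory of prefixes and particles (assumption).
AFFIXES = ("be", "ent", "um", "unter", "über", "an", "auf", "mit", "zu", "aus", "ab")


def family_size(base, lexicon, affixes=AFFIXES):
    """Count distinct verbs of the form affix + base, excluding the base."""
    members = {
        verb
        for verb in lexicon
        for affix in affixes
        if verb == affix + base
    }
    return len(members)


# Toy word list built from verbs mentioned in the text.
lexicon = [
    "trinken", "betrinken", "austrinken", "abtrinken", "zutrinken",
    "werfen", "bewerfen", "entwerfen",
]
print(family_size("trinken", lexicon))
print(family_size("werfen", lexicon))
```

Matching on an explicit affix list rather than on any word ending in the base avoids counting unrelated verbs that merely share a final letter string.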

Verb regularity

The regularity of 1,259 verbs (i.e., 432 simple/base and 827 complex verbs) was defined according to their participle formation. Participle formation in German is completely concatenative for both regular and irregular verbs: The prefix ge- and one of two suffixes (-t/-en) are affixed to the base. Four types of participle formation result from the combination of stem (“regular” infinitive stem/“irregular” vowel change) and suffix (“regular” -t/“irregular” -en). They are labeled, following our previous definitions (e.g., Smolka et al., 2013; Smolka et al., 2018; Smolka et al., 2007), as “regular” (infinitive root/-t suffix), “irregular 1” (infinitive root/-en suffix), “irregular 2” (vowel change/-en suffix), and “irregular 3” (vowel change/-t suffix). The label “regular,” although motivated differently, coincides with the label “default” in dual-mechanism accounts (Clahsen, 1999; Marcus et al., 1995; Wunderlich & Fabri, 1995). Complex verbs usually inherit the verb regularity of their base. Hence, all verb derivations of trinken (“drink”), including betrinken (“get drunk”), austrinken (“drink up”), abtrinken (“sip off”), and zutrinken (“raise one’s glass to someone”), undergo the same irregular verb inflections as the base verb does (cf. trinken–trank–getrunken, “drink–drank–drunk,” with betrinken–betrank–betrunken).
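The four-way classification amounts to a lookup on two features of the participle. In the sketch below, the function name and the hand-coded feature values are illustrative assumptions; only the mapping from stem and suffix to label follows the definitions above.

```python
# (vowel change in the participle stem?, participle suffix) -> label
REGULARITY = {
    (False, "t"): "regular",
    (False, "en"): "irregular 1",
    (True, "en"): "irregular 2",
    (True, "t"): "irregular 3",
}


def verb_regularity(vowel_change, suffix):
    """Return the regularity label for a participle's stem and suffix."""
    try:
        return REGULARITY[(bool(vowel_change), suffix)]
    except KeyError:
        raise ValueError(f"unknown participle suffix: {suffix!r}")


# Hand-coded examples: infinitive, participle, vowel change, suffix.
examples = [
    ("kochen", "gekocht", False, "t"),
    ("backen", "gebacken", False, "en"),
    ("trinken", "getrunken", True, "en"),
    ("kennen", "gekannt", True, "t"),
]
for infinitive, participle, change, suffix in examples:
    print(infinitive, participle, verb_regularity(change, suffix))
```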

Results

Semantic association test

Interrater agreement

To calculate the interrater agreement among the 334 raters (who rated subsets of the 1,186 overall verb pairs), we used Krippendorff’s alpha (Hayes & Krippendorff, 2007), which is particularly apt to be used for scaled variables. We used the SPSS implementation of Andrew Hayes (www.comm.ohio-state.edu/ahayes/macros.htm) to calculate Krippendorff’s alpha and aimed at reaching an alpha of .8, which is considered as indexing high interrater agreement.
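For readers without access to the SPSS macro, the point estimate of Krippendorff's alpha can be sketched in a few lines. The version below is an assumption-laden illustration rather than a reimplementation of the macro: it treats the 7-point ratings as interval data (the metric used in the study is not stated here), it expects one list of ratings per verb pair with missing ratings simply left out, and it does not compute the bootstrapped probability of failing to reach a given alpha.

```python
def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval difference function.

    units: one sequence of numeric ratings per rated item (verb pair).
    Items with fewer than two ratings are not pairable and are ignored.
    """
    def sum_sq_diff(values):
        # Sum of (a - b)^2 over all ordered pairs of values.
        n = len(values)
        return 2 * (n * sum(v * v for v in values) - sum(values) ** 2)

    pairable = [list(u) for u in units if len(u) >= 2]
    all_values = [v for u in pairable for v in u]
    n = len(all_values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: within-item differences, weighted by 1/(m - 1).
    observed = sum(sum_sq_diff(u) / (len(u) - 1) for u in pairable) / n
    # Expected disagreement: differences among all pairable values.
    expected = sum_sq_diff(all_values) / (n * (n - 1))
    if expected == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1 - observed / expected


# Toy data: three verb pairs rated by varying numbers of raters.
print(krippendorff_alpha_interval([[7, 6, 7], [1, 2], [4, 4, 5, 3]]))
```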

The inclusion of all 1,186 verb pairs yielded α = .7094, which indexes only a moderate level of agreement. This was not surprising, given that many different types of verb pairs were included in the semantic association tests, so that the coherence of the ratings would naturally differ between them.

Because verb pairs with high variability in the ratings are less suitable for psycholinguistic experiments, we aimed at providing a criterion that would allow for applying a reasonable cutoff to the standard deviation of the ratings per verb pair to decide whether a particular verb pair should be included in an experiment. To this end, we stepwise excluded those verb pairs that had ratings with large standard deviations and recalculated Krippendorff’s alpha until it reached .8. That is, in the next round, only verb pairs with a standard deviation lower than 2.0 were included; then, only those with a standard deviation lower than 1.9 were included; and so forth. Table 3 gives an overview of how Krippendorff’s alpha changed with the stepwise lowering of the critical standard deviation. An interrater agreement of α = .8 was reached when only verb pairs with a standard deviation lower than 1.5 across ratings were included. However, the probability of failure to achieve an alpha level of at least .80 was still quite high, with p = .4730. This probability of failure (to achieve an alpha of at least .80) was reduced to p = .0760 when only verb pairs with a standard deviation lower than 1.4 across ratings were included, which also improved Krippendorff’s alpha to α = .8143. On the basis of these data, we recommend selecting verb pairs with standard deviations lower than 1.4 across ratings for experimental studies. In the present database, this holds for 778 of the verb pairs.
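The stepwise exclusion can be sketched as a simple loop. The code below is an illustration under stated assumptions: the function and parameter names are hypothetical, the standard deviation is taken to be the sample standard deviation, and the agreement statistic is passed in as a function so that any implementation of Krippendorff's alpha can be plugged in. It stops at the first cutoff that reaches the target and does not model the probability of failure that motivated the stricter cutoff of 1.4.

```python
from statistics import stdev


def stepwise_sd_cutoff(ratings_by_pair, agreement, target=0.8,
                       start=2.0, step=0.1, floor=0.5):
    """Lower the SD cutoff until the agreement statistic reaches `target`.

    ratings_by_pair: dict mapping a verb pair to its list of ratings.
    agreement: function taking such a dict and returning an alpha value.
    Returns (cutoff, kept_pairs, alpha), or None if the target is not met.
    """
    k = 0
    while True:
        cutoff = round(start - k * step, 10)  # avoid floating-point drift
        if cutoff < floor:
            return None
        kept = {
            pair: ratings
            for pair, ratings in ratings_by_pair.items()
            if len(ratings) >= 2 and stdev(ratings) < cutoff
        }
        if not kept:
            return None
        alpha = agreement(kept)
        if alpha >= target:
            return cutoff, kept, alpha
        k += 1
```

A call might look like `stepwise_sd_cutoff(ratings, my_alpha_function)`, where `my_alpha_function` is a placeholder for whichever alpha implementation is available.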

Linear mixed-effect modeling

We used R (R Core Team, 2012) and lme4 to perform linear mixed-effect analyses on the rating data (e.g., Baayen, Davidson, & Bates, 2008; Bates, 2005; Bates, Maechler, & Bolker, 2011). To avoid collinearity, we ran multiple Pearson correlation analyses to assess whether the variables of interest (i.e., the different frequency measures, the two counts of verb family size, AoA, AoR, and the ratings of the semantic association test) were correlated with each other. Appendices A and B provide correlation matrices of these measures. In the following analyses, only variables with a correlation coefficient lower than .33 were included within the same model.
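The collinearity screening can be illustrated as follows. This is a sketch in Python rather than the R code used for the analyses; the function names are hypothetical, and it assumes that the threshold of .33 applies to the absolute value of the Pearson coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combinable_pairs(variables, threshold=0.33):
    """List variable pairs whose |r| is below the threshold.

    variables: dict mapping a variable name to its list of values.
    Only such pairs would be entered together in the same model.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(variables), 2)
        if abs(pearson_r(variables[a], variables[b])) < threshold
    ]
```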

As random effects, we had intercepts for participants, base verbs, and their companion verbs (i.e., the simple or complex verb whose relatedness to the base verb was tested). Because these companion verbs always preceded the base verbs in the rating tests, and for reasons of simplicity, we refer to them as “primes,” in contrast to the “base” verbs. In all of the following analyses, we tested the influence of various distributional variables separately for the base verb and its prime. These variables were numbers of letters and syllables, absolute and normalized lemma frequencies, absolute and normalized word form frequencies taken from CELEX (Baayen et al., 1993) or dlexDB (Heister et al., 2011), as well as verb family size (of the base) collected from CELEX or Mater (1966), verb regularity, and prime complexity. The frequency variables were log-transformed and centered, and all other distributional variables (referring to family size and numbers of letters and syllables) were centered (see Winter, 2013). The best model fit was obtained by comparing the Akaike information criterion (AIC) statistics between models (cf. Sakamoto, Ishiguro, & Kitagawa, 1986).
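The preprocessing and the model comparison can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the natural logarithm is assumed (the base of the log transformation is not stated), frequencies are assumed to be strictly positive, and the log-likelihoods would in practice come from the fitted mixed-effect models.

```python
from math import log


def center(values):
    """Subtract the mean so that the variable has a mean of zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def log_center(frequencies):
    """Log-transform positive frequencies, then center them."""
    return center([log(f) for f in frequencies])


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(models):
    """Return the name of the model with the lowest AIC.

    models: dict mapping a model name to (log_likelihood, n_params).
    """
    return min(models, key=lambda name: aic(*models[name]))
```

With hypothetical values, `best_by_aic({"with_family_size": (-100.0, 5), "with_letters": (-99.0, 7)})` selects the first model, since its AIC of 210 is lower than 212.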

Semantic transparency between complex verbs and their base

We started out with analyses that included the semantic association ratings between morphologically related verbs, testing the semantic transparency between complex verbs and their base.

The best model fit included the fixed-effect factors semantic transparency (transparent/opaque), prime complexity (prefix/particle/both), and verb family size, and the control factor prime frequency. Here and in the analyses below, verb family size refers to the centered verb family counts from the dictionary of German verbs (Mater, 1966), and prime frequency refers to the word form frequencies (i.e., the infinitive) from dlexDB (Heister et al., 2011), which were log-transformed and centered. Table 4 summarizes the effects.

Table 4 Predictors in the linear mixed-effect model for ratings of semantic transparency between complex verbs and their stems

The significant effect of semantic transparency confirmed that the a priori definitions of semantic transparency coincide with the semantic association ratings: Semantically transparent complex verbs were rated as possessing higher meaning relatedness with their base than were semantically opaque verbs. Verb family size had a negative effect on ratings: Base verbs with larger families were rated as possessing lower meaning relatedness with their complex verbs than were base verbs with smaller families. That is, the larger the family size, the lower the meaning relatedness between complex verbs and their bases. This indicates that larger families increase the meaning variability, and thus also the semantic opaqueness (see the left panels of Fig. 1). The frequency of the complex verb had a negative effect, too: The more frequent a complex verb, the lower the ratings of meaning relatedness with its base. That is, higher-frequent complex verbs are judged as being more semantically opaque than lower-frequent complex verbs (see the right panels of Fig. 1). The prime complexity variable indicated that particle verbs (as the reference level) were rated as possessing higher meaning relatedness with their base than were prefix verbs. That is, particle verbs were judged as being semantically more transparent than prefix verbs (see the left panels of Fig. 2).

Fig. 1

Semantic association ratings (on a scale from 1 to 7) between transparent and opaque complex verbs and their base verbs, with higher ratings reflecting greater transparency. The left panel displays the effect of verb family size; the right panel shows the effect of prime frequency (i.e., the frequency of the complex verb) on the ratings

Fig. 2

The left panel displays the effects of prime complexity (prefix verb, particle verb or both) on the semantic association ratings of semantically transparent and opaque verbs. The right panel shows the effects of prime complexity (simple verb, prefix verb, particle verb or both) on ratings between unrelated, form-related, and semantically related verbs

Semantic relatedness between verbs with different bases

The second analysis tested the semantic relatedness of verb pairs with different bases (i.e., in the semantically, form-related, or unrelated conditions). These verb pairs consisted of either a complex and a simple verb or two simple verbs.

The best model fit included the fixed-effect factors prime type (S/F/U) and prime complexity (simple/particle/prefix/both) and the control factor prime frequency. Table 5 summarizes the effects.

Table 5 Predictors in the linear mixed-effect model for ratings of semantic relatedness between verbs with different stems

Purely form-related verb pairs and purely semantically related verb pairs were rated as being higher in meaning relatedness than unrelated verb pairs. Again, the prime complexity factor negatively affected the ratings (with simple verbs as the base condition): With complex verbs as the primes (i.e., particle verbs, prefix verbs, and particle verbs with a separable or an inseparable particle), verb pairs were rated as possessing lower meaning relatedness than verb pairs with a simple verb as the prime (see the right panels of Fig. 2). Even though the effect of prime frequency was only marginally significant, its inclusion significantly improved the model, which is the reason why we kept it in the model.

Lexical paraphrase

We consulted the dictionaries Wahrig (2007) and Duden (Dudenredaktion, 2009) for lexical paraphrases, referring to either semantic transparency (i.e., the definition of a complex verb via its own base) or semantic relatedness (i.e., the definition of a simple or complex verb via the base of another verb). Altogether, we collected lexical paraphrases for 893 different complex verbs; however, since complex verbs in German are very productive, 26 were not encountered in either dictionary (and are thus indicated as “not available,” NA, in the supplementary materials).

The Pearson correlation coefficient indicated that the definitions of the Wahrig and Duden dictionaries are highly correlated, r(182) = .92808, p < .0001. Nevertheless, they diverged from one another in 275 cases regarding the semantic transparency of complex verbs, and in 164 cases regarding the semantic relatedness between verbs with different bases. We thus provide a joint measure that defines a complex word as semantically transparent, T, if one of the dictionaries referred to its base, and another that defines verb pairs (with different bases) as semantically related, S+, if one of the dictionaries defined them as semantically related. These combined measures are provided in Appendix A.

Semantic transparency of complex verbs

The lexical paraphrases referring to a complex verb’s own base showed that in the morphologically related conditions, 325 were defined as semantically transparent (T) and 106 as semantically opaque (O), NA = 1. In the morphologically unrelated conditions, the numbers were even more skewed. In the unrelated condition (U), 70 verbs were semantically transparent (T) and 14 opaque (O); in the purely semantically related condition (S), 136 verbs were semantically transparent (T) and 45 opaque (O), NA = 10. In the form-related condition (F), 134 verbs were semantically defined via their own base and thus semantically transparent (T), whereas 37 were not defined by their own base and thus were semantically opaque (O), NA = 15. The numbers of transparent versus opaque verbs differ in the morphologically unrelated conditions (F/S/U) because they were selected according to their relation with the target verb rather than with respect to their own base. For example, the form-related verbs bewerben–werfen (“apply–throw”) are lexically paraphrased as semantically unrelated (S−); however, the whole-word meaning of bewerben (“apply”) is paraphrased as transparent with respect to its base werben (“advertise”). Overall, these skewed numbers indicate that lexical paraphrases render complex verbs as relatively transparent with respect to their base.

Semantic relatedness between verbs with different bases

The lexical paraphrases that refer to the relatedness between verbs with different bases confirmed that all 207 (NA = 17) form-related (F) verb pairs and all 220 unrelated (U) verb pairs were not defined via each other, and thus are semantically unrelated (S−). However, of the semantically related verbs (S), only 194 were lexically paraphrased via the target (S+), such as benötigen–brauchen (“require–need”) and sortieren–ordnen (“sort–put in order”), whereas 106 verbs were not defined via the target (S−), such as kochen–backen (“cook–bake”), NA = 10. The latter type (S−) may partly indicate highly associated verbs that are not synonyms.

Semantic similarity vectors

We provide the cosine similarities between semantic vectors from LSA and HAL simulations of 846 verb pairs (Günther et al., 2015) in Appendix A and examine here how well the model-based and the human-based similarity measures compare. Figure 3 illustrates the correlations between the similarities of the vector-based measures LSA and HAL (left panel) and those between the vector-based and human-based measures (right panel).
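For reference, the cosine similarity between two semantic vectors is their dot product divided by the product of their lengths. The sketch below illustrates the computation on toy vectors; the function name is hypothetical, and real LSA or HAL vectors have many more dimensions.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy three-dimensional vectors standing in for two verbs.
print(cosine_similarity([0.2, 0.7, 0.1], [0.3, 0.6, 0.0]))
```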

Fig. 3

Correlations between the vector-based semantic similarity measures LSA and HAL (left panel) for 846 verb pairs, and correlations between vector-based similarity values (LSA) and human ratings for the same verb pairs (right panel)

As the left panel of Fig. 3 demonstrates, the cosine similarities between the semantic vectors from LSA and HAL simulations are rather similar (τb = .664, ρ = .849; all ps < .0001), whereas the right panel shows that the similarities between the vector-based measures and human ratings are more scattered, thus reflecting reduced, though still significant, correlations (τb = .380, ρ = .546; all ps < .0001). A comparison of the two scatterplots suggests that the vector-based similarity measures are not in one-to-one correspondence with the human measures but provide different aspects of semantic similarity.
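The two rank correlations reported here can be illustrated with a short sketch, assuming that they denote Kendall's tau-b and Spearman's rho. The function names are hypothetical, and the quadratic-time loop is meant for clarity, not for large data sets; significance tests are not included.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / sqrt(
        (n_pairs - ties_x) * (n_pairs - ties_y))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

Tau-b is typically smaller in magnitude than rho for the same data, which is consistent with the pattern of values reported above.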

Age of acquisition and age of reading

We collected estimates of age of acquisition (AoA) and age of reading (AoR) for 240 verbs. As expected, both measures are highly correlated, r(238) = .96208, p < .0001, and verbs are acquired earlier (mean AoA = 7.16 years, SD = 1.78, range 2–12.9 years) than they are encountered in text (mean AoR = 8.6 years, SD = 1.46, range 6.3–13.2 years). Even though this finding is not surprising, it indicates that AoA is not a sufficient measure when preparing the stimulus materials for experiments on reading with children, but rather that AoR—the age at which children encounter a word in text—needs to be taken into account.

We used R and lme4 to perform linear mixed-effect analysis on the estimates of AoA and AoR (e.g., Baayen et al., 2008). To avoid collinearity, we included only variables with a correlation coefficient lower than .5 within the same model. As random effects, we had intercepts for teachers (i.e., the raters) and verbs.

The best model fit included the fixed-effect factors verb complexity (simple/particle/prefix) and verb frequency (the absolute word form frequencies from dlexDB, log-transformed and centered). The model was equivalent (ΔAIC < 4) when the fixed-effect factor verb regularity (“regular”/“irregular 1”/“irregular 2”/“irregular 3”) was included, as well. Table 6 summarizes the effects.

Table 6 Linear mixed-effect model testing the effects of verb complexity, verb frequency, and verb regularity on AoA and AoR

The factor verb complexity (where simple verbs functioned as the reference level) indicated that simple verbs are both acquired and read earlier than either particle verbs or prefix verbs (see Fig. 4). Verb frequency had a positive effect, in that higher-frequent verbs were both acquired and read earlier than lower-frequent verbs (see Fig. 4). Verb regularity indicated that verbs of the “irregular 3” type are acquired and read earlier than verbs of the other types (see Fig. 5). We suspect that this is because the “irregular 3” verbs contain modal verbs and extremely high-frequent verbs such as kennen (“know”), denken (“think”), and brennen (“burn”), which are heard early on by infants. However, there was no interaction between verb complexity and verb regularity.

Fig. 4

Effects of verb frequency (absolute type frequency from dlexDB, log-transformed) and verb complexity (simple/prefix/particle verbs) on AoA (left panel) and AoR (right panel)

Fig. 5

Effects of verb frequency (absolute type frequency from dlexDB, log-transformed) and verb regularity (“regular”/“irregular 1”/“irregular 2”/“irregular 3”) on AoA (left panel) and AoR (right panel)

AoA/AoR and semantic transparency

We further examined the AoA and AoR of the 200 prefix and particle verbs with regard to their semantic transparency. That is, are semantically transparent (T) and opaque (O) words acquired at the same time? Indeed, as Fig. 6 shows, semantically transparent verbs are acquired earlier than semantically opaque verbs, and transparent verbs are encountered in text earlier than opaque verbs. The best linear mixed-effect model (see Table 7) included the same factors as in the analysis above—verb complexity, verb frequency, and verb regularity—in addition to the fixed-effect factor semantic transparency, which indicated that semantically opaque complex verbs are acquired later on than semantically transparent complex verbs (see Fig. 6). All other factors showed the same effects as discussed above.

Fig. 6

Effects of semantic transparency (T) or opacity (O) of complex verbs on AoA (left panel) and AoR (right panel)

Table 7 Linear mixed-effect model testing the effect of semantic transparency on AoA and AoR

AoA/AoR and family size

Significant negative correlations between family size and AoA or AoR further indicated that verbs with larger families are both acquired and read earlier than verbs with smaller families: AoA and FS Mater: r(38) = −.44140, p = .0044; AoA and FS CELEX: r(38) = −.47666, p = .0019; AoR and FS Mater: r(38) = −.47422, p = .0020; AoR and FS CELEX: r(38) = −.47417, p = .0020.

Family size

We used two measures of verb family size, a count in the CELEX lexical database (FS CELEX) and a count in the dictionary on German verbs (FS Mater). The Pearson correlation coefficient indicated that both measures are highly correlated, r(182) = .92808, p < .0001. Nevertheless, the CELEX count, with an average of 11.2 family members (SD = 12.2), strongly lags behind the dictionary count, with an average of 29 family members (SD = 28) (see also the supplementary materials). The probability plot of family size (FS Mater; see Fig. 7) illustrates the productivity of German complex verbs with the same base. See Table 4 and the left panel of Fig. 1 for the negative effect of family size on ratings of semantic transparency—that complex verbs are rated as more opaque with reference to their base, the larger their family. (Note that verb family size is calculated on verbs with the same base.)

Fig. 7

Probability plot of verb family size (calculated from a dictionary: Mater, 1966) of German base verbs

Verb regularity

We collected the regularity of 1,259 different verbs. Altogether, 316 base verbs belonged to the “regular” type, 30 verbs to the “irregular 1” type, 79 to the “irregular 2” type, and seven to the “irregular 3” type. Of the 827 complex verbs, 497 belonged to the “regular” type, 86 verbs to the “irregular 1” type, 232 to the “irregular 2” type, and 12 to the “irregular 3” type. This skewed distribution roughly corresponds to the skewed occurrence of German verbs. For example, of the roughly 1,900 monomorphemic German verbs in CELEX (Baayen et al., 1993), only 200 types are irregular verbs. “Irregular 3” verbs are of very low type frequency and very high token frequency, since many of them function as modal verbs (e.g., müssen “must,” können “can,” dürfen “may”), and others are extremely high-frequency verbs (e.g., kennen “know,” denken “think,” bringen “bring”). The high token frequency of “irregular 3” verbs seems to determine the effect of verb regularity on AoA and AoR—namely, that they are acquired earlier than other verb types (see Fig. 5 and Tables 6 and 7).

Verb regularity and semantic transparency

The semantic transparency of a complex verb is independent of its regularity. This is suggested by the nonsignificant correlations both when irregularity was considered as a binary factor (regular vs. irregular), r(1148) = .04577, p = .1211, and when irregularity was a graded factor, r(1148) = .03940, p = .1821. This is further indicated by the lack of an effect of regularity when it comes to semantic transparency ratings. Indeed, the best model fit indicated that regularity does not affect semantic transparency ratings (and was therefore not included in the model; see Table 4).

General discussion

This study has made measures of semantic transparency and semantic relatedness for approximately 1,200 German verb pairs and the similarity computations for approximately 860 verb pairs available for public use, together with estimates of age of acquisition and age of reading, counts of verb family size and verb regularity, and correlations between these measures. Such measures are a vital means to study the mental representations of complex words, in particular whether the semantic transparency of the complex verb (in relation to its own base) affects its lexical representation. That is, are semantically opaque words like verstehen (“understand”) represented differently from semantically transparent ones like aufstehen (“stand up”)? Furthermore, are the lexical representations of complex verbs and their bases different from the lexical representations of semantically related verbs with different bases?

A general means to study this issue has been to manipulate the morphological, semantic, and form relatedness between complex word primes and targets. If lexical representation depends on morphological but not on semantic relatedness, both semantically transparent forms like insincere and opaque forms like restrain should prime their bases sincere and strain, respectively. However, previous studies in English and French have observed that only transparent forms induced priming, indicating that the lexical representations of complex words in these languages are determined by semantic transparency (for prefixed words of the type used in the present study, see Exp. 1A in Feldman et al., 2002; Exp. 4 in Gonnerman et al., 2007; Exps. 4 and 5 in Marslen-Wilson et al., 1994). By contrast, in German both behavioral (Smolka et al., 2009; Smolka et al., 2014; Smolka et al., 2018) and electrophysiological (ERP; Smolka et al., 2015) data have provided converging evidence that opaque verbs such as verstehen (“understand”) prime their base stehen (“stand”) to the same extent as transparent forms such as aufstehen (“stand up”). Relative to the unrelated condition (with slowest reaction times [RTs] and the most negative-going amplitude), semantically transparent and opaque complex verbs yielded the strongest priming effects: fastest RTs and most positive-going event-related potential (ERP) amplitudes (with an N250, P300, and N400 effect), and no difference between the two morphological conditions. These findings indicate that complex verbs in German are represented via their base, regardless of semantic transparency; that is, verstehen is lexically processed and represented via stehen. We have further shown that this (morphological) effect by verbs with the same base was stronger (faster RTs and more positive-going ERPs) than the effects between purely semantically related verbs with different bases, such as behindern–stören (“hinder–disturb”). Finally, form-related verbs such as verstehen–stehlen (“understand–steal”) inhibited behavioral responses and induced slightly more positive-going N250 and N400 amplitudes than in the unrelated condition. Altogether, these findings indicated that morphological priming in German is independent of pure semantic and form priming (Smolka et al., 2014).

Until today, information on the semantic transparency of complex verbs or the semantic relatedness between verbs with different bases has not been available in existing German lexical databases (e.g., Baayen et al., 1993; Heister et al., 2011) and had to be laboriously constructed by means of semantic association tests (see also Zinsmeister & Smolka, 2012). Therefore, the present database presents these measures for public use. In particular, we provide two measures of semantic transparency for 1,186 German verbs: (a) ratings from semantic association tests and (b) lexical paraphrases from two German dictionaries. We also offer two measures of the semantic relatedness between 754 (simple and complex) verbs with different bases: (a) ratings from semantic association tests and (b) lexical paraphrases from two German dictionaries. In addition, we provide the verb regularity of all verbs, two counts of verb family size for 184 base verbs, and estimates of AoA and AoR for a subset of 200 verbs. For the verbs presented here, we further include the absolute and normalized frequencies of their lemma and word form, all measures that are publicly available in CELEX and dlexDB. Finally, we provide the cosine similarities between semantic vectors from LSA and HAL simulations of 846 verb pairs (Günther et al., 2015). Given that these measures correlate to some degree with semantic transparency and semantic relatedness, they represent important factors in the construction of experiments. From the experimenter’s perspective, the different norms will be useful for planning experiments.

Semantic transparency

Semantic association test

In the semantic association test, we asked to what extent the meaning of the complex verb reflected the meaning of the base verb. Because raters compared the meanings of the whole words, we thus asked for the “whole-word” transparency of the complex verbs, according to the definition by Marelli and Baroni (2015). The ratings of the semantic association test confirmed that the a priori defined transparency of complex verbs coincided with the raters’ judgments: Semantically transparent complex verbs yielded higher ratings than semantically opaque ones.

These ratings were affected by some characteristics of the complex verb. For example, the frequency of the complex verb had a negative effect on the ratings, with higher-frequent complex verbs being rated as semantically less transparent with respect to their base than lower-frequent ones (see the right panel of Fig. 1). This finding indicates that the more frequent a complex verb is, the more strongly it is perceived as possessing an idiosyncratic meaning that becomes independent of the meaning of its base.

The family size of the verb pair had a negative effect on the semantic transparency ratings: The larger the verb family, the lower a complex verb was rated in terms of meaning relatedness with its base (see the left panel of Fig. 1). This finding indicates that larger families produce higher uncertainty in the meaning that the complex verbs belonging to a particular family may take: That is, verbs from larger families may have a wider range of semantic transparency and opacity than verbs from smaller families. At least this is how participants seem to perceive verbs from large families (as indicated by the semantic association ratings). Also, the type of morphological complexity of the complex verb played a role: Prefix verbs were rated as being semantically more opaque than particle verbs (see the left panel of Fig. 2), indicating that participants perceive prefix verbs as possessing a more idiosyncratic meaning than particle verbs.

Finally, the regularity of the complex verb did not affect the ratings (note that complex verbs inherit the verb regularity of their base).

Lexical paraphrase

The lexical paraphrase defining semantic transparency refers to the meaning compositionality of the complex verb itself. Is the meaning of the complex verb reflected in the meaning of its base? Hay (2001) argued that the lexical paraphrase may be used to define the semantic transparency of prefixed words: If a prefixed word is highly transparent, it will be defined by reference to its base word. If, however, the base word is absent from the definition of the complex word, this can be taken as a clear sign of semantic drift. Given that this concept of semantic transparency defines the composition of the complex word, it clearly refers to its “compositional” transparency (as defined by Marelli & Baroni, 2015).

We consulted two standard dictionaries of German, the DUDEN (Dudenredaktion, 2009) and Wahrig (2007). However, the fact that the two disagree in 275 out of 1,185 cases indicates that the definition via lexical paraphrase is not simple, at least for complex verbs in German. For example, in the lexical paraphrase of the complex verb abbiegen (“turn”), the DUDEN refers to the base biegen (“bend”), but Wahrig does not, and vice versa for the complex verb eingreifen (“intervene”), for which Wahrig uses greifen (“grab”) in the lexical paraphrase, but the DUDEN does not.

Comparison of semantic association test and lexical paraphrase

A comparison between the semantic association ratings and the lexical paraphrases shows that the latter may diverge from the intuition of German native speakers (as reflected in the semantic association tests). The agreement between the semantic association ratings and the lexical paraphrases referring to the bases of complex verbs was not straightforward.

In fact, about 30 of the 244 morphologically related verbs were lexically paraphrased as semantically opaque with respect to their base, even though they received mean association ratings higher than 4. For example, the DUDEN paraphrases the complex verb anschauen (“look at”) as ansehen (“look at”), and thus as being semantically opaque with respect to its base schauen (“look”). However, participants rated the verb pair anschauen–schauen (“look at–look”) as being highly related, with a mean rating of 5.56 (SD = 1.25). The same goes for the verb pairs vortäuschen–täuschen (“simulate–cheat”; mean rating 6.0, SD = 1.1) and antreffen–treffen (“come across someone/something–meet”; mean rating 5.49, SD = 1.5).

With respect to experimenting, these data imply that it might not be sufficient to rely on lexical paraphrases when searching for semantically opaque verbs in German.

It is possible that the (allegedly) worse performance of paraphrases depends on the binary scale of the paraphrase-based variable. A binary classification may not be able to capture a property like semantic transparency that is inherently graded. In contrast to the binary paraphrase definitions, the ratings of the semantic association test are graded, and the interrater reliability may provide a stronger tool for selecting items that native speakers “reliably” consider as being relatively transparent or relatively opaque.

These diverging results further indicate that ratings and paraphrases are not fully congruent in capturing semantic transparency. The ratings presented here cover “whole-word” transparency, whereas the lexical paraphrases cover “compositional” transparency.

However, it seems that in the case of a discrepancy between the ratings and the lexical paraphrases, the “true” meaning of a complex word, or at least its dominant meaning, is more often than not better reflected in the ratings than in the lexical paraphrases. For example, the complex verb umkommen means “perish,” “die.” The ratings of the semantic association test (mean = 2.05, SD = 1.56) reflect that the whole-word meaning of umkommen is semantically opaque with respect to the meaning of its base kommen (“come”). However, according to the lexical paraphrase, the complex verb umkommen should be considered as being transparent, since the DUDEN refers to the base “come” in the definition of umkommen (“perish”): “find death because of an accident or tragedy,” and “lose their life,” which is literally expressed as “come around life” in German. On the other hand, the complex verb befolgen means “to obey someone or something.” The lexical paraphrase (“to comply with something,” “be guided in his actions”) does not refer to the base folgen (“follow,” “obey”), so that befolgen is to be considered as being semantically opaque. However, German native speakers consider the complex verb as being semantically transparent (mean rating = 5.22, SD = 1.58), which seems to be a better reflection of the verb’s meaning than the paraphrase definition.

It is therefore advisable to consider both measures when investigating the semantic transparency of complex verbs, at least in experiments on German verbs.
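The comparison of the two measures can be illustrated with a minimal sketch in Python. The rating values and paraphrase codings are those of the examples discussed above; the class name, the function name, and the use of a single cutoff of 4 to dichotomize the ratings in both directions are assumptions made for illustration only and are not part of the database.

```python
from dataclasses import dataclass

# Assumption: mean ratings above this value count as "rated transparent".
# The value 4 follows the threshold mentioned in the text.
RATING_CUTOFF = 4.0


@dataclass
class VerbPair:
    complex_verb: str
    base: str
    mean_rating: float            # mean semantic association rating
    sd_rating: float              # standard deviation across raters
    paraphrase_transparent: bool  # True if the paraphrase refers to the base


def compare_measures(pair: VerbPair) -> str:
    """Label whether the rating and the paraphrase agree for one verb pair."""
    rated_transparent = pair.mean_rating > RATING_CUTOFF
    if rated_transparent == pair.paraphrase_transparent:
        return "agree"
    if rated_transparent:
        return "rated transparent, paraphrased opaque"
    return "rated opaque, paraphrased transparent"


# Ratings and paraphrase codings as reported for the examples in the text.
pairs = [
    VerbPair("anschauen", "schauen", 5.56, 1.25, False),
    VerbPair("befolgen", "folgen", 5.22, 1.58, False),
    VerbPair("umkommen", "kommen", 2.05, 1.56, True),
]

labels = {p.complex_verb: compare_measures(p) for p in pairs}
for verb, label in labels.items():
    print(f"{verb}: {label}")
```

All three example verbs are flagged as discrepant, which is why they are informative cases; in a full item set, most pairs would be expected to fall into the "agree" category.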

Semantic relatedness

Semantic association test

Semantic relatedness was rated between verb pairs with different bases. Unsurprisingly, the semantic association test confirmed that verbs that were a priori classified as semantically related received the highest ratings, whereas verbs that were defined a priori as purely form-related received lower ratings, and those that were defined as unrelated received the lowest ratings in meaning relatedness. Interestingly, even pure form-relatedness seems to enhance the perception of semantic relatedness (relative to the unrelated condition).

As with morphologically related verb pairs (with the same base), the ratings between verb pairs with different bases were also affected by the complexity of the prime (i.e., the simple or complex verb preceding the base verb). Verb pairs were rated highest in meaning relatedness when the prime was a simple verb, lower when the prime was a particle verb, even lower when it was a prefix verb, and lowest when it could be interpreted as having either a separable or an inseparable particle (see the right panel of Fig. 2). Finally, the semantic relatedness ratings were not influenced by the regularity of the verbs (neither of the prime nor of the base).

Lexical paraphrase

We also considered the lexical paraphrases of verbs to determine their semantic relatedness, that is, whether they are defined one via the other. The two dictionaries we consulted disagreed in 164 out of 1,185 cases and thus indicated that this may not be a simple measure for German verb pairs. Indeed, lexical paraphrases were straightforward only for unrelated or purely form-related verbs, but did not mirror the ratings of the semantic association test in about one third of verb pairs a priori defined as semantically related.

A total of 310 verb pairs were defined a priori as semantically related; the lexical paraphrases defined 106 of these as unrelated in meaning (S−), even though 83 of these 106 verb pairs received mean ratings higher than 4 in the semantic association test (62 of these had an SD lower than 1.4 and thus very stable ratings across raters). For example, even though the verb pair anbrüllen–schreien (“bawl at someone–scream”) was lexically paraphrased as unrelated in meaning, participants rated this pair as highly meaning related (mean rating of 5.81, SD = 0.91) in the semantic association test. The same holds for the verb pairs ausrutschen–stürzen, with a mean rating of 6.0 (SD = 0.63), and kochen–backen, with a mean rating of 5.58 (SD = 1.17).

Typically, these types of verb pairs consist of verbs that are strong associates but not synonyms. This indicates that lexical paraphrasing may be an accurate tool for defining the semantic relatedness of synonyms (that are more likely to be defined one via the other) but less so for defining semantic associates.

Vector-based semantic similarities

The similarity between 846 verb pairs in text corpora was calculated and indicated that LSA- and HAL-based vectors (Günther et al., 2015) provide analogous similarity measures (see also the left panel of Fig. 3). However, even though LSA and HAL similarity measures are good predictors of each other, these vector-based measures predict human ratings less well and seem to capture different aspects of semantic similarity than the human association ratings do (see the right panel of Fig. 3).
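For readers unfamiliar with the measure, the cosine similarity between two semantic vectors can be sketched as follows. This is a generic illustration in plain Python and not the procedure used to compute the values in the database; the function name and the toy vectors are hypothetical and do not correspond to actual LSA or HAL vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, for illustration only.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 1.0, 0.0]
vec_c = [0.0, 1.0, 0.0]

sim_ab = cosine_similarity(vec_a, vec_b)  # partially overlapping dimensions
sim_ac = cosine_similarity(vec_a, vec_c)  # no shared dimensions
print(sim_ab, sim_ac)
```

Vectors that point in the same direction yield a value of 1, and vectors without shared dimensions yield 0, regardless of vector length.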

To summarize, vector-based similarity measures, lexical paraphrases and semantic association ratings deliver complementary information about the semantic relatedness of verb pairs. Lexical paraphrases are particularly apt to define unrelated or form-related verb pairs or synonyms (and less so semantic associates). The vector-based similarities represent the similarities between words in text corpora, and both differ from human similarity ratings; further, the lexical paraphrases represent a binary measure, whereas the semantic association ratings represent a graded/scaled measure. The variability across raters on the semantic association ratings for each verb pair provides valuable information to select stimuli for psycholinguistic experiments. Analyses of the inter-rater agreement across all items using Krippendorff’s alpha indicated that verb pairs with standard deviations lower than 1.4 had sufficiently coherent semantic association ratings. In the present database this holds for 778 of the verb pairs, which are thus suitable items for experimental studies in which semantic relatedness is in focus.
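The selection criterion described above can be applied with a simple filter. The following is a minimal sketch, assuming that each verb pair is stored together with its mean rating and standard deviation; the tuple layout and the function name are hypothetical, whereas the example values are those reported in the text.

```python
# Threshold reported in the text: SDs lower than 1.4 indicate
# sufficiently coherent ratings across raters.
MAX_SD = 1.4

# (prime, target, mean rating, SD); values from the examples in the text.
rated_pairs = [
    ("anbrüllen", "schreien", 5.81, 0.91),
    ("ausrutschen", "stürzen", 6.0, 0.63),
    ("kochen", "backen", 5.58, 1.17),
    ("antreffen", "treffen", 5.49, 1.5),
    ("umkommen", "kommen", 2.05, 1.56),
]


def select_stable(pairs, max_sd=MAX_SD):
    """Keep only the verb pairs whose rating SD is below the threshold."""
    return [pair for pair in pairs if pair[3] < max_sd]


stable = select_stable(rated_pairs)
for prime, target, mean, sd in stable:
    print(f"{prime}-{target}: mean = {mean}, SD = {sd}")
```

Pairs that are excluded by this filter are not necessarily unusable, but their ratings vary more across raters and should be inspected before they are used as experimental items.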

Age of acquisition and age of reading

The age at which a word is acquired is known to affect lexical processing, and has thus become an important factor in word recognition studies (e.g., Brysbaert & Cortese, 2011). We have introduced a new factor, AoR: the time at which a child encounters a word in text. As expected, both factors are highly correlated, and our data confirm that verbs are acquired in speech before they are encountered in text. Given that the two factors (AoA and AoR) are highly correlated, one could argue that one can focus on a single measure instead of considering both for the purpose of item matching. Indeed, this is the case when items are selected for experiments with adults and teenagers who are fluent readers. However, when constructing experiments on visual word recognition or reading with children who are not yet fluent readers, the variable AoR may be important. For example, the complex verb hierbleiben (“stay here”) is acquired at the age of 3 (mean AoA = 3.06, SD = 1.13) but is estimated to be encountered in text only about 3.5 years later (mean AoR = 6.7, SD = .71). Hence, in an experiment with visual stimulus presentation, children should be at least 7 years old when reading hierbleiben, even though they have known the verb for much longer.

Our data further show that AoA/AoR is strongly affected by other variables, such as morphological complexity and semantic transparency. Indeed, the morphological complexity of a verb determines when it is acquired: Simple verbs are acquired earliest, followed by separable particle verbs, and prefix verbs are acquired latest (see Fig. 4). This finding thus supports the observation that the acquisition of prefix verbs in German is severely delayed relative to that of separable particle verbs (see Behrens, 1998).

Most interestingly, the semantic transparency of complex verbs in German was found to correlate with AoA and AoR. Our data provide evidence that semantically transparent complex verbs are acquired earlier than semantically opaque verbs. As Table 7 indicates, semantically transparent complex verbs (in the intercept) have been acquired by the age of about 6 years, and opaque complex verbs about 1.5 years later. This effect seems to remain evident in the lexical processing of children for a couple of years. Indeed, a recent study on the acquisition of lexical representations in German has shown that semantic transparency affects lexical processing and representation in 11- to 12-year-old children, but not in older children (Smolka & Baayen, 2018). For 11- to 12-year-olds, priming by semantically transparent complex verbs (zubinden–binden, “tie–bind”) was stronger than that by semantically opaque verbs (entbinden–binden, “deliver–bind”), indicating that the meaning relatedness between the complex verb and its base played a role in lexical processing. These findings correspond to the present data showing that semantically transparent complex verbs are acquired earlier than semantically opaque ones, which may affect the lexical processing of semantically transparent versus opaque verbs. Interestingly, the semantic transparency effect that was found in 11- to 12-year-old children diminished in 14- to 15-year-old children and was completely absent in adults. We interpret the latter finding as indicating that children learn morphological regularities as their exposure to the language increases. In the case of complex verbs, children need to learn that many complex verbs have the same base verb, even though they possess many different meanings, ranging from semantically transparent to semantically opaque. That is, children learn that the more verbs share the same base, the more unpredictable the meanings of these verbs become.

Family size

We had previously asked (Smolka et al., 2010) whether verb family size affects the lexical processing and representation of complex verbs—in particular, whether a verb’s family size influences the effects of semantic transparency. Indeed, the present findings show that complex verbs belonging to large families are perceived as being more semantically opaque than verbs belonging to small families (see the left panel of Fig. 1).

The present study provides two measures of verb family size, one count from the lexical database CELEX (Baayen et al., 1993) and another count from a dictionary of German verbs (Mater, 1966). Even though both measures are highly correlated, the CELEX count provides on average 23.3 fewer members than the dictionary count. For example, CELEX provides 27 family members of the base verb treten (“kick”), in comparison to 67 family members in the dictionary entries (see also the supplementary materials for further examples). We thus suggest using the dictionary count if a realistic measure of verb family size is required.

Note that this count of verb family size includes only the word category of verbs, and thus differs in this respect from the count of family size that was introduced by Baayen and colleagues (De Jong et al., 2000), who included all word categories in the count.

Some morphological theories assume separate lexical entries for verbs and nouns, such as one lexical entry for the noun time and another one for the verb time, despite their having the same base (e.g., Taft, 2004). The count of verb family size may thus provide an important measure for studies investigating the processing of verbs. Nevertheless, since the derivation of nouns and adjectives from verbs is very productive in German, we may assume that the overall family size will strongly correlate with verb family size.

Family size is a morphological concept but is known to be of a semantic nature (Bertram et al., 2000; Schreuder & Baayen, 1997). It is thus important to consider the family size of a verb when dealing with its semantic transparency. Interestingly, as we have discussed above, verb family size has a negative effect on the ratings of morphologically related verbs (with respect to their own base): The larger the verb family, the lower the semantic transparency ratings. This indicates that participants seem to be aware of the family size of a verb and of the idiosyncratic meanings that come with this.

Verb regularity

Verb regularity (of the base or the prime) affected neither the ratings of the semantic association test nor the semantic transparency of complex verbs. Verb regularity only affected AoA and AoR in that “irregular 3” verbs are acquired earlier than other verb types. We have hypothesized above that this may be due to the fact that extremely high-frequency modal verbs, such as the German forms of can, may, and must, and other high-frequency verbs, such as the German forms of think or bring, belong to the “irregular 3” type, and that young children are exposed to these verbs early on.

Conclusion

This article provides a number of measures important for designing psycholinguistic experiments using German verbs: semantic association ratings, lexical paraphrases, vector-based similarity measures, two counts of verb family size, estimates of age of acquisition and age of reading, and verb regularity, together with lemma and type frequencies from public lexical databases. For measures of semantic transparency and semantic relatedness, we recommend using verb pairs whose ratings show standard deviations lower than 1.4 and using lexical paraphrases for unrelated or form-related verb pairs or synonyms. We further recommend considering the dictionary (book) count of a verb’s family size and neglecting a verb’s regularity.

Author note

This study was supported by the German Research Foundation (DFG), grant EU 39/7-1 awarded to C.E., and by the Volkswagen Foundation, grant FP 561/11 awarded to E.S. We thank Sarolta Komlósi, Katrin Preller, Nadine Tema, and many of the psychology students at the Philipps-Universität Marburg for their help in collecting the semantic association ratings. Special thanks to Fritz Günther, who used his R package to calculate the semantic similarity vectors for the verb pairs presented here.