‘If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart.’ Nelson Mandela.

1 Introduction

Researchers in all disciplines strive to increase their research impact in the form of citations, which have become the default measure of scientific success (Martin and Irvine 1983; Wang 2014). According to Merton (1968), citations serve as a major building block of the scientific reward system. Even in the absence of citations, an article’s publication in a peer-reviewed journal is considered a signal of its original contribution and a measure of quality. Accumulating citations indicates additional peer recognition of an article’s value and its impact on science (DeBellis 2009; Merton 1973; Simonton 2004). Previous research shows that the majority of Nobel laureates were among the top 0.1 percent of cited authors (Garfield 1973). Furthermore, Cole and Cole (1967) show that citations are more important than the number of publications in receiving awards and being widely known in the scientific community.

Recent research analyzing drivers of scientific impact has focused on several dimensions: authors (Podsakoff et al. 2008; Judge et al. 2007; Nerkar and Lahiri 2014), journals (Podsakoff et al. 2005; Rafols et al. 2012), references (Mingers and Xu 2010; Pehlan et al. 2002) or linguistic attributes (Judge et al. 2007; Antons et al. 2019). We expand research on the latter with more sophisticated measures that detail the use of language and study the corresponding effect on article impact. Existing research uses basic linguistic attributes, e.g. an article’s length or its number of keywords. More recently, Antons et al. (2019) analyzed the content of articles and its effect on impact by extracting underlying topics and their structure.

However, little is known about how articles use the variations of language in scientific communities that typically follow disciplinary conventions regarding language complexity (Stremersch et al. 2007; Locke and Golden-Biddle 1997). This is surprising, as the choice of language is likely to modify an article’s reception and may thus create a barrier for the author’s research. Understanding these effects not only enables recommendations for authors but also explains how readers, reviewers and the evolution of scientific disciplines are affected by language complexity. We seek to address this research gap by asking: How does language complexity in communicating science affect impact? In contrast to previous research, we take a different perspective. We assume that scientific communities have their own languages and jargon (Kramsch 1998). We therefore unpack language complexity in two forms: first, we theorize and test how uniquely aligned an article’s jargon should be to one scientific community (further called uniqueness) to gain legitimacy and maximize impact. Second, we disentangle the effect of novel recombination of community specific jargon (novelty) and propose theoretical arguments for the effect on impact. Furthermore, we are interested in uncovering the trade-off scientists face when choosing an article’s level of uniqueness and novelty. Our results reveal that an article’s uniqueness increases its perceived legitimacy and thus enables impact in the form of citations. We also show that the relationship between novel recombination of community specific jargon and impact follows an inverted U-shape. Furthermore, we show that for articles with a high level of uniqueness the optimal level of novelty is lower and the overall effect on impact is higher.

We answer our research questions based on a comprehensive sample of entrepreneurship articles. The context of entrepreneurship was chosen for multiple reasons: earlier articles from this discipline often focus on definitions of the field (Gartner 1990) or the potential of the field for research (Shane and Venkataraman 2000; Ireland et al. 2005). Meanwhile, entrepreneurship has grown into a mature research field, as indicated by several systematic structuring approaches for the field (Cornelius et al. 2006; Reader and Watkins 2006; Schildt et al. 2006) as well as literature analyzing its evolution (Busenitz et al. 2014; Grégoire et al. 2006). However, the effects of novelty and thus the underlying speed of the evolution have not yet been analyzed in the domain of entrepreneurship science. To do so, we combine traditional bibliometric techniques (Lampe and Hilgers 2015; Schildt et al. 2006) with measures from natural language processing (Robertson 2004). The combination of these techniques allows us to detect and analyze the similarity and divergence of articles to scientific communities and thus their specific language or jargon. As entrepreneurship science is regarded as cross-disciplinary in nature, it provides a good context for our study, as it likely entails sub-communities with disparate language structures. We contribute to recent literature in several ways. First, we contribute to the quickly expanding body of research into the domain of science of science, unpacking potential effects of language complexity on impact (Judge et al. 2007). Second, we add to the literature on legitimacy (Garud et al. 2014; Gurses and Ozcan 2015; Taeuscher et al. 2020) by analyzing how scientific articles gain legitimacy and thus impact by uniquely aligning to one community language as well as recombining community language in a novel way. Third, we add to the literature on novelty and its varying effects on research impact (Boudreau et al. 2016; Lee et al. 2015; Trapido 2015; Uzzi et al. 2013). Fourth, our findings improve our understanding of how citation behavior emerges and how it can influence the evolution and convergence of a scientific field such as entrepreneurship (Busenitz et al. 2014; Grégoire et al. 2006). Our findings also provide several practical implications for scientists seeking to maximize their research impact by showing the effects of using community specific language in combination with novelty.

2 Theoretical background

2.1 Language, jargon, and communities

Previous research has suggested that linguistic characteristics affect an article’s impact. For example, Stremersch et al. (2007) show that an article’s length has a positive effect on impact. The authors further find that the number of keywords has a negative effect on impact. Diving more deeply into article attributes, previous research shows that better readability, associated with a greater writing clarity, positively affects an article’s impact (Judge et al. 2007). Surprisingly others find the opposite effect, leading to the implication that some scientists do not necessarily find more readable research more legitimate (Stremersch et al. 2007).

A closely related research domain focuses on the rhetoric of scientific texts (Gephart 1988; Gross 1990; McCloskey 1994; Simons 1990). Rhetoric is most broadly construed in the Aristotelian tradition, as honest argument intended for an audience (McCloskey 1994). This definition implies that as soon as scientists frame ideas for presentation to an identified audience, they are engaging in rhetoric. Locke and Golden-Biddle (1997) identify rhetorical practices that lend credibility to contributions. These so-called markers express inclusiveness (e.g. “both”, “and”, “not only”) or exclusiveness (e.g. “but”, “else”, “nor”) (Tausczik and Pennebaker 2010; Pennebaker et al. 2015). Readers may then find articles presenting a blend of these rhetorical markers credible and cite their contributions in future work.

However, previous literature on potential effects on scientific impact has mostly neglected that communication, especially language and rhetoric, is suggested as a symbol of social identity (Kramsch 1998; Vilhena et al. 2014). The use of a certain language might be understood as affiliation or belonging to a certain community (Kramsch 1998). Thus, language reflects a community’s matters of focus, expertise and special concern, often depicted via artificial words and compressed terms (jargon) that are frequently used as synonyms for more complex constructs. These artificial words or constructs are used to refer in the most efficient manner to familiar as well as common concepts. This linguistic compression, via jargon, for efficient communication between peers of a common community might likewise occur in criminal argot, a subcultural lingo or in regional dialects (Vilhena et al. 2014).

Due to its epistemic cultures, language and rhetoric are of special importance in science (Knorr-Cetina 1999). Scientific jargon allows a more precise and efficient communication with peers within the same scientific community. For example, when the term ‘fitness landscape’ is used by an evolutionary biologist, a comparison of expected relative reproductive successes across multiple genotypes is implied (Vilhena et al. 2014). It is very likely that a scientist from the domain of entrepreneurship would need a bit more of an explanation to understand the full context of this artificial term. Every scientific field has its own ideas, constructs and measures, often expressed via specialized jargon, which might not overlap with those close to other disciplines or communities.

Based on prior research we find that language or jargon is important when distinguishing scientific disciplines. We further argue that this finding applies not only to the distinction of scientific disciplines which are remote from each other, but also to sub-research fields of scientific domains and their underlying communities. Entrepreneurship science in particular, due to the heterogeneous nature caused by its cross-disciplinarity, represents an ideal context for a more fine-grained analysis of the effects of community-based language or jargon. According to Kuhn’s (1996) argument, communities with underlying differences of language are like proponents of different theories. Thus, analyzing community specific rhetoric does not superficially refer to the language per se, but rather reveals deeper layers of content such as the theories, norms and constructs as well as measures utilized in a scientific community.

We now derive our hypotheses, elaborating on the potential relationship between language complexity, in the form of uniqueness and novelty, and impact, as well as the potential moderating role of uniqueness on the relationship between novelty and impact.

2.2 Uniqueness—choice of community

As described above, language is a powerful instrument of identity and belonging. As the introductory quote points out, addressing somebody in his or her own language might have a profound effect on that person. Thus, we assume that jargon used in scientific publications is highly relevant for an article’s perceived legitimacy in a certain community, followed by impact. Previous research has shown that narrative strategies are important in making meaning of opportunities, allowing them to contextualize innovations and make content meaningful in order to become legitimate (Garud et al. 2014; Gurses and Ozcan 2015). Institutional scientists argue that the deviation from a categorical prototype reduces the comprehensibility of a proposed new venture because it prevents audiences from linking the unknown to a familiar cognitive template (Navis and Glynn 2011). McKnight and Zietsma (2018) suggest using analogies to situate one’s own approach within current thinking, creating a common ground from which to further separate one’s own ideas. A different stream of research, focusing on category spanning, has shown that audience members refer to established categories to make sense of products (Hsu et al. 2009; Kovács and Johnson 2014). This research has also shown that spanning multiple categories has negative effects on audience appreciation and thus legitimacy.

In science, generally, it is well understood that knowing your audience is important for tailoring communication to the expectations of the recipients. This implicitly suggests that the jargon of the target audience is to be used in an article. Hence the question arises to what extent an article should be committed to a community’s language. Should an article serve several communities in terms of jargon or focus on one community only? We therefore focus on articles’ language uniqueness in terms of how clearly an article is assigned to one community (compared to other communities). On the one hand, an article might be assigned to several communities, thus using the jargon of several communities in terms of a more uniform distribution. On the other hand, an article might be focused on a certain community thus using more of that community’s jargon relative to the jargons associated with other communities.

Taking up the example of the term ‘fitness landscape’ from the previous section, we argue that when an article uses jargon associated with a certain community, it is probably hard to understand for scholars from another domain, e.g. psychology, whereas scholars from the associated community, who are familiar with the term, are more likely to understand the article immediately and see it as legitimate. This leads to the assumption that talking in a community’s language yields higher impact. Intermingling jargon, in contrast, is more likely to drive away several audiences because they do not perceive the article as legitimate. This in turn would result in less impact.

We conclude that if an increase in language uniqueness is observed it should increase an article’s understanding (Boudreau et al. 2016; Garud et al. 2014; Gurses and Ozcan 2015) and its legitimacy resulting in higher impact. This leads to our first hypothesis:

Hypothesis 1. An article’s language uniqueness has a positive effect on the article’s impact.

2.3 Article’s novelty

Novelty in science might be understood in terms of Schumpeter’s (1939) concept of a recombinant nature of innovation: explaining innovation as a novel recombination of existing knowledge, which would likely result in a mix of jargons from different communities. The construct of novel recombinations has been central in recent studies on scientific impact (Lee et al. 2015; Trapido 2015). The literature argues that scientific papers that draw on unusual or novel combinations of journals in their references can be thought of as representing relatively more novel knowledge (Uzzi et al. 2013). Among others, this idea has diffused into research areas such as technology (Kaplan and Vakili 2015; Valentini 2012) and science (Boudreau et al. 2016; Trapido 2015; Uzzi et al. 2013). However, previous research often used the constructs of novelty and impact interchangeably (Lee et al. 2015). Uzzi et al. (2013) as well as Lee et al. (2015) are among the first to disentangle the concepts of novelty and impact in the context of science. Although research has begun to understand the relationship between novelty and impact, mixed and contrary results are presented in previous literature, emphasizing the need for a better understanding of this relationship. Lee et al. (2015) argue that the relationship between novelty and impact is positive linear. Boudreau et al. (2016), in the context of research proposals, show that the novelty of proposals has a negative effect on evaluations. As an exploratory part of their article they allow novelty to take on a more flexible relationship with evaluations. Trapido (2015) moves closer to a curvilinear relationship, showing that lower-novelty work is associated with higher citation counts, while higher-novelty work has a negative effect on impact.

We take these different perspectives into consideration to derive two hypotheses about the potential effect of novelty on impact. Previous literature has shown that higher levels of novelty make research more interesting and move it into new, unknown territory, which may lead to higher impact (Newman and Cooper 1993; Schoenmakers and Duysters 2010). Aldrich et al. (1994) argue that research, especially in the social sciences, is driven by novelty, surprise, controversy and interest. A closely related finding from previous research shows that interdisciplinary research has higher impact in the long run, by combining references from different disciplines (Van Noorden 2015). The argumentation behind this finding is that accessing and combining unusual knowledge domains or relying on a high variety of knowledge increases impact. Thus, a positive effect of an article’s novelty on impact might be assumed (Lee et al. 2015). In this article’s context, the novel recombination of community specific jargon is assumed to have a linear positive effect on an article’s impact. Thus, we propose our second hypothesis:

Hypothesis 2. An article’s novelty has a positive effect on the article’s impact.

Despite the assumed positive relationship between novelty and impact, novelty might also have contrasting effects. For example, research in psychology suggests a bias against novelty, arguing that more novel ideas might be more difficult to process (Miller 1986; Mueller et al. 2012). In management science, recent research points toward the negative effects of novelty as well: Boudreau et al. (2016) find that the evaluation of research proposals is negatively biased if the proposal’s content is novel. A closely related finding by Uzzi et al. (2013) shows that high impact science derives for the most part from conventional (common and existing) recombination of knowledge. Furthermore, Van Noorden (2015) shows that interdisciplinary research has lower impact in the short run.

In line with these arguments, we argue that adopting more novel recombinations of community jargon might confuse the reader of such an article and thus make the article’s content more difficult to process. This difficulty in processing an article’s content might be due to a lack of understanding of the introduced artificial constructs and terms, or to a perceived distance from and hence lack of interest in the topic. This would lead readers to reject the content of an article and would thus result in less impact. We further argue that this negative relationship increases with higher novelty values. For example, when confronted with only a few artificial terms from different communities, a reader might be willing to investigate the meaning of these few terms. For higher values of novelty, more and more relatively new combinations of jargon that were not combined previously might be included. This would lead to an excessive number of artificial terms, which is most likely to frustrate any reader. We therefore propose that the downsides of novelty tend to increase as novelty increases, resulting in a convex or exponential negative effect on impact.

While there are several benefits of high novelty, with novelty also come escalating disadvantages. After a certain point, these costs start to dominate the linearly increasing benefits of novelty (see Hypothesis 2). An inverted U-shaped relationship between articles’ novelty and impact is therefore predicted resulting in the following hypothesis:

Hypothesis 3. The relationship between an article’s novelty and its impact follows an inverted U-shape.
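
The reasoning behind Hypothesis 3 can be illustrated with a small worked example. It is only a sketch under stated assumptions: the symbols \(I\), \(N\), \(a\) and \(c\) are introduced here for illustration and do not appear in the original argument, and a quadratic cost is just one simple convex form. Assume that the benefit of novelty \(N\) grows linearly with slope \(a > 0\) and that its cost grows quadratically with weight \(c > 0\):

$$ I(N) = aN - cN^{2}, \qquad \frac{dI}{dN} = a - 2cN = 0 \;\Rightarrow\; N^{*} = \frac{a}{2c} $$

Under these assumptions, impact \(I\) rises with novelty up to \(N^{*}\) and falls beyond it, which is the inverted U-shape stated in the hypothesis.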

2.4 The moderating effect of uniqueness on novelty

Having discussed the anticipated effects of uniqueness and novelty on an article’s impact, we now turn towards the potential moderation effect of uniqueness on the relationship between novelty and impact. How important the use of language is when proposing new ideas is exemplified in the case of Isaac Newton, who wrote his revolutionary Principia in Latin. One reason was that his approaches would have sparked too much resistance when written in the English language (Hall 1980; Honig et al. 2014), as English would have been far more difficult for the audience to understand. Thus, scientists not only choose between different languages but also have to be aware of how rhetorical nuances might affect legitimacy when writing about novel ideas. Similarly, Uzzi et al. (2013) bring up the example of Darwin’s scientific manifesto The Origin of Species, arguing that the combination of convenient domain-level thinking was critical for the link between innovativeness and impact. We argue that it is just this convenient domain-level thinking which is formalized in community specific jargon and thus gains a community’s legitimacy.

These examples elucidate the important relationship between uniqueness and novelty. As proposed in Hypothesis 3 we expect novelty to have two opposing effects on impact, a linear positive one and a negative convex one (combined resulting in the proposed inverted U-shaped relationship). We thus will now elaborate on how uniqueness is likely to affect both of these effects.

Research into entrepreneurship has identified that organizations need to use narratives to contextualize novel and innovative content (Garud et al. 2014) and thus make their technology or invention meaningful and legitimate to others (Gurses and Ozcan 2015). Taking these considerations into account, it seems obvious that scientists should be aware of their article’s uniqueness when proposing novel ideas. We argue that uniqueness, in the form of an article’s unique alignment towards one scientific community in terms of rhetoric and jargon, seems of vital importance to ‘sell’ novel ideas and thus to ensure the legitimacy of these novel ideas. This is further in line with Uzzi et al. (2013), who argue that conventional knowledge, here the unique alignment to one community, is critical to the link between novelty and impact. We thus argue that an article needs to have a high uniqueness to make a novel contribution legitimate and thus yield higher impact. In technical terms, higher uniqueness weakens the negative convex effect of novelty on impact: with higher uniqueness, novelty is less prone to produce negative effects. This would result in the steepening of the curvilinear effect and thus a stronger mechanism of the inverted U-shaped relationship, followed by a higher effect of novelty on impact in its optimal point (Haans et al. 2016). Thus, we propose Hypothesis 4a:

Hypothesis 4a. An article’s higher level of uniqueness steepens the inverted U-shaped relationship between novelty and impact.

Even though it increases the legitimacy and impact of novelty, uniqueness is likely to affect the positive effect of novelty on impact in a negative manner. Again, we refer to Uzzi et al. (2013), who found that “the highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations” (p. 468). Thus, it is rather a nuance of novelty which increases impact. An alternative line of reasoning comes from the literature on the Not-Invented-Here (NIH) syndrome (Katz and Allen 1982; Antons and Piller 2015). NIH is defined as the tendency of a stable group (here a scientific community’s jargon) to reject new ideas from outsiders (the recombination with jargon from other communities). Following this argument, we assume that higher uniqueness levels and thus a clearer affiliation to a certain community lead to lower acceptance of novel ideas: lower values of novelty seem to be optimal when an article holds high uniqueness values. High uniqueness weakens the positive effect of novelty on impact, resulting in a turning point shift of the inverted U-shaped relationship between novelty and impact to a lower optimal level of novelty (for a more detailed technical elaboration of a turning point shift see Haans et al. 2016). This results in the following hypothesis:

Hypothesis 4b. An article’s higher level of uniqueness leads to a turning point shift towards lower optimal values for novelty.
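
Hypotheses 4a and 4b can be made concrete with a quadratic specification that includes interaction terms, in the spirit of the technical treatment in Haans et al. (2016). This is a sketch under assumptions: the symbols \(I\) (impact), \(N\) (novelty), \(U\) (uniqueness) and the coefficients \(\beta_0, \dots, \beta_5\) are notation introduced here for illustration and are not the estimated model of this article.

$$ I = \beta_{0} + \beta_{1} N + \beta_{2} N^{2} + \beta_{3} N U + \beta_{4} N^{2} U + \beta_{5} U $$

Setting the derivative with respect to \(N\) to zero gives the turning point as a function of uniqueness, and differentiating the turning point with respect to \(U\) gives the direction of its shift:

$$ N^{*} = - \frac{\beta_{1} + \beta_{3} U}{2\left( \beta_{2} + \beta_{4} U \right)}, \qquad \frac{\partial N^{*}}{\partial U} = \frac{\beta_{1} \beta_{4} - \beta_{2} \beta_{3}}{2\left( \beta_{2} + \beta_{4} U \right)^{2}} $$

In this notation, an inverted U requires \(\beta_{2} + \beta_{4} U < 0\). A steepening of the curve with rising uniqueness (Hypothesis 4a) corresponds to \(\beta_{4} < 0\), and a shift of the turning point towards lower novelty (Hypothesis 4b) corresponds to \(\beta_{1} \beta_{4} - \beta_{2} \beta_{3} < 0\).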

Figure 1 depicts our proposed theoretical model.

Fig. 1 Know your audience

3 Data and method

In this section we discuss our data and the methods applied. First, we describe the process of obtaining and preparing data for the analysis. In the second sub-section, we give an overview of the deployed methods. The procedure in this analysis is based on two steps: first, we detect scientific communities, and second, we detect the content similarity between all articles in the sample and the previously detected communities. For the first step we use document co-citation analysis to define sub-research streams of entrepreneurship science; we then match articles not included in the clusters to these clusters based on natural language processing. Lastly, we turn to our regression analysis, detailing the variables used and the model specifications.

3.1 Data

To analyze entrepreneurship research, we use the Thomson Reuters Web of Science (WOS) to retrieve bibliometric data on corresponding publications. WOS is a prominent citation database, covering over 10,000 high impact journals and 120,000 international conference proceedings. In order to capture a broad selection of potentially relevant articles we used the search term ‘entrepre*’ (with * as wildcard).Footnote 1 The query was applied to paper titles, abstracts as well as keywords (both original keywords and keywords generated by WOS). The search was conducted in August 2014 including a timespan from 1945 to August 2014 resulting in 21,973 unique WOS records. Excluding all non-articles (such as book chapters or conference proceedings) resulted in 16,683 records; leaving out all non-English articles and articles with missing values left us with 14,028 documents.

3.2 Methods

Dividing entrepreneurship research into communities (e.g. clusters or sub-research fields) and studying these clusters’ development over time requires a combination of diverse methods. Hence, the following sections give a brief introduction to co-citation analysis and natural language processing. Furthermore, we explain how we linked these two methods to increase the quality of our results.

3.2.1 Delineating scientific communities in entrepreneurship science

In order to delineate scientific communities within the scientific field of entrepreneurship, and thus to be able to assess the uniqueness of each article, we perform a document co-citation analysis (DCA). Such an analysis is particularly relevant for this purpose, because it measures paper relatedness based on the frequency with which two documents are cited together by other documents (Cawkell 1976; Garfield et al. 1978; Small 1973), overcoming subjectivity due to its quantitative analysis of citations (Lampe and Hilgers 2015; Schildt et al. 2006). Due to their strong relatedness, these detected sub-research fields or clusters of a scientific domain might be equated to a scholarly community (Schildt et al. 2006). The cleaning of the data was conducted following Lampe and Hilgers (2015).Footnote 2

We first excluded all papers with fewer than four references to only include research articles, resulting in 14,657 papers. In a second step we only kept articles which received 15 or more citations to ensure that the analyzed citation behavior is validated by specialists in this research domain (resulting in 3358 articles). Hence, our findings build upon a wide range of expert opinions (scholars’ citations) and thus accepted principles. After building the DCA network, we deleted isolates (i.e. articles not linked to any other articles), resulting in the final DCA dataset of 2117 articles with 62,511 co-citation links. These steps enable a robust citation analysis, minimizing the possible effect of noise (Lampe and Hilgers 2015).

Following earlier research, we adopt the Jaccard index (Jaccard 1901) as a normalized measure for the connectivity of co-cited articles (Small and Greenlee 1980). This index gives the ratio of the number of co-citations to the total citations of A and B less their common co-citations (Gmür 2003). The value of the Jaccard index (S) ranges from 0 (no co-citations) to 1 (representing perfect co-citation) and is defined as follows:

$$ S = \frac{\text{number of common citations to articles } A \text{ and } B}{\text{total citations to } A + \text{total citations to } B - \text{co-citations of } A \text{ and } B} $$
(1)
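
As a small numerical illustration of the Jaccard index, the following sketch computes S from three counts. The function name `jaccard_index` and the handling of a zero denominator are assumptions made for the example.

```python
def jaccard_index(cocitations, citations_a, citations_b):
    """Jaccard index of two articles from their citation counts.

    cocitations: number of documents citing both A and B
    citations_a, citations_b: total citations to A and to B
    """
    denominator = citations_a + citations_b - cocitations
    if denominator == 0:
        # Assumption: two uncited articles are treated as unrelated.
        return 0.0
    return cocitations / denominator


# Hypothetical counts: A cited 3 times, B cited 4 times, 2 of them jointly.
s = jaccard_index(2, 3, 4)
print(s)
```

With these hypothetical counts the denominator is 3 + 4 - 2 = 5, so two joint citations give S = 2/5.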

When defining co-citation clusters and distinguishing them from each other, we opted for the straightforward approach of removing weak links. We exclude all links with a Jaccard value lower than 0.2. The cut-off value of 0.2 results from a comparison of various cut-off values and the resulting number of disconnected components in the network. We tried to find a value where the number of clusters would not change with a slight change of the threshold (Lampe and Hilgers 2015). Compared to previous research, this cut-off value is quite small (Schildt et al. 2006), a necessity given the larger number of articles considered in this dataset. The issue of false positives showing up in the dataset due to the basic search query is mitigated by this step: papers that do not belong to the field of entrepreneurship are unlikely to have been highly co-cited by those papers that do belong to the field. Overall, we identified 35 different sub-research fields of entrepreneurship science (stated with description and metrics in Table 8).
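
The thresholding step can be sketched as follows: links below the cut-off are dropped and the remaining connected components are read as clusters. This is a simplified illustration using a union-find structure, not the original implementation. The names `clusters_after_threshold` and `cluster_counts`, the dictionary layout of the links, and the toy values are assumptions; articles left without any retained link do not appear in any cluster.

```python
def clusters_after_threshold(links, threshold=0.2):
    """Return connected components after removing links below the threshold.

    links: dict mapping an article pair (a, b) to its Jaccard value
    (hypothetical layout).
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), value in links.items():
        if value >= threshold:  # keep only sufficiently strong links
            parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


def cluster_counts(links, thresholds):
    """Number of clusters per candidate cut-off, to inspect stability."""
    return [len(clusters_after_threshold(links, t)) for t in thresholds]


# Hypothetical toy network.
toy_links = {("A", "B"): 0.5, ("B", "C"): 0.25, ("C", "D"): 0.1,
             ("D", "E"): 0.3, ("F", "G"): 0.05}
clusters = clusters_after_threshold(toy_links, 0.2)
counts = cluster_counts(toy_links, [0.05, 0.2, 0.3, 0.6])
print(clusters, counts)
```

Scanning several candidate cut-offs with `cluster_counts` mirrors the idea of choosing a value around which the number of clusters stays stable.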

3.2.2 Variables

The dependent variable, impact, is operationalized by citation counts (average per year) in Web of Science. This measure is commonly used by scholars when analyzing patents or publications (Lee et al. 2015; Martin and Irvine 1983; Moed 2005; Wang 2014). We use the average yearly citation count to account for the fact that older articles have had more time to accumulate citations.

Two variables relating to our hypotheses are included: article uniqueness and novelty. In order to determine the uniqueness and novelty of articles with respect to their affiliation to a cluster, we first need to identify each article’s similarity to each cluster. We therefore use a widely accepted method in data analysis for weighting the importance of words in text collections, namely tf-idf (term frequency − inverse document frequency) (Robertson 2004).Footnote 3 Given a collection of texts in a corpus \((d \in D)\), the tf-idf weight of a word for one text can be calculated as the product of the frequency of that word in the current text \((f_{w,d} )\) and the inverse document frequency. The inverse document frequency is the logarithm of the number of texts in the corpus \(\left( {\left| D \right|} \right)\) divided by the number of texts containing the word to be weighted \((f_{w,D} )\):

$$ \text{tf-idf}_{w,d} = f_{w,d} \cdot \log \left( {\frac{\left| D \right|}{{f_{w,D} }}} \right) $$
(2)
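
The weighting described above (term frequency in the current text multiplied by the logarithm of the corpus size over the number of texts containing the word) can be sketched as follows. The function name `tf_idf`, the pre-tokenized input and the use of the natural logarithm are assumptions made for the example; the source does not state the base of the logarithm.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    documents: list of token lists (assumed to be pre-processed already).
    """
    n_docs = len(documents)
    # Document frequency: number of texts that contain the word at least once.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)  # frequency of each word in this text
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


# Hypothetical toy corpus of three tokenized texts.
corpus = [["venture", "capital", "venture"],
          ["social", "capital"],
          ["venture", "growth"]]
w = tf_idf(corpus)
print(w[0])
```

A word that occurs in every text receives a weight of zero, because the logarithm of 1 is 0; this is how the weighting suppresses terms that do not discriminate between clusters.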

To define the content similarity, and thus the similarity or deviation of articles to each cluster (previously determined by DCA), the available abstracts and titles of one cluster are combined into a new document. After appropriate pre-processing,Footnote 4 the term frequencies are calculated per cluster and over the collection of clusters. Subsequently, tf-idf scores are obtained and a list of tf-idf-weighted words is created for each article and cluster.Footnote 5 We used this information to obtain similarities between articles and clusters. To this end, we transformed article abstracts into a vector representation, allowing us to use the inverse cosine similarity measure (one minus the cosine similarity) to obtain distances between articles and clusters:

$$ distance = 1 - \cos (\theta ) = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} A_{i} B_{i} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} A_{i}^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} B_{i}^{2} } }} $$
(3)
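
The distance in Eq. (3) can be illustrated with a short sketch. The function name `cosine_distance` and the treatment of all-zero vectors are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty vector shares nothing with any other vector.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical tf-idf vectors over a shared three-word vocabulary.
article = [0.8, 0.0, 0.4]
cluster = [0.4, 0.0, 0.2]
print(cosine_distance(article, cluster))
```

Because the measure depends only on the angle between the vectors, an article and a cluster with proportional weights have a distance of (numerically) zero regardless of text length.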

As we are interested in the deviation between articles and their corresponding research sub-fields we only calculate the inverse similarity measures for instances where the article has been published after the beginning of a cluster (earliest publication date of articles defining a cluster). We also ignore articles for which no data (e.g. abstracts) are available. The resulting dataset consists of 9846 articles.

Our first focal independent variable, an article’s uniqueness, is measured as the deviation between the highest similarity measure (compared to communities/clusters) and the average of the other similarities between the focal article and communities. This measure allows us to detect how uniquely an article is associated with a certain community. It therefore captures the uniqueness of an article’s jargon, that is, the extent to which the article draws its jargon from only one community.
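
One possible reading of this measure is sketched below: the highest similarity minus the mean of the remaining similarities. Treating the "deviation" as a simple difference is an assumption, as are the function name `uniqueness` and the toy values.

```python
def uniqueness(similarities):
    """Highest similarity minus the average of all other similarities.

    similarities: the focal article's similarity to each community.
    """
    if len(similarities) < 2:
        raise ValueError("at least two communities are required")
    ordered = sorted(similarities, reverse=True)
    top, rest = ordered[0], ordered[1:]
    return top - sum(rest) / len(rest)


# Hypothetical similarity profiles over three communities.
focused = uniqueness([0.9, 0.1, 0.2])   # jargon mostly from one community
uniform = uniqueness([0.4, 0.4, 0.4])   # jargon spread evenly
print(focused, uniform)
```

Under this reading, an article that spreads its jargon evenly across communities scores close to zero, while an article aligned with a single community scores high.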

Second, we measure an article’s novelty in terms of the inverse cosine similarity between its similarity distribution across the detected scientific communities in entrepreneurship science (detected using DCA) and the distributions of other articles. We therefore compare each article’s distribution of community jargon (the similarity between an article and a community’s language) to all distributions of articles published in the same or previous years. Each article is thus represented by a vector of length 35 containing its similarity to each of the 35 communities (using tf-idf as explained above). This approach allows us to analyze which articles are relatively new recombinations of different jargons and thus community languages (defined by DCA clusters). To allow for the emergence of new communities, a similarity to a cluster can only arise when the first of the cluster’s defining articles was published in the same year as or before the focal article. Furthermore, restricting the comparison to previous articles allows our novelty measure to capture a definition of novelty that changes over time. For instance, the same recombination might be novel for an article published in 1995 but no longer as novel for an article published in 2005.

On the one hand, the resulting vector assumes relatively high values when the similarity is relatively low and thus an article is rather novel in its nature. On the other hand, relatively low values represent a high similarity between articles. In accordance with Uzzi et al. (2013), we then used the 1 percent quantile and, as a robustness test, the 10 percent quantile value of each vector in comparison to all similarity values as an indicator for novelty.
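
One possible reading of this procedure is sketched below. It is an illustration under stated assumptions rather than the exact computation of the study: the helper names are hypothetical, the quantile uses simple linear interpolation, and the profiles of earlier articles are assumed to be given as similarity vectors of equal length.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity; 1.0 if either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def novelty(focal_profile, earlier_profiles, q=0.01):
    """Low quantile of the distances between the focal article and earlier articles."""
    if not earlier_profiles:
        raise ValueError("no earlier articles to compare with")
    distances = [cosine_distance(focal_profile, p) for p in earlier_profiles]
    return quantile(distances, q)


# Hypothetical community-similarity profiles (three communities instead of 35).
earlier = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score_known = novelty([1.0, 0.0, 0.0], earlier)  # repeats an existing profile
score_new = novelty([0.0, 0.0, 1.0], earlier)    # profile not seen before
```

Using a low quantile rather than the mean means that an article only counts as novel if it is distant even from the earlier articles that resemble it most.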

Furthermore, several control variables are incorporated into the model. Our first control variable is seminal inactivity, that is, the stagnation of an article’s associated community. This variable expresses the distance (in years) between the publication of the focal article and the newest article of the community with the highest similarity. This measure captures for how long no seminal article (detected by DCA) has emerged in the community, relative to the publication date of the focal article. Further control variables are the age of the paper (in years compared to 2015), the number of authors, the number of references included in a paper and the number of pages. As a proxy for an article’s quality we include 2237 journal dummies. Furthermore, we control for the corresponding sub-research field by including cluster dummies in accordance with the identified 35 clusters.Footnote 6 The descriptive statistics are shown in Table 1.

Table 1 Descriptive statistics and correlations

3.2.3 Analysis

Citations are discrete and typically have a broad distribution, with some articles receiving very high citation counts. Furthermore, our dependent variable cannot assume values smaller than 0 and corresponds to count data. The obvious approach would be a Poisson model (Hausman et al. 1984). However, the citation distribution is over-dispersed, thus many more highly cited articles occur than would be the case for Poisson-distributed data. Therefore, the relationship between citations and our independent variables might be estimated using negative binomial regression, a generalization of the Poisson model that accounts for over-dispersion in the data.Footnote 7
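
A simple way to illustrate over-dispersion is to compare the variance of the counts with their mean, since both coincide under a Poisson distribution. The following sketch uses hypothetical citation counts and a hypothetical helper name; it is a descriptive check only, not a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean; close to 1 for Poisson-like counts."""
    avg = mean(counts)
    if avg == 0:
        raise ValueError("mean is zero, so the ratio is undefined")
    return variance(counts) / avg


# Hypothetical yearly citation counts with one highly cited article.
citations = [0, 0, 1, 1, 2, 3, 5, 40]
ratio = dispersion_ratio(citations)
```

A ratio far above one, as produced by a few very highly cited articles, points towards a negative binomial rather than a Poisson specification.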

Negative binomial models, like Poisson models (Hausman et al. 1984), assume that the logarithm of the expected value of the dependent variable can be modeled by a linear combination of known predictors. In this sense, it is similar to estimating a regular linear regression with the logarithm of citations as the dependent variable. We therefore follow Foster et al.’s (2015) approach and assume the following:

$$ Citations_{a} \sim NegativeBinomial_{\mu } $$
(4)

where \(Citations_{a}\) denotes the average number of yearly citations received by article a. We use ordinary least squares regression to test our hypotheses, using the natural logarithm of the number of citations (plus one) relative to the article’s age as the dependent variable. To challenge the robustness of our findings, and in order to correct for an excessive number of zeros in our data, we also used zero-inflated negative binomial regression (Long 1997).

4 Results

In order to test our hypotheses, we first had to detect scientific communities in the domain of entrepreneurship research. To determine a quantitative categorization of entrepreneurship sub-research fields or communities, we conduct a document co-citation analysis akin to that of Schildt et al. (2006) to reveal the different clusters of these research areas. Given that we are interested exclusively in the most cited and coherent groups of articles, some of the highly-cited articles will be excluded from this analysis due to their lacking affiliation to a cluster. In total, we found 35 clusters. As expected, top clusters are represented mostly by papers published in the last two decades. Very recent papers may not be available in Web of Science or may not have been cited often enough for co-citation patterns to emerge. But it is surprising that older papers do not seem to be part of these clusters. Intuitively, papers that had more time to be cited and that are upstream in a field of research should receive many co-citations and therefore show up in clusters. A possible explanation is that this intuitive reasoning applies but is moderated by the small yearly publication numbers before 1990, which may in turn be influenced by data coverage of the Web of Science database.

Given the importance of scientific impact, Table 2 shows the results of the regression analysis including the effect of article characteristics on their citations per year (in natural logarithm).

Table 2 Regression results of ordinary least squares estimations on articles’ citation per year

Model 1 is the basic model, including all control variables. All of our control variables have a significant positive effect on article impact. The age of a paper, the number of authors, the number of pages as well as the number of references all affect an article’s impact positively. Seminal inactivity, the distance in years between an article and its associated clusters’ latest article, also has a positive effect on article impact. If a paper associates itself to a community where seminal articles are quite old its impact is likely to be higher than associating a paper to a community where the last seminal article is rather recent.

Our first hypothesis, that an article’s uniqueness has a positive effect on its impact, is tested in Model 2. The focal variable of Hypothesis 1, uniqueness, is positive and significant (β = 10.880; p < 0.001) and thus lends support for Hypothesis 1. An article’s uniqueness has a positive effect on its impact.

Hypothesis 2 predicted a positive linear effect of an article’s novelty on impact. The results, presented in model 3 in Table 2 support this hypothesis showing a significant positive effect (β = 3.274; p < 0.001). Hypothesis 3 predicted an inverted U-shaped relationship between novelty and impact. The results found support for this relationship (β = − 11.382; p < 0.001). To ensure the correct interpretation of our results we follow the three-step procedure to test an inverted U-shaped relationship, proposed by Lind and Mehlum (2010). First, as stated above, β of the squared term needs to be significant and of the expected sign. Furthermore, we test the joint significance of the direct and squared terms of novelty, following Sasabuchi’s (1980) test for an inverted U-shaped relationship for novelty.

Second, the slope must be sufficiently steep at both ends of the data range. Table 3 shows the directions of the slopes at low and high values of novelty. The slope at the low value of novelty is positive and significant (β = 4.985; p < 0.001) and the slope at the high value of novelty is negative and significant (β = − 5.961; p < 0.001), which provides preliminary evidence of an inverted U-shaped relationship.

Table 3 Test of an inverted U-shaped relationship between novelty and impact

Following Lind and Mehlum (2010), the third step to test for U-shaped relationships is to assess whether the turning point is located well within the data range. We therefore estimated the extreme point of the effect of novelty and calculated confidence intervals based on Fieller’s standard error (Lind and Mehlum 2010). The confidence interval based on Fieller’s standard error indicates that the extreme point lies within the limits of the data (0.251, 0.295). As shown in Table 3, the inverted U-shaped relationship is significant.
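
Stated compactly for a quadratic specification \(Y = \beta_{0} + \beta_{1} X + \beta_{2} X^{2}\), the three steps correspond to the conditions below. This is a simplified sketch that omits the control variables; \(X_{l}\) and \(X_{h}\) are notation introduced here for the lower and upper bound of the observed novelty range, and \(X^{*}\) denotes the turning point.

$$ \beta_{2} < 0,\qquad \beta_{1} + 2\beta_{2} X_{l} > 0,\qquad \beta_{1} + 2\beta_{2} X_{h} < 0,\qquad X_{l} < X^{*} = - \frac{{\beta_{1} }}{{2\beta_{2} }} < X_{h} $$

The two middle conditions are the slopes of the curve evaluated at the ends of the data range, and the last condition requires the maximum of the curve to lie inside that range.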

To round out the robustness of our findings, Fig. 1 shows the predicted inverted U-shaped relationship between novelty and impact based on our estimates of Model 4 in Table 2. Overall, these findings lend support for Hypothesis 3, an inverted U-shaped effect of novelty on articles’ impact.

Models 5 and 6 take the proposed moderation effect into account. Hypothesis 4a assumed that higher values of uniqueness steepen the inverted U-shaped relationship between novelty and impact. The interaction term between uniqueness and the squared novelty term (B) in Model 6 is negative and significant (β = − 235.263; p < 0.001), lending support for Hypothesis 4a. Figure 2 graphically displays the effect of novelty on impact for low, medium and high values of uniqueness.Footnote 8 The observed steepening of the inverted U-shape as well as the upward movement of the optimal point of novelty lend additional support for Hypothesis 4a. Hypothesis 4b assumed decreasing levels of the optimal novelty for increasing uniqueness values of an article, namely a turning point shift. The proposed effect is observable (highlighted graphically), lending support for Hypothesis 4b. Even though the interaction term with the linear term is significant, this is neither a necessary nor a sufficient condition (Haans et al. 2016) for the proposed effect of Hypothesis 4b. Thus, we use a formal test for a turning point shift proposed by Haans et al. (2016). The following equation states the full model specification, including all interactions, where \(Y\) denotes impact, \(X\) novelty and \(Z\) the moderator uniqueness:

$$ Y = \beta_{0} + \beta_{1} X + \beta_{2} X^{2} + \beta_{3} XZ + \beta_{4} X^{2} Z + \beta_{5} Z $$
(5)
Fig. 2 Effect of novelty on impact (95% confidence intervals are displayed)

Haans et al. (2016) set the first derivative of the regression equation with respect to novelty to zero to derive the turning point of the inverted U-shaped effect of novelty on impact. The authors further take the derivative of this equation with respect to the moderator resulting in:

$$ \frac{{\delta X^{*} }}{\delta Z} = \frac{{\beta_{1} \beta_{4} - \beta_{2} \beta_{3} }}{{2(\beta_{2} + \beta_{4} Z)^{2} }} $$
(6)
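
For completeness, the intermediate steps can be sketched as follows, using the notation of Eq. 5 with \(X\) as novelty and \(Z\) as uniqueness. Setting the first derivative with respect to \(X\) to zero gives the turning point \(X^{*}\):

$$ \frac{\delta Y}{\delta X} = \beta_{1} + 2\beta_{2} X + \beta_{3} Z + 2\beta_{4} XZ = 0 \quad \Rightarrow \quad X^{*} = - \frac{{\beta_{1} + \beta_{3} Z}}{{2(\beta_{2} + \beta_{4} Z)}} $$

Applying the quotient rule with respect to \(Z\) then yields Eq. 6, because the terms in \(\beta_{3} \beta_{4} Z\) cancel:

$$ \frac{{\delta X^{*} }}{\delta Z} = - \frac{{2\beta_{3} (\beta_{2} + \beta_{4} Z) - 2\beta_{4} (\beta_{1} + \beta_{3} Z)}}{{4(\beta_{2} + \beta_{4} Z)^{2} }} = \frac{{\beta_{1} \beta_{4} - \beta_{2} \beta_{3} }}{{2(\beta_{2} + \beta_{4} Z)^{2} }} $$

Since the denominator is a square, the direction of the turning point shift is determined by the sign of the numerator \(\beta_{1} \beta_{4} - \beta_{2} \beta_{3}\).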

Evident from the above equation is that a potential turning point shift depends not only on the first-order interaction term but also on the second-order interaction term (\(\beta_{4}\)) (Haans et al. 2016), supporting the use of the full model specification (Model 6). As suggested by Haans et al. (2016), we assess whether the above equation as a whole is significantly different from zero for specific meaningful values of the moderator. We deploy values of our moderator variable uniqueness (Z) ranging from min = 0.000 to max = 0.095 in steps of 0.01. For all of these values the equation is negative and significantly different from zero (p < 0.001) (Table 7). These results lend support for Hypothesis 4b, a moderated turning point shift of the optimal level of novelty with respect to uniqueness. As hypothesized, the moderation affects the turning point in a negative manner, thus decreasing the optimal level of novelty with increasing uniqueness of an article.
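
The point estimates behind such a test can be illustrated with a short Python sketch that evaluates the turning point and Eq. 6 over a grid of moderator values. The coefficients below are hypothetical placeholders and not the estimates reported in the tables, and the function names are introduced here for illustration only. The significance test itself additionally requires the standard error of the expression, which depends on the covariance matrix of the coefficients and is not reproduced here.

```python
def turning_point(b1, b2, b3, b4, z):
    """Novelty level at which predicted impact peaks for moderator value z."""
    return -(b1 + b3 * z) / (2.0 * (b2 + b4 * z))


def turning_point_shift(b1, b2, b3, b4, z):
    """Derivative of the turning point with respect to the moderator (Eq. 6)."""
    return (b1 * b4 - b2 * b3) / (2.0 * (b2 + b4 * z) ** 2)


# Hypothetical coefficients (NOT the estimates reported in the paper).
b1, b2, b3, b4 = 3.0, -10.0, -20.0, -200.0

# Moderator grid from 0.00 to 0.09 in steps of 0.01.
z_grid = [i / 100 for i in range(10)]
shifts = [turning_point_shift(b1, b2, b3, b4, z) for z in z_grid]

for z, shift in zip(z_grid, shifts):
    peak = turning_point(b1, b2, b3, b4, z)
    print(f"Z = {z:.2f}  turning point = {peak:.3f}  shift = {shift:.3f}")
```

With these placeholder values the shift is negative across the whole grid, which is the pattern that a turning point moving towards lower novelty would produce.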

5 Robustness test

To challenge the robustness of our results, we replicate them adding journal dummies (2237 dummies) to account for the potential effects of being published in different journals. The journal an article is published in might not only affect its impact directly but also indirectly, as publication in a highly accredited journal signals quality approval. The results from Table 2, re-estimated with journal dummies, are presented in Table 4. The results mostly stay the same. Hypotheses 1 to 3 are again affirmed by significant effects with the proposed signs. Uniqueness (Model 2) has a positive and significant effect on an article’s impact (β = 8.003; p < 0.001), lending support for Hypothesis 1. Hypothesis 2 proposed a positive linear relationship between novelty and impact. The effect in Model 3 of Table 4 supports this hypothesis (β = 2.982; p < 0.001). Hypothesis 3 assumed an inverted U-shaped relationship between novelty and impact. The results are significant (β = 6.590; p < 0.001), further lending support for Hypothesis 3. A difference compared to the above results can be observed for the interaction effect of uniqueness and novelty in Models 5 and 6. Whereas the single interaction effect in Model 5 is no longer significant, the double interaction effect in Model 6 is slightly significant (β = 46.541; p < 0.05). The negative and significant effect of uniqueness on the squared novelty term (β = − 127.786; p < 0.05) again lends support for Hypothesis 4a, the steepening of the curvilinear relationship. As the significance of the interaction term is neither a necessary nor a sufficient condition (Haans et al. 2016) for the proposed turning point shift in H4b, we again test this hypothesis following the formal test of Haans et al. (2016) (Eq. 6). Again, the results lend support for Hypothesis 4b.

Table 4 Regression results of ordinary least squares estimations on articles’ citation per year (including journal dummies)

To further test the robustness of our results, we make use of the zero-inflated negative binomial model (Tables 5 and 6). We therefore use the rounded citations per year as the dependent variable. The z-value of the Vuong test (Vuong 1989) is significant in all models and thus supports the model (Model 5: z = 10.41; p < 0.000). The results are similar to the ones using ordinary least squares in Table 2. Table 5 states the main part of the zero-inflated negative binomial estimation and Table 6 states the zero-inflated part. Similar to the results including journal dummies (Table 4), Hypothesis 1 (β = 22.473; p < 0.001), Hypothesis 2 (β = 6.612; p < 0.001) and Hypothesis 3 (β = 14.878; p < 0.001) are supported. The significance of the interaction effect between novelty and uniqueness vanishes in the zero-inflated negative binomial model specification. For H4b, this significance is again neither a necessary nor a sufficient condition (Haans et al. 2016). Thus, we again conducted the formal test proposed by Haans et al. (2016). The results lend support for Hypothesis 4b (p < 0.009). In this nonlinear model specification, the significance of the interaction term between uniqueness and the squared novelty term (B) is also neither a necessary nor a sufficient condition (Haans et al. 2016) to test our Hypothesis 4a. Thus, we rely on our previous results (from Tables 3 and 4) to confirm H4a.

Table 5 Main part of the zero-inflated negative binomial estimations
Table 6 Zero-inflated part of the zero-inflated negative binomial estimations

6 Discussion, limitations and future research

In this section we discuss the study’s results and their contribution to existing literature. Moreover, we address the limitations of this study and provide suggestions for future research. Overall, our findings show that scientists should be aware of their language complexity in the form of community specific jargon when writing scientific articles.

Our first result (H1) shows that uniqueness, in terms of adopting one scientific community’s language and focusing on this jargon relative to other jargon, has a positive effect on articles’ impact. This finding is reflected in science by statements such as “Know Your Audience” and by our opening quote, which suggest speaking your audience’s or community’s language to increase impact. For authors of scientific articles in the field of entrepreneurship research this further implies that aligning with, and thus focusing on, just one audience in terms of unique jargon is likely to increase impact. This effect is rooted in articles becoming legitimate when they speak one community’s language uniquely. This is in line with research into legitimacy (Garud et al. 2014; Gurses and Ozcan 2015; Navis and Glynn 2011) and further adds to this literature by showing that epistemic cultures, that is, scientific communities, are also subject to the need for legitimacy through rhetorical strategies. For researchers the recommendation is clear: focus uniquely on one community language to increase impact. Thus, authors seeking higher levels of uniqueness in terms of the alignment to one community should increase the use of highly definitional terms for a sub-research field’s topic. For example, an author wishing to uniquely align to the sub-research field of entrepreneurship in family firms (Cluster 10 in Table 8) should use terms associated with the cluster’s topic such as “(non)family”, “owner(ship)”, “culture” or “altruism”.Footnote 9 Again, uniqueness is achieved when an article is more aligned to one community/cluster compared to others. Also, in order to link an article to the cluster of entrepreneurship and family firms it may be advisable to avoid terms highly associated with other sub-research fields such as “university”, “spinoff”, “transfer”, “technology”, “academic” or “licensing” (associated with the sub-research field of university-industry relations and entrepreneurship, Cluster 2 in Table 8). Again, the unique affiliation to just one sub-research field should be aimed for.

We also show that recombining different scientific communities’ jargon in a novel way affects articles’ impact in the form of an inverted U-shape (H2 and H3). This finding helps to reconcile mixed results from previous research on novelty and its effects (Boudreau et al. 2016; Lee et al. 2015; Trapido 2015; Uzzi et al. 2013). On the one hand, for low novelty levels, an increase in an article’s novelty (from low to medium) has a positive effect on impact (Lee et al. 2015; Newman and Cooper 1993; Schoenmakers and Duysters 2010). On the other hand, for high levels of novelty, an increase in novelty (from medium to high) has a negative effect on impact (Boudreau et al. 2016; Mueller et al. 2012). Hence, the level of novelty defines the direction and degree of the effect. Work which is ‘too’ novel might be too distant from the audience’s knowledge and thus does not lead to high impact. A possible explanation for this relation is the view that science is required to advance gradually so that hypotheses can be re-tested before they are accepted. As a result, research that does not follow established principles, and thus is assumed to be very novel, may not achieve a high level of impact (Uzzi et al. 2013; Antons and Piller 2015). Authors should be aware of this when writing their articles in order to secure impact. It would be of further interest to see whether this finding holds for research in more general management communities (e.g. Academy of Management Journal, Administrative Science Quarterly, etc.) or other disciplines, in particular those which are not characterized by a large degree of heterogeneity. It may also be interesting to compare disciplines or sub-communities with different degrees of maturity to test whether new communities differ in their acceptance of novelty relative to well established clusters.

Additionally, our article presents results that take two different moderating effects of an article’s uniqueness on the relationship between novelty and impact into account. First, we show that higher levels of an article’s uniqueness steepen the inverted U-shaped relationship between novelty and impact (H4a). Thus, an article’s higher level of uniqueness increases the maximum effect of novelty on impact. Second, our results further show that a higher level of uniqueness shifts the turning point of the inverted U-shaped relationship between novelty and impact to lower values of novelty (H4b). Articles that are more unique have a lower optimal level of novelty to maximize impact. Taking these effects together (graphically displayed in Fig. 3) shows that higher levels of uniqueness can increase the overall effect of novelty. Higher uniqueness levels allow novelty to have higher effects on impact, but only up to a certain inflection point. It is this combined effect, the steepening of the inverted U-shaped relationship and the turning point shift to lower values of novelty, that shows that the advantages of uniqueness at some point turn into severe disadvantages. After reaching the inflection point, higher levels of uniqueness yield lower impact for novelty. Articles with high levels of uniqueness are punished more for higher novelty than articles with less uniqueness. Being aware of this trade-off is important to understand how much, and thus how uniquely, to align to a sub-research field before being punished for novelty. This finding further advances research into novelty (Boudreau et al. 2016; Uzzi et al. 2013), validating the important moderating role of uniqueness. It is especially this trade-off that has to be taken into consideration when analyzing the effects of novelty.

Fig. 3 Interaction effect of uniqueness and novelty on article’s impact

Thus, scientists should reflect on their level of uniqueness and adjust it to their article’s novelty level. An article’s higher level of uniqueness moves the optimum (the turning point) of novelty to lower levels. Thus, authors aligning uniquely to a certain community should use less novel recombinations of community specific jargon to maximize impact. It further seems that aligning to a certain community prevents higher levels of novelty from increasing impact, which in turn slows the advancement of novel scientific findings. Authors should therefore be aware that uniquely aligning to one community in their jargon lowers the level of novel language recombination that maximizes impact.

Our results further provide insights into the evolution of scientific disciplines, here entrepreneurship science. Several authors argued that entrepreneurship is a mature research field by showing the convergence of its different sub-research fields (Busenitz et al. 2014; Grégoire et al. 2006). However, it seems that equally combining language from different communities of the field of entrepreneurship (low values of uniqueness) is punished with less impact. In contrast, uniqueness increases impact. Furthermore, the novel recombination of jargon can be understood as a measure for a scientific field’s convergence (Busenitz et al. 2014; Grégoire et al. 2006). Convergence implies that when a research field matures, it is more and more characterized by a set of codified theories, models, and measures. Again, we want to point out that even though we refer to jargon, this language-culture expresses different theories, norms, measures and constructs (Kuhn 1996). Thus, a more novel recombination of jargon/theory from different communities combines these communities’ theories and norms and moves them closer together, enabling convergence. Our results therefore show how citation behavior might affect convergence. In particular, we show that high levels of novelty, and thus convergence of scientific communities in entrepreneurship research, are punished with less impact, resulting in a slower pace of convergence for the scientific domain of entrepreneurship. Furthermore, the trade-off authors face (the interplay between uniqueness and novelty) leads to papers with higher uniqueness and less novelty.

However, being aware of these effects might already change the way reviewers and readers reject or are drawn to certain articles. The danger lies in the rejection of articles that do not uniquely focus on the reader’s prioritized jargon, which is likely to affect both reviewers and potential citing readers.

We further think that our findings are not only relevant for the research domain of entrepreneurship. Several scientific domains consist of very heterogeneous communities (e.g. strategic management), which are likely to be subject to the rationales proposed here as well. It would therefore be of interest to analyze the effects shown here in different scientific disciplines, not only in management research but also in the natural sciences, sociology, etc.

Several potential limitations need to be highlighted. First, we use the widely accepted approach of document co-citation analysis to quantitatively define scientific communities. This approach is based on citation measurement and thus tends to focus on older articles, as recent publications have not yet accumulated many citations. Furthermore, it would be of interest to see if our results hold in other research fields or even in the entirety of science. Another limitation is the restriction to abstracts of articles for the identification of linguistic characteristics. Even though this is in line with previous research (Kaplan and Vakili 2015), using the full texts might give deeper insights.

Building on Kuhn’s (1996) argument that different language-culture communities are like different theories, the methodological approaches used here allow us to measure the relatedness of theories and norms. In particular, the combination of bibliometric approaches to detect communities and natural language processing to match articles to these communities opens up several further research directions in different scientific domains. This potential exists not only in the domain of scientific publications but also, for instance, in the analysis of knowledge spillovers between organizations. Previous texts of organizations (publications, patents, external statements) make it possible to detect an organization’s language or jargon. This language could in turn affect the ability to absorb foreign knowledge. Not only the distance between technologies (Jung and Lee 2016) but also the similarity between languages is likely to have an effect on the ability to adopt foreign knowledge. This could therefore contribute to the scientific construct of absorptive capacity (Cohen and Levinthal 1990) by analyzing not only the amount but also the content and the associated content distance of this knowledge.

Individuals might also be an interesting unit of analysis: for instance, one could compare the language used by individuals in their tweets on Twitter with, e.g., the mission statements of the organization they work in. Furthermore, knowledge from different contexts can be compared using the methods used in this article: for example, the similarity between the scientific knowledge of individuals (e.g. scientific paper publications) and a hiring company’s applicable knowledge base in the form of patents. To conclude, several new research directions emerge from the methodological approaches presented here.

Future research could also dive more deeply into category spanning research when taking this study’s findings into account. Research into category spanning has identified that spanning categories leads to less market acceptance (Hsu et al. 2009; Kovács and Johnson 2014). This could be an alternative explanation of our findings: regarding scientific communities in the field of entrepreneurship as different categories an article is aligning to (uniqueness) or combining (novelty) to gain market acceptance and thus impact. Especially the interaction of these two effects could be of interest for research into category spanning, namely answering the research question if categories should be spanned when novel ideas or inventions are in consideration.

Another promising avenue for future research could be potential learning effects of authors. When being rejected several times, one is likely to learn that aligning to a certain community increases an article’s legitimacy and impact and thus is likely to also affect arguments coming up in the review process. Thus, the question arises if authors are aware of these effects. One could assume that more experienced authors probably are aware of this and that less experienced authors will learn, e.g. via review processes, to align uniquely to communities in terms of jargon to increase legitimacy and thus impact (or getting accepted in the journal review process).

Finally, drawing attention to the effects of language in scientific publications enables authors to reflect on their citation behavior, enabling them to identify potentially useful input to their research despite a lack of proximity. Thus, by showing that citing authors are influenced by language cultures, this implicit bias might be better understood and thus counteracted by authors when citing other research.

7 Conclusion

Our study aims to advance our understanding of how language complexity in scientific articles affects impact. In the context of entrepreneurship science, which represents an ideal context due to its heterogeneous, cross-disciplinary nature, we conduct a more fine-grained analysis of the effects of different community specific jargons. We disentangle language complexity in two forms: we detect an article’s unique alignment to one community’s jargon (article’s uniqueness) and its novel recombination of community specific jargon (article’s novelty). Our research suggests that articles’ uniqueness affects their impact in a positive manner. We show that the novelty of articles affects their impact in the form of an inverted U-shaped relationship. This inverted U-shaped relationship between novelty and impact is moderated by the article’s uniqueness in two ways. First, higher levels of uniqueness steepen the curve of the inverted U-shaped relationship between novelty and impact. Second, higher values of uniqueness shift the optimal level of novelty to lower values. Combining these moderating relationships reveals a trade-off: more uniqueness might increase the positive effect of novelty for lower levels of novelty, but after an inflection point higher novelty levels are punished in the form of lower impact, compared to articles with lower levels of uniqueness. These findings not only have implications for authors and reviewers but also increase our understanding of the evolution of scientific fields and their convergence.