Introduction

Ever since the dawn of the Information Age (Castells 1999), data have been collected and spread rapidly online. This is especially true for social media such as Facebook or Twitter, where information is distributed swiftly. Online media are becoming ever more relevant to our daily lives, as a 2016 report from the Pew Research Center shows: 66% of Facebook users and 59% of Twitter users (47% and 52% in 2013, respectively) get news from their social networking site. Because virtually every researcher nowadays searches for scientific information on the internet, not only standard databases like PubMed but also online media have gained enormous influence on the dissemination of scientific work (Brossard and Scheufele 2013). This development opens up a new approach to measuring scientific influence and calls traditional measures of scholarly success into question.

The measurement of outstanding achievement in science has a long tradition. More than 90 years ago, Lotka (1926) published his famous scientometric formula, known as Lotka's Inverse Square Law of Scientific Productivity. Based on an investigation of the name indexes of standard reference tools of the time (Chemical Abstracts, and Auerbach's Geschichtstafeln der Physik), he proposed a constant relationship between the number of contributions made (x) and the number of scientists making that many contributions (y): \(x^{n} y = \text{const}\), with \(n = 2\) (hence the name square law). Specifically, in any set of authors, about 60% make one single contribution. If 100 authors contribute one paper each, about 25 will contribute two papers each (\(1/2^{2}\), i.e., 25%), about 11 will contribute three papers each (\(1/3^{2}\), i.e., 11.1%), about 6 will contribute four papers each (\(1/4^{2}\), i.e., 6.25%), and so on. Thus, the number of researchers making n contributions is about \(1/n^{2}\) of the number making a single contribution. Lotka's law was an approximation for the data he had at hand in 1926, and it is even further off in describing publication productivity in more recent years (Coile 1977). It is thus in need of adjustment (Nath and Jackson 1991), but it still informs thinking about the measurement of scientific productivity. For instance, it implies that a small percentage of researchers is responsible for the lion's share of the work. Dennis (1954) surveyed about 80 years of research in psychology and, accordingly, found that the top 10% produced about 50% of the publications, while the less productive half contributed 15% or less.
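As a quick check of this arithmetic, consider a minimal sketch in Python; the ~60% figure for single-contribution authors follows from the convergence of the inverse-square series:

```python
# Lotka's Inverse Square Law: the number of authors making x contributions
# is about 1/x^2 times the number of authors making a single contribution.
n_single = 100  # authors contributing exactly one paper each

for x in range(1, 6):
    share = 1 / x**2
    print(f"{x} contribution(s): ~{n_single * share:.1f} authors ({share:.1%})")

# The series 1/1 + 1/4 + 1/9 + ... converges to pi^2/6 ~ 1.645, so authors
# with a single contribution make up 1/1.645 ~ 61% of all authors,
# matching the "about 60%" figure quoted above.
```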

Another classic scientometric rule relates the quantity of research output to its quality: the most productive researchers also are, on average, the most creative (the constant-probability-of-success model; Simonton 1988a, b). This implies that the number of citations a researcher receives is a positive function of his or her total number of publications (Rushton 1984). Interestingly, total productivity is also closely related to the number of citations of a researcher's three best publications (Cole and Cole 1973).

Rushton (1984) was positive about citation counts, and indeed the now ubiquitous Science Citation Index, created in the 1960s (Garfield 1964), became a central measure of scholarly work in academia (Smith and Fiedler 1970). Thomson Reuters' yearly Journal Citation Report, which is based on citations, is one of the largest reports on research influence at the journal and category level. Measures of scientific publishing are still being developed (e.g., the TRank measure by Zhang et al. 2017), and the influence of indexes like the (Social) Science Citation Index ((S)SCI) is still growing, but the growth rate of publication in new channels, like conference proceedings, open archives, blogs, and home pages, exceeds that of the traditional channels. The resulting decline in coverage by the SCI, and especially by the SSCI, is problematic (Larsen and von Ins 2010). Thus, additional indices have been appearing in recent years. These measure the immediate rather than the long-run impact of scholarly work, not only in academia but also in the popular media, thus tapping a different source of information for evaluating scientific impact. The question is whether these alternative metrics really are, as the name implies, alternatives to traditional metrics of scholarly impact.

Alternative metrics to the traditional scientific metrics, measuring the impact of research on the web, are called altmetrics, following a proposal by Priem and Hemminger (2010). Altmetrics (similar to webometrics; Almind and Ingwersen 1997; Thelwall et al. 2005) collect bibliometric, scientometric, and informetric data on the World Wide Web. They thus provide access to various types of information pertaining to scholarly publications, most notably coverage, density, and intensity. We use these terms as defined by Haustein et al. (2015; p. 5): "Coverage is defined as the percentage of papers with at least one social media event or citation. Density is the average number of social media counts or citations per paper (i.e., considering all publications included in the study), while intensity indicates the average number of social media or citation counts for all documents with at least one event (non-zero counts)." These three measures provide different perspectives on the literature. Coverage indicates the chances of being included in the social media market, presumably influenced by fads and fashions in science. Density and intensity provide partly overlapping perspectives and are correlated, since intensity is measured for the subset of documents with at least one count. We expect the distribution of density to be concentrated near zero, since most documents fail to get any attention in the social media. Less can be said about the expected distribution of intensity: it depends on how the score is calculated (see below). Note, however, that the notions of coverage, density, and intensity are not used consistently in the literature. The ambiguous use of these measures is problematic and may be one of the reasons why some authors warn of altmetrics as a dangerous idea, especially when used for measuring the quality of research or of a researcher (e.g., Colquhoun and Plested 2014; Gumpenberger et al. 2016).
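These definitions translate directly into code. The following is a minimal sketch with invented counts; note that density equals (coverage/100) × intensity, which is why those two measures overlap:

```python
import numpy as np

# Hypothetical per-paper social media counts; zeros are papers never mentioned.
counts = np.array([0, 0, 0, 1, 0, 3, 0, 0, 12, 1])

coverage = np.mean(counts > 0) * 100       # % of papers with at least one event
density = counts.mean()                    # mean count over ALL papers
intensity = counts[counts > 0].mean()      # mean count over covered papers only

print(f"coverage:  {coverage:.0f}%")       # 40%
print(f"density:   {density:.2f}")         # 1.70
print(f"intensity: {intensity:.2f}")       # 4.25 (= density / (coverage/100))
```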

Coverage of documents varies with the source, and Twitter is the platform providing the highest coverage (Thelwall et al. 2013). Twitter fares well compared to other social web services, especially when it comes to science topics. Eysenbach (2011) even went as far as proposing a "twimpact factor" to measure research uptake on Twitter. Various disciplines, such as astrophysics (Haustein et al. 2014a) or biomedicine (Haustein et al. 2014b), as well as journals such as PLoS ONE (de Winter 2015), have already been analyzed with respect to their presence on Twitter. Interestingly, correlations between tweets and citations were generally found to be low (Patthi et al. 2017), implying a difference between impact metrics based on tweets and those based on citations. Indeed, most research suggests that there is little (or at best moderate) relationship between citations and altmetrics for Twitter, as well as for other platforms such as Mendeley (Zahedi et al. 2014; Bar-Ilan et al. 2012), and across all platforms (Costas et al. 2015). To the best of our knowledge, no comparable research exists on the relationship between traditional metrics and altmetrics for the psychological literature. Such research is reported here. We try to provide a comprehensive picture of the coverage, density, and intensity of psychological research in altmetrics from all over the web, rather than focusing on a single online media platform. This can be done by using the Altmetric Score (AS), a sum score that accumulates hits over all altmetrics data types. It can be gathered from Altmetric.com, a website dedicated to collecting all kinds of altmetrics data. The AS expresses the weighted amount of traffic that a publication generates on the web. It uses three main factors: (i) volume, measured by the number of people mentioning a paper; (ii) source, i.e., where the piece is mentioned, with sources weighted differently; and (iii) authors, a count of who mentions something to whom.

The AS has limitations, most notably with respect to transparency, standardization, and consistency (Gumpenberger et al. 2016). However, it is the best measure available for tapping the various sorts of activity in the social media. We collected the AS and compared it to traditional scores of scientific impact, to investigate the relationship between scientific fame and popular fame. Haustein et al. (2015) did something similar, but they concentrated on a single year (2012), a broad categorization of fields, and document types. We concentrate on psychological research, as indexed by publications in the period from 2010 to 2012. Thus, all papers related to Psychology published between 2010 and 2012 and identified by a unique digital identifier constitute our sample. For this sample of papers we extract metrics on four levels: field, journal, article, and source. We provide analyses at these four levels of aggregation.

(i) Field analysis. This investigates the AS and citation scores for the various fields and subfields of Psychology. We hope to identify fields and subfields that are especially popular in the online media. We expect that (sub)fields differ with respect to popularity, and that citation popularity is relatively unrelated to popularity as measured with altmetrics.

(ii) Journal analysis. This analysis identifies the journals that have the most impact in the online media and investigates the correlation between traditional metrics and altmetrics at the level of journals. We expect to replicate that this correlation generally is low, around r = .20.

(iii) Article analysis. This analysis measures the relationship between the AS and article impact metrics for individual articles. In a focused analysis we identify the ten highest scoring articles.

(iv) Source analysis. This analysis identifies the online sources that are most receptive to psychological articles and investigates the relationship between the AS and citation counts for each source. In line with Thelwall et al. (2013) we expect Twitter to be the most important source.

Method

Measures

To provide a broad picture of the visibility of the psychological literature in terms of altmetrics, we extracted the Altmetric Score (AS). The AS is calculated by Altmetric.com, a company that specializes in tracking and quantifying the coverage, density, and intensity of content in different alternative sources. It includes a number of different sources, each weighted by the likelihood of online sharing; the weighting thus reflects a source's potential impact on the online society. Specifically, news (the number of times a paper appears in an online news outlet, such as ZEIT Online or Forbes) gets the highest weight (w = 8.00), followed by blogs (frequency of appearance in a blog; w = 5.00), Twitter (w = 1.00), Google+ (w = 1.00), and Facebook (w = .25). All other sources (e.g., Wikipedia, Reddit, LinkedIn) are merged into one variable named Other, with weights between .25 and 3.00 (see https://help.altmetric.com/support/solutions/articles/6000060969-how-is-the-altmetric-score-calculated-; to get an impression, the reader can download the Altmetric it! app from Altmetric.com, which returns the AS for individual papers and its sources). Although the AS has some qualitative components to it, it is not a measure of the excellence of a researcher's work; it only indicates a paper's online attention.
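To make the weighting scheme concrete, here is a minimal sketch of the weighted-sum logic using only the public weights listed above. The real score additionally weights who mentions a paper and applies further, partly undisclosed adjustments, so this is an approximation, not Altmetric.com's actual algorithm:

```python
# Simplified sketch of the weighted-sum logic behind the AS, using the
# source weights quoted above. Mention counts below are invented.
WEIGHTS = {"news": 8.00, "blogs": 5.00, "twitter": 1.00,
           "google_plus": 1.00, "facebook": 0.25}

def simplified_as(mentions: dict) -> float:
    """Weighted sum of mention counts per source (unknown sources get w = 1)."""
    return sum(WEIGHTS.get(source, 1.0) * count
               for source, count in mentions.items())

# A hypothetical paper: 2 news stories, 1 blog post, 14 tweets, 3 Facebook posts.
print(simplified_as({"news": 2, "blogs": 1, "twitter": 14, "facebook": 3}))
# 2*8.00 + 1*5.00 + 14*1.00 + 3*0.25 = 35.75
```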

In addition to the AS, we calculate a new measure, called the Score Factor (SF), to compare journals with regard to their influence in alternative online media. The basic problem is that the AS is incomplete as an index of a journal's alternative impact, since it covers only the papers with an AS greater than zero, i.e., those that are mentioned in some online source. Most papers, however, do not make it into any alternative metric, and thus have AS = 0. In contrast, citation coverage is much higher than coverage in any of the social media metrics. For instance, Haustein et al. (2015) report an average citation rate of 3.17, but an average Twitter density of only .78, although Twitter has by far the highest coverage in social media. Our SF takes this into account by using two different scores acquired from altmetrics: the percentage of a journal's papers that have been scored (AS > 0), \(P_{\text{Scored}}^{\%}\); and the mean AS of those papers that have been scored, \(M_{\text{Scored}}^{\text{AS}}\). That is, the SF is an altmetric score that weights density by coverage.

$$\text{SF} = P_{\text{Scored}}^{\%} \times M_{\text{Scored}}^{\text{AS}}$$

For a journal to achieve a high SF, a high mean AS has to be paired with frequent coverage in the online media.
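In code, the SF for a single journal reduces to a few lines; a minimal sketch with invented AS values:

```python
import numpy as np

# Hypothetical AS values for one journal's papers; zeros = never mentioned.
as_values = np.array([0.0, 0.0, 3.0, 0.0, 12.0, 1.0, 0.0, 0.0])

p_scored = np.mean(as_values > 0) * 100     # coverage: % of papers with AS > 0
m_scored = as_values[as_values > 0].mean()  # density among the scored papers

sf = p_scored * m_scored                    # SF = P_Scored% x M_Scored^AS
print(f"P = {p_scored:.1f}%, M = {m_scored:.2f}, SF = {sf:.1f}")
# P = 37.5%, M = 5.33, SF = 200.0
```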

Data

Data were acquired from the Web of Science (WoS) in June 2016. Eligible papers were articles pertaining to the discipline of Psychology, published between 2010 and 2012. This search resulted in 245,630 unique papers. We used the Digital Object Identifier (DOI) or the PubMed ID to identify papers, since a DOI (or another unique identifier) is needed to retrieve bibliometric information from Altmetric.com. Identifiers were available for 239,910 papers. Papers were matched to fields using an open-access classification tool acquired from Science-Metrix.com. This tool is based on a hierarchical, three-level classification tree and assigns journals to mutually exclusive categories (Archambault et al. 2011). The highest level in this classification is the domain, including, for instance, Applied Sciences, Arts and Humanities, or Economic and Social Sciences. We did not use this level, as we include only papers from psychology. We did, however, use the next two levels, field and subfield. Classification of papers into fields and subfields was possible for 213,738 papers. Journal-level analysis was done only for journals with a Journal Impact Factor. Journal Impact Factors were taken from Thomson Reuters' 2014 Journal Citation Report. The 2014 report helped to deal with the problem of citation lag, since a journal's impact factor is calculated from the citations received in the two years following the publication year. Journal Impact Factors were available for 202,432 papers. Finally, we did some data cleaning by excluding journals that did not reach a minimal count of 20 among the 202,432 papers (i.e., < .01% of the sample each). The article-level analysis included only papers with AS > 0. These were 57,087 papers, representing a coverage of 28%. Figure 1 displays the selection and classification procedure.

Fig. 1 Data selection tree
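The selection pipeline of Fig. 1 can be expressed as a sequence of data-frame filters. A minimal pandas sketch follows; the file name and all column names (doi, subfield, impact_factor, journal, as_score) are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical WoS export; file and column names are assumptions.
papers = pd.read_csv("wos_psychology_2010_2012.csv")    # 245,630 papers

with_id = papers.dropna(subset=["doi"])                 # 239,910 with DOI/PubMed ID
classified = with_id.dropna(subset=["subfield"])        # 213,738 field-classified
with_if = classified.dropna(subset=["impact_factor"])   # 202,432 with a JIF

# Exclude journals contributing fewer than 20 papers to the remaining sample.
counts = with_if["journal"].value_counts()
with_if = with_if[with_if["journal"].map(counts) >= 20]

scored = with_if[with_if["as_score"] > 0]               # 57,087 papers (28%)
```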

Results

Fields

The 202,432 papers were classified into 21 different fields, containing 125 different subfields. Table 1 displays the 21 fields and 17 selected subfields. Subfields were selected if they either (i) contained at least 3000 papers or (ii) were classified into the field Psychology and Cognitive Sciences. Note that many papers from Clinical Psychology were classified as Clinical Medicine, rendering this by far the most voluminous field (81,762 papers, i.e., 40.4%), considerably larger than Psychology and Cognitive Sciences (46,189 papers, i.e., 22.8%), the second-ranked field in terms of the number of papers published.

Table 1 Altmetric Scores and citation frequencies for all fields and selected subfields

Since both citations and the AS were positively skewed, a log transformation was applied to the data before the analyses. First, we found a strong positive correlation (\(r_{S}\) = .503, p = .020) between the mean AS and the mean citation frequency of all scored articles across the 21 fields. A similar result was found for the subfields (N = 125; \(r_{S}\) = .417, p < .001). The average correlation over all fields (see Table 1; excluding Built Environment and Design, since this field had only two articles scored) was \(r_{\log/\log}\) = .294, with 16 out of 21 correlations being significant at least at p < .05.
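A minimal sketch of this analysis step, assuming the log transform is log(x + 1) to accommodate zero counts, and assuming that \(r_{\log/\log}\) denotes a Pearson correlation computed on the log-transformed scores; the field-level means below are invented:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented field-level means for illustration.
mean_as = np.array([20.9, 2.4, 8.1, 5.6, 12.3, 3.9, 1.7])
mean_cites = np.array([18.2, 34.7, 10.5, 7.9, 15.1, 6.2, 3.3])

r_s, p_s = spearmanr(mean_as, mean_cites)    # rank-based; unaffected by the log
r_log, p_log = pearsonr(np.log1p(mean_as), np.log1p(mean_cites))

print(f"Spearman r_S = {r_s:.3f} (p = {p_s:.3f})")
print(f"Pearson on logs (r_log/log) = {r_log:.3f} (p = {p_log:.3f})")
```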

In terms of productivity, Clinical Medicine is in the lead, with nearly half of all published papers pertaining to this field. Psychology and Cognitive Sciences is also a very productive field, as is Public Health and Health Services. Note, however, that papers were only included if they were related to the discipline of Psychology. The highest scoring field in terms of the AS (see column AS in Table 1) was General Science and Technology (M = 20.9, SD = 65.1), with 3811 out of 5394 (70.7%) articles being scored (see column \(P_{\text{Scored}}^{\%}\)). Psychology and Cognitive Sciences is in the middle of the pack: among the 21 fields, it ranks 6th in the percentage of articles scored, 15th in AS, 14th in citations, and 12th in correlation.

Mathematics and Statistics stands out, with the highest number of citations per paper scored (M = 34.7, SD = 128.4) and the lowest AS (M = 2.4, SD = 2.6). Although Haustein et al. (2015) used a different classification and separate scores for the different alternative media, the findings match closely: some topics and fields enjoy greater popularity in the social media, presumably because they represent the "softer" sciences and are easier for a lay audience to understand. Formal content does not lend itself to easy online sharing. In addition, general topics may be particularly attractive for sharing online.

Journals

The 202,432 papers were published in 3644 different journals, of which 1838 met the threshold of at least 20 papers. Since the score here is a summary score over journals, coverage is high: most of the 1838 journals were scored at least once (1591, i.e., 86.6%). PLoS ONE scored highest, accumulating an AS of 53,597 over 3361 out of 4615 articles (M = 15.9, SD = 56.3). Note the high standard deviation, indicating the long tail that is typical of this type of data. The highest percentage of articles scored per journal was achieved by Cell, with 32 out of 34 articles scored (94.1%). Science had the highest AS per article scored (M = 65.2, SD = 107.6). Figure 2 depicts the relationship between the percentage of papers scored per journal and the AS per article scored.

Fig. 2 Scatterplot of all scored journals (N = 1591), plotting the percentage of articles scored against the mean AS. The impact factor of a journal is indicated by the size of the points. The three journals with the highest Score Factor (see text) are labeled

Figure 2 shows that, even at the level of journals, the data are heavily skewed: most journals have a small mean AS, often near zero. Indeed, most journals have a mean AS well below 10, and fewer than half of their articles scored at all. Some journals are outstanding: for instance, PLoS Medicine and Science have more than 80% of their articles scored, Science with a mean AS > 60 and PLoS Medicine with a mean AS > 40. Nature has about 50% of its papers scored, with a mean AS > 60. Note, however, that a high impact factor does not automatically guarantee a high AS, since high-impact journals appear in all four quadrants of Fig. 2.

Spearman correlations between alternative metrics and the journal impact factor are shown in Table 2. Also reported is the Score Factor (SF), i.e., the weighted AS (coverage × density). AS, \(P_{\text{Scored}}^{\%}\), and IF correlate at similar magnitudes (r \(\approx\) .40), indicating that they tap, to a degree, similar information. Interestingly, this correlation is about double the correlation reported by Haustein et al. (2014a, b) for the relationship between Twitter metrics and citations, indicating that the SF is a better predictor of citation impact than tweets. Mendeley seems to be an even better predictor than the SF, correlating around .5 (Zahedi et al. 2014). It is important to bear in mind that these correlations are measured at the level of journals, not of individual papers. That is, among the journals scoring at all, if a journal has a high mean AS, or a high percentage of papers scoring, it also tends to have a high impact factor. The high correlation of the SF with AS and \(P_{\text{Scored}}^{\%}\) is a consequence of the fact that the SF is a compound of AS and \(P_{\text{Scored}}^{\%}\). The SF does not seem to be a considerably better indicator of the IF than \(M_{\text{Scored}}^{\text{AS}}\) or \(P_{\text{Scored}}^{\%}\) alone.

Table 2 Spearman correlations between alternative and classic journal metrics

Table 3 presents a hit list: the 20 journals with the highest SF. Obviously, some of these journals are not mainstream Psychology journals; they are included in the list because they published papers related to psychology. The correlation between the SF and the IF among these 20 journals is .522 (p = .020).

Table 3 Descriptives and Impact Factor of the 20 journals with highest Score Factor

Inspection of the journals ranked highest by the SF, which indexes weighted social media coverage, shows that journals related to psychology are quite frequent, with 5 journals among the top 20. Medical journals are also frequent (6), and we find general journals as well as journals related to biology and the neurosciences. Many other fields are completely missing, however. Recall that papers were included only if they were related to the discipline of Psychology to begin with. Given this, it is somewhat surprising that the dominance of journals containing the word "psychology" in their title is not more pronounced.

Articles

Most papers did not score in altmetrics at all. Indeed, only 57,087 of the 202,432 (28.2%) papers from the WoS were mentioned in the online media at least once (AS > 0). A clear trend towards more attention from the online media in recent years was noticeable: of the papers published in 2010, only 16.4% were mentioned in the online media at least once; this percentage was 26.1% in 2011 and 41.1% in 2012 (see Table 4). Clearly, online media are becoming increasingly important as vehicles for disseminating scientific information in psychology.

Table 4 Correlations and descriptives for the Altmetric Score and citations by year and total

At the article level, correlations between citations and the AS are not impressive, with a maximum of \(r_{\log/\log}\) = .310 in 2011. As expected, papers published earlier also had more citations, with papers published in 2010 gaining almost three times the citations of papers published in 2012. Indeed, publication year accounted for about 8% of the total variance in citations (F(2, 57,084) = 2476, p < .001, \(R^{2}\) = .080). Interestingly, the AS was not nearly as influenced by publication year (F(2, 54,784) = 132.3, p < .001, \(R^{2}\) = .005). This mirrors one of the basic differences between alternative and traditional metrics: alternative metrics are relatively immediate and short-lived, while most traditional metrics are delayed and cumulative. These different temporal dynamics put a natural limit on the size of the correlation.
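The variance-explained comparison amounts to the \(R^{2}\) (eta-squared) of a one-way ANOVA with publication year as the factor. A minimal sketch on simulated data; the effect sizes are invented and chosen only to mimic the pattern above:

```python
import numpy as np

def eta_squared(values: np.ndarray, groups: np.ndarray) -> float:
    """R^2 of a one-way ANOVA: between-group / total sum of squares."""
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total

rng = np.random.default_rng(0)
years = rng.choice([2010, 2011, 2012], size=1000)
# Older papers accumulate more (log) citations; paper-level noise dominates.
log_cites = rng.normal(loc=(2012 - years) * 0.3 + 2.0, scale=1.0)

print(f"R^2 (year -> citations): {eta_squared(log_cites, years):.3f}")
```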

The average AS per paper scored was 8.4 (2.4 when articles never mentioned are included). Note, however, the highly skewed distribution of the AS: 39.8% of the papers mentioned online achieved a score of only 1, and 79.3% of articles scored below the mean, while the 10 highest scoring articles (.02%; see Fig. 3 and Table 5) alone account for 2.0% of the total AS.
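Such concentration figures are straightforward to compute once the per-paper scores are available; a sketch on simulated heavy-tailed scores (not the study's data):

```python
import numpy as np

# Simulated heavy-tailed AS values, mimicking the skew described above.
rng = np.random.default_rng(1)
scores = rng.pareto(a=1.5, size=57_087) + 1  # every scored paper has AS >= 1

top10_share = np.sort(scores)[-10:].sum() / scores.sum()
below_mean = np.mean(scores < scores.mean())

print(f"top-10 share of total AS: {top10_share:.1%}")
print(f"papers scoring below the mean: {below_mean:.1%}")
```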

Fig. 3 Number of citations by AS, with regression line, for individual papers. Axes are log-transformed. Papers with the 10 highest Altmetric Scores are numbered (see Table 5)

Table 5 Rank, Altmetric Score, and publication information for 10 highest scoring articles

For the 10 highest scoring articles, contrary to the overall pattern, no significant relationship between citations and the AS was found (\(r_{\log/\log}\) = −.303, p = .395). Only one of the ten articles scoring highest on the AS (# 7) is also found among the ten most cited articles (it ranks 2nd in citations out of all 57,087 articles); the second most cited article in the AS top ten (# 6) ranks only 221st in citations.

Sources

Coverage varies vastly across the sources tracked by Altmetric.com. An overview of the results for the most important sources is given in Table 6. As can be seen, Twitter is by far the largest platform, covering 80% of all papers mentioned online. Still, for Twitter the correlation between the AS and citations (\(r_{\log/\log}\) = .096, p < .001) is below the total correlation (\(r_{\log/\log}\) = .196, p < .001). Overall, blogs are the best indicator of citations, showing a medium-sized correlation (\(r_{\log/\log}\) = .258, p < .001). Using Mendeley as a criterion instead of citations seems to yield even stronger results, as it shows a medium-sized correlation with the total of all altmetric sources (\(r_{\log/\log}\) = .272, p < .001).
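A minimal sketch of the per-source correlation step, assuming a long-format table with hypothetical columns source, log_as, and log_cites:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical long-format table: one row per (paper, source) pair;
# file and column names are assumptions for illustration.
df = pd.read_csv("paper_source_scores.csv")

for source, grp in df.groupby("source"):
    r, p = pearsonr(grp["log_as"], grp["log_cites"])
    print(f"{source:>10}: r_log/log = {r:.3f} (p = {p:.3g}, n = {len(grp)})")
```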

Table 6 Descriptives and correlations for sources from Altmetric.com

Discussion

This paper evaluates the relationship between traditional metrics and emerging alternative metrics. The source is published papers related to Psychology, published between 2010 and 2012. We extracted the number of citations and the Altmetric Score (Altmetric.com) as of June 2016, calculated various metrics, and evaluated their relationships. Out of a sample of nearly 250,000 papers, about 240,000 were identified by a DOI. Of these, about 210,000 could be automatically allocated to journals with a discipline classification, and about 200,000 papers could be allocated to a journal with an impact factor. Among those, about 57,000 papers had an AS > 0. Note that all these papers are related to Psychology, as identified by the field "research area = Psychology" in the Web of Science.

The main finding is that the relationship between traditional metrics and the AS, which measures the coverage of papers, journals, and disciplines in various alternative metrics, depends on the level of analysis. An analysis in terms of different research fields (e.g., Biology, Economics and Business, Psychology and Cognitive Sciences) shows strong overlap: the correlation between citation counts and the AS across 21 research fields was r = .503. This is impressive, showing that, at the level of entire research fields, traditional and alternative metrics measure similar things. At the level of 125 subfields the relationship was also strong (r = .417). There was considerable variability between fields and subfields, however, with correlations varying between r = .106 (Human Factors) and r = .467 (Communication and Textual Studies). Moreover, the more fine-grained the level of analysis, the smaller the correlation: at the level of individual papers the correlation was only r = .302. This is partly due to the fact that, with aggregation, error variance cancels out. However, it also indicates that alternative metrics are an additional, and largely independent, source of information at the level of individual papers. Highly cited papers may easily fail in the short-lived online world, and online star papers may fail to attract citations. At the level of subfields and fields, or even disciplines (not investigated here), alternative metrics appear to offer less unique insight. Nevertheless, in terms of variance explained, it pays to consider these metrics even at these levels, since a correlation of r = .50 still explains only 25% of the variance.
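This attenuation is what one expects when a shared signal is buried in paper-level noise. A minimal simulation sketch (all parameters invented) illustrates how aggregating to field means cancels the noise and inflates the correlation:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n_fields, per_field = 21, 500

# A latent field-level quality drives both metrics; paper-level noise dominates.
quality = rng.normal(size=n_fields).repeat(per_field)
cites = quality + rng.normal(scale=2.0, size=quality.size)
alt = quality + rng.normal(scale=2.0, size=quality.size)

r_paper, _ = pearsonr(cites, alt)                              # ~ .20 per paper
r_field, _ = pearsonr(cites.reshape(n_fields, -1).mean(axis=1),
                      alt.reshape(n_fields, -1).mean(axis=1))  # ~ .99 per field

print(f"paper-level r = {r_paper:.2f}, field-level r = {r_field:.2f}")
```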

To assess the importance of altmetrics at the journal level, we proposed a new metric: the Score Factor. The SF measures each journal's presence in the online media by combining coverage (whether a piece is mentioned at all in the social media) with density (how often a covered piece is mentioned). This makes sense, since most papers do not make it into any altmetric score. In addition, coverage in the online media is not restricted to the scientific community, although, in reality, tweets about scientific papers tend to come from educated individuals (with an over-representation of social and computer scientists and an under-representation of mathematical, physical, and life scientists; Ke et al. 2017). In general, the SF offers information on a journal's importance in the online media that differs from the traditional Impact Factor. This is important, since the journal impact factor may not be as central and harmless as it seems (Seglen 1997; Colquhoun 2003; Bollen et al. 2009). As the Editor-in-Chief of Science, Bruce Alberts (2013), phrased it in an editorial on impact factor distortions: "The misuse of the journal impact factor is highly destructive, inviting a gaming of the metric that can bias journals against publishing important papers in fields (such as social sciences and ecology) that are much less cited than others (such as biomedicine). And it wastes the time of scientists by overloading highly cited journals such as Science with inappropriate submissions from researchers who are desperate to gain points from their evaluators" (p. 787).

We found a considerable correlation of about r = .4 between the SF and the IF. Note, however, that correlations between the AS and citation frequency for articles that have been scored are small, and even non-existent for the highest scoring papers. This indicates that, although a general relationship exists between alternative and traditional metrics, the relationship declines for individual papers and may be non-existent for important papers: what is relevant for the online community need not be relevant to the scientific community. One of the biggest skeptics of bibliometrics, and of altmetrics in particular, Colquhoun (2014), explains this in his blog as follows: "Scientific works get tweeted about mostly because they have titles that contain buzzwords, not because they represent great science". This notion was, however, not confirmed by Taylor and Plume (2014), who examined highly shared papers using altmetric data. They were interested in whether articles attracting social media attention are also successful in getting the attention of scholars and the mass media. In their qualitative analysis of the top .5% of papers by social media activity, they failed to find a bias for titillating or eye-catching keywords. Rather, their evaluation is more positive with respect to scientific value. However, they found that most of the traffic in social media relates to summaries of research, rather than to the primary research articles themselves (but see Haustein et al. 2015, for a somewhat different result).

Although the distribution of scientific research in the online media has been on the rise over the years, impact is still unevenly distributed. Twitter is the largest platform, and presumably the only one that is genuinely relevant, as it is the only platform to reach a coverage above 20% for the distribution of scholarly publications and findings on the web. This is evident from several studies, whether they rely on tweets as the measure of alternative metrics (de Winter 2015) or evaluate the usage of internet platforms (Thelwall et al. 2013). In some sense this is good news, since Twitter is used mainly by non-academics. However, it seems that scientific material is mainly tweeted by scientists (Ke et al. 2017). Thus, the distribution of scientific material via Twitter among the public may be less than optimal.

As for psychology, results comparable to previous studies of other fields of research, such as biomedicine (Haustein et al. 2014b) or astrophysics (Haustein et al. 2014a), do exist. The general picture is that correlations between altmetrics and citations are positive but small, indicating that traditional and alternative metrics play different roles in measuring scientific impact. Instead of dismissing these discrepancies as a sign of incompatible metrics, the differing indicators should be used to create a framework for the concurrent use of various kinds of scientometric indicators, establishing a more extensive assessment of the scientific impact of scholarly publications. Such a "scholarly network" (Taylor 2013) could help to establish a more complete picture of scholarly impact, which at present is still missing (Priem et al. 2012). We want to add, however, that our findings imply that the AS is adequate for evaluating broad research areas, but should be used with caution for evaluating individual scholars or individual papers. In addition, altmetrics are better seen as a complement to, rather than a substitute for, traditional metrics like the impact factor. Substituting for traditional metrics, most notably the impact factor, may be desirable given a number of problems related to these metrics (e.g., that the impact factor is negotiated, methodologically flawed, and irreproducible; see Brembs et al. 2013; Fernández-Delgado and Gómez 2015), but for the time being alternative metrics, and the AS in particular, also suffer from serious limitations (Gumpenberger et al. 2016). The number of citations of a paper, not the impact factor of the journal that published it, might still be the best single indicator of a paper's quality. This number can, and will, increase over the years, while any alternative metric, because of its short half-life, will stagnate soon after publication. Thus, citations measure intermediate and long-term academic influence, while alternative metrics measure immediate academic and non-academic influence. Correlations cannot be high under these circumstances.

The formation of a scholarly network through the involvement of scholars in the social media could furthermore establish a link between the scientific community and the public. This could help to involve the public in scientific progress and would be a move from an exclusive scientific community to a truly overarching community with real-time relevance. This idea is supported by results from the MESUR project (Bollen et al. 2007), indicating that usage-based metrics are indeed of value for the measurement of scholarly impact (Bollen et al. 2008).

All in all, alternative metrics are still on the verge of validation and have yet to prove themselves useful to the scientific community. Most notably, care should be taken when linking them to an individual researcher's prestige. There are plenty of possibilities for the quantitative exploration of scientific publications, but any quantitative analysis should bear in mind that scientific progress depends on the quality of papers rather than on the prestige of outlets. Recall Tressoldi et al.'s (2013) answer to the question of whether high impact equals high statistical standards: "not necessarily so" (p. e56180), they say. Whether the same ought to be said about alternative metrics is open to debate. It is unlikely that a high AS indicates high statistical standards, but one would hope that a high AS indicates high relevance of the scientific work for the academic and non-academic public. As the public provides the resources for research, papers with a high AS succeed in reciprocity, giving something interesting back to the public.