1 Introduction

The scientific process aggregates a large number of scientific results into a common scientific consensus. Meta-analysis performs the aggregation by statistical analysis of numerical values presented across scientific papers. Collaborative systems such as wikis may easily aggregate text and values from multiple sources. However, so far they have had a limited ability to apply the numerical analysis required by, e.g., meta-analysis.

Researchers have discussed the advantages and disadvantages of the tools for conducting systematic reviews, from “paper and pencil” over spreadsheets to RevMan and web-based specialized applications [1]: setup cost, versatility, ability to manage data, etc. In 2009 they concluded that “no single data-extraction method is best for all systematic reviews in all circumstances”. For example, RevMan and Archie of the Cochrane Library provide an elaborate system for tracking and analyzing textual and numerical data in meta-analyses, but the system could not import information from electronic databases [1]. Our original meta-analyses [2, 3] relied on Microsoft Excel spreadsheets that were later distributed on public web sites. Compared to an ordinary spreadsheet, a wiki solution provides data-entry provenance and collaborative data entry with immediate updates. Shareable folders on cloud-based storage systems would help collaboration on spreadsheets. Online services, such as the spreadsheet of Google Docs, may lack meta-analytic plotting facilities. Web-based specialized applications for systematic reviews may have a high setup cost [1].

We have previously explored a simple online meta-analysis system, a “fielded wiki”, in connection with personality genetics [4]. As it was implemented specifically for this scientific area, the web service lacks generality for other types of meta-analytic data. Furthermore, the system relied on PubMed or the Brede Wiki to represent bibliographic information.

Following Ward Cunningham’s question “What’s the simplest thing that could possibly work?” we present a simple system that allows mass meta-analysis of numerical data represented as comma-separated values (CSV) in a standard MediaWiki-based wiki, the Brede Wiki: http://neuro.compute.dtu.dk/wiki/. We present the data in an open science fashion, so that other researchers can easily reuse them.

2 Data and Data Representation

We use the MediaWiki-based Brede Wiki to represent the data [5]. For our neuroimaging data, each data record usually consists of three values: the number of subjects and the mean and standard deviation across subjects of the volume of a brain structure. An individual study typically compares two such data records, e.g., from a patient and a control group. We also record labels for the data record, e.g., the bibliographic information, as well as extra subject information about the two groups, such as age, gender and clinical characteristics, bringing the total number of data items per study to seven or more. Each meta-analysis usually determines what extra relevant information we may include, and it may differ between studies, e.g., a Y-BOCS value typically has relevance only for obsessive-compulsive disorder (OCD) patients. The functional neuroimaging area has the CogPO and Cognitive Atlas ontologies [6, 7], which enable researchers to describe the topic of an experiment, but these efforts do not directly apply to our data. One CSV line carries the information for each study.
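
To make the record structure concrete, the following minimal Python sketch shows how one such CSV dataset might look and be read; the column names, study labels and numbers are hypothetical illustrations, not actual wiki content.

```python
import csv
import io

# Hypothetical CSV content for two volumetric studies; the actual column
# names on the wiki vary from dataset to dataset.
raw = """\
paper,patients n,patients mean,patients std,controls n,controls mean,controls std,mean age
Example study A,20,0.61,0.09,22,0.55,0.08,34.1
Example study B,15,0.58,0.11,15,0.57,0.10,29.5
"""

for record in csv.DictReader(io.StringIO(raw)):
    # Each row carries one study: group sizes, means, standard deviations
    # and extra subject information such as age.
    print(record["paper"], record["patients n"], record["controls n"])
```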

Fig. 1. Screenshot from the wiki showing CSV data transcluded on a page and formatted into a table by a MediaWiki extension.

The CSV data is stored on separate wiki pages, rather than in uploaded files, so that the MediaWiki template functionality can transclude the CSV data on other wiki pages. By convention, pages with CSV information carry the “.csv” extension as part of the title, so external scripts can recognize them as special pages, and these wiki pages contain no wiki markup. An alternative to this title convention could have used a dedicated MediaWiki namespace for the CSV pages, such that these pages would carry a prefix such as “CSV:”.

A few MediaWiki extensions can format CSV information: SimpleTable and TableData. Figure 1 shows the transclusion of CSV data with a modified version of the SimpleTable extension. A MediaWiki template handles the transclusion and also establishes web links to download and edit the CSV data.

The Brede Wiki uses the standard template system for recording structured bibliographic data about the publication, and a script that reads through the wiki dumps can represent the data in relational databases [5].

To annotate the CSV information we also use templates. Figure 2 shows the markup of three different CSV pages with information about major depressive, bipolar and obsessive-compulsive disorders, where the title field indicates the title of the CSV page in the wiki and the topic fields link to items in the Brede Wiki topic ontology. Presently, no controlled vocabulary beyond the template fields describes the columns in the CSV. To deliver an appropriate content type (text/csv), a bridging web script functions as a proxy, so that downloading the CSV page can spawn a client-side spreadsheet program.
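
A minimal sketch of such a bridging proxy, written as a WSGI application; it assumes MediaWiki’s standard raw-export URL pattern, and the parameter name is an illustrative choice rather than the script’s actual interface.

```python
from urllib.parse import parse_qs, quote
from urllib.request import urlopen

# Assumed raw-export URL pattern; MediaWiki serves raw page text via
# index.php with action=raw.
WIKI_RAW = "http://neuro.compute.dtu.dk/wiki/index.php?action=raw&title={}"

def application(environ, start_response):
    """WSGI proxy: the page named by ?title=<CSV page> is fetched from the
    wiki and returned with the text/csv content type, so the browser can
    hand it to a client-side spreadsheet program."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    title = query.get("title", [""])[0]
    with urlopen(WIKI_RAW.format(quote(title))) as response:
        body = response.read()
    start_response("200 OK",
                   [("Content-Type", "text/csv; charset=utf-8"),
                    ("Content-Disposition",
                     'attachment; filename="{}"'.format(title))])
    return [body]
```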

Fig. 2. Template to annotate the CSV data and define the links to the meta-analysis.

The bulk of the data currently presented in the wiki comes from the large mass meta-analysis of volumetric studies on major depressive disorder reporting over 50 separate meta-analyses for individual brain regions [2]. Further data comes from mass meta-analyses across multiple brain regions on bipolar disorder [3] and first-episode schizophrenia [8], a meta-analysis on longitudinal development in schizophrenia [9] as well as data from individual original studies on obsessive-compulsive disorder.

Apart from neuroimaging studies, the Brede Wiki also records data from meta-analyses in a few areas outside neuroimaging [10, 11], allowing us to test the generality of the framework. We distribute the data under the Open Data Commons Open Database License (ODbL).

3 Web Script and Meta-Analysis

The web script for meta-analysis reads the CSV information by downloading it from the wiki, identifies the columns required for meta-analysis, performs the statistical computations and generates meta-analytic plots, the so-called forest and funnel plots, in the SVG format, see Fig. 3. From either the title information or a PubMed identifier, the script generates back-links from the generated page to pages on the wiki. The script may also export the computed results as JSON or CSV. Furthermore, it may generate a small R script that sets up the data in variables and uses the meta R library for meta-analysis. Finally, the script can return its own source code, enabling other researchers to readily inspect the code and more easily reproduce the results [12].
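
As a usage illustration, a client could call the web script and parse its JSON output roughly as follows; the endpoint path and parameter names are assumptions for the sketch, not the script’s published interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameters for the meta-analysis web script.
params = urlencode({"title": "Example dataset.csv", "format": "json"})
url = "http://neuro.compute.dtu.dk/cgi-bin/metaanalysis?" + params

with urlopen(url) as response:
    results = json.load(response)

# The JSON output would carry the computed effect sizes, confidence
# intervals and P-values described below.
print(results)
```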

The web script attempts to guess the separator used on the CSV page and also tries to match the elements of the column header, e.g., the strings “control n”, “controls number”, “number of controls”, etc. all match the number of control subjects. When nothing matches, the user needs to specify the relevant columns explicitly via URL parameters, which in turn a wiki template can set up. Figure 2 displays three templates, where the middle template explicitly specifies which columns the web script should interpret as “number of experimentals”, “mean of experimentals” and “standard deviation of experimentals”.
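
The two heuristics can be sketched in Python, with csv.Sniffer guessing the separator and regular expressions matching header variants; the exact patterns used by the web script are assumptions here, and only the control-group columns are shown.

```python
import csv
import re

# Assumed header patterns; the web script's actual list is more extensive.
HEADER_PATTERNS = {
    "number of controls": re.compile(
        r"control(s)?\s+(n|number)\b|number of controls", re.I),
    "mean of controls": re.compile(
        r"control(s)?\s+mean|mean of controls", re.I),
    "standard deviation of controls": re.compile(
        r"control(s)?\s+(std|standard deviation)", re.I),
}

def identify_columns(csv_text):
    """Guess the separator and map column roles to header positions."""
    dialect = csv.Sniffer().sniff(csv_text)
    header = next(csv.reader(csv_text.splitlines(), dialect))
    columns = {}
    for index, name in enumerate(header):
        for role, pattern in HEADER_PATTERNS.items():
            if pattern.search(name):
                columns[role] = index
    return columns

sample = "paper;controls n;controls mean;controls std\nA;22;0.55;0.08\n"
print(identify_columns(sample))
# {'number of controls': 1, 'mean of controls': 2,
#  'standard deviation of controls': 3}
```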

Standard meta-analysis computes an effect size from each result in a paper and combines them into a meta-analytic effect size with a confidence interval. Although the methodological development continues, established statistical approaches exist for ordinary meta-analysis [10]. Our system implements computations on the standardized mean difference for continuous variables and on the natural logarithm of the odds ratio for categorical variables, with fixed- and random-effects methods using an inverse-variance weighted model. As an extra option we provide meta-analysis on the natural logarithm of the variance ratio [13], for comparison of the standard deviations between two groups of subjects. Meta-analysts seldom use this type of summary statistic compared to the standard effect sizes, but in a few areas such statistics generate interest, e.g., in the controversy over whether males show larger variance in intelligence than females [11].

With continuous data, where we have \(\bar{x}_e\), \(s_e\) and \(n_e\) as the mean, standard deviation and number of subjects for the experimental group (e.g., patients) and \(\bar{x}_c\), \(s_c\) and \(n_c\) as the corresponding numbers for the control group (e.g., healthy controls), we compute an effect size \(d\) for each study included in the meta-analysis by

$$\begin{aligned} s_{\text{pooled}} &= \sqrt{\frac{(n_e - 1) s_e^2 + (n_c - 1) s_c^2}{n_e + n_c - 2}} \\ g &= \frac{\bar{x}_e - \bar{x}_c}{s_{\text{pooled}}} \\ d_{\text{smd}} &= \left( 1 - \frac{3}{4 (n_e + n_c) - 9} \right) g. \end{aligned}$$
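
These formulas transcribe directly into Python; in the sketch below, the variance expression for the effect size is a commonly used approximation and an assumption on our part, since the text does not state which variance formula the web script uses.

```python
from math import sqrt

def smd_effect_size(n_e, mean_e, std_e, n_c, mean_c, std_c):
    """Standardized mean difference with the small-sample correction."""
    s_pooled = sqrt(((n_e - 1) * std_e**2 + (n_c - 1) * std_c**2)
                    / (n_e + n_c - 2))
    g = (mean_e - mean_c) / s_pooled
    d_smd = (1 - 3 / (4 * (n_e + n_c) - 9)) * g
    # Commonly used variance approximation for the SMD (assumed).
    variance = (n_e + n_c) / (n_e * n_c) + d_smd**2 / (2 * (n_e + n_c))
    return d_smd, variance

print(smd_effect_size(20, 0.61, 0.09, 22, 0.55, 0.08))
```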

With binary data we compute the logarithm of the odds ratio as

$$\begin{aligned} d_{\text{or}} = \ln \left[ \frac{c(n_{ee}) / c(n_{e} - n_{ee})}{c(n_{ce}) / c(n_{c} - n_{ce})} \right], \qquad \text{where } c(x) = \begin{cases} 0.5 & \text{if } x = 0 \\ x & \text{otherwise} \end{cases} \end{aligned}$$

where \(n_{ee}\) and \(n_{ce}\) indicate the number of experimental events and control events (e.g., successful treatments in each of the groups), while the \(c(\cdot )\) function handles the case where no events appear in either the experimental or the control group, which would otherwise result in a division by zero. For the variance ratio we compute

$$\begin{aligned} d_{\text {vr}} = \ln \left( \frac{s_e^2}{s_c^2}\right) . \end{aligned}$$
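
Both summary statistics also translate directly into Python; a minimal sketch following the definitions above:

```python
from math import log

def corrected(x):
    """The c(.) function above: replace zero counts by 0.5."""
    return 0.5 if x == 0 else x

def log_odds_ratio(n_e, n_ee, n_c, n_ce):
    """Natural logarithm of the odds ratio with zero-event correction."""
    return log((corrected(n_ee) / corrected(n_e - n_ee))
               / (corrected(n_ce) / corrected(n_c - n_ce)))

def log_variance_ratio(std_e, std_c):
    """Natural logarithm of the ratio between the two group variances."""
    return log(std_e**2 / std_c**2)

print(log_odds_ratio(20, 12, 22, 7))    # hypothetical event counts
print(log_variance_ratio(0.09, 0.08))
```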

From the effect sizes and their variances we compute \(Z\)-scores, two-sided \(P\)-values and confidence intervals. Our random-effects model is estimated following the DerSimonian-Laird approach, and we furthermore compute Cochran’s homogeneity statistic, which quantifies how much between-study variance affects each meta-analysis. The web script may return all these numerical results in the JSON format.
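
A sketch of this pooling step, assuming standard inverse-variance weighting with the DerSimonian-Laird estimate of the between-study variance and Cochran’s Q; the exact result fields returned by the web script are assumptions.

```python
import numpy as np
from scipy import stats

def pool_effect_sizes(effects, variances):
    """Fixed- and random-effects pooling of per-study effect sizes."""
    d = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                               # inverse-variance weights
    d_fixed = np.sum(w * d) / np.sum(w)
    q = np.sum(w * (d - d_fixed) ** 2)        # Cochran's Q
    k = len(d)
    # DerSimonian-Laird estimate of the between-study variance.
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                 # random-effects weights
    d_random = np.sum(w_star * d) / np.sum(w_star)
    se = 1.0 / np.sqrt(np.sum(w_star))
    z = d_random / se
    p = 2 * stats.norm.sf(abs(z))             # two-sided P-value
    return {"fixed": d_fixed, "random": d_random, "tau2": tau2, "Q": q,
            "z": z, "p": p,
            "ci": (d_random - 1.96 * se, d_random + 1.96 * se)}

print(pool_effect_sizes([0.7, 0.4, 1.1], [0.05, 0.04, 0.09]))
```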

Certain types of neuroimaging studies report brain coordinates that may require specialized flexible multivariate meta-analysis [14, 15], for a recent review see [16]. Our system does not yet implement these techniques.

Fig. 3. Screenshot of the web script showing the meta-analytic results with forest and funnel plots on a dataset on pituitary gland volume differences between obsessive-compulsive disorder patients and healthy controls (Color figure online).

4 Results

We have added 127 pages transcluded with CSV data, most of which contain data suitable for meta-analysis. For mass meta-analysis we can presently consider 59 meta-analytic datasets. For a single meta-analysis, the reading, computation and download finish within seconds. With multiple calls to the web script and JSON output, another script can plot multiple meta-analytic results together, as in Fig. 5. Generating such a plot takes several minutes: first we use the MediaWiki API of the Brede Wiki to get a list of pages that use the meta-analysis template (as shown in Fig. 2). We then download all these pages as raw wiki markup and extract the templates with regular expressions. With this extracted information we call the meta-analytic web script multiple times, letting it return the results in the machine-readable JSON format, and obtain the meta-analytic results that we can plot and tabulate. For generating the page shown in Fig. 3, with results from an individual meta-analysis, we need only the CSV data and the web script, while the script that generated Fig. 5, with the mass meta-analysis, used information defined in templates, the CSV data and the web script, with no further adaptation of MediaWiki.
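
The first steps of this pipeline can be sketched with the standard MediaWiki API; the template name used in the query below is a hypothetical stand-in for the actual meta-analysis template.

```python
import json
import re
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API = "http://neuro.compute.dtu.dk/wiki/api.php"
RAW = "http://neuro.compute.dtu.dk/wiki/index.php?action=raw&title={}"

def pages_embedding(template):
    """List wiki pages that transclude the given template."""
    params = urlencode({"action": "query", "list": "embeddedin",
                        "eititle": "Template:" + template,
                        "eilimit": "500", "format": "json"})
    with urlopen(API + "?" + params) as response:
        data = json.load(response)
    return [page["title"] for page in data["query"]["embeddedin"]]

# "Csv metaanalysis" is a hypothetical template name.
for title in pages_embedding("Csv metaanalysis"):
    with urlopen(RAW.format(quote(title))) as response:
        markup = response.read().decode("utf-8")
    # Crude template extraction with a regular expression; each extracted
    # template would then be handed to the meta-analytic web script.
    templates = re.findall(r"\{\{Csv metaanalysis.*?\}\}", markup, re.S)
```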

The specific example shown in Fig. 3 displays results on the differences in pituitary gland volume between obsessive-compulsive disorder patients and healthy controls, from 5 studies reported in 3 different papers entered so far in the Brede Wiki [17–19]. In this case the forest plot exposes a large variation between the studies. The colored components in the table and in the forest plot link back to the wiki pages associated with the three papers.

Fig. 4. Connections between the different components of the meta-analysis wiki system: in the middle, a table on the wiki generated from template information, with links to topic ontology pages on the wiki (top), the transcluded data (bottom) and the web script with the meta-analysis of the data (right).

Figure 4 shows the links between the pages in the wiki and to the web script. The middle table is generated from template definitions and data similar to those displayed in Fig. 2. It provides links to topic ontology pages (with the page for “Major depressive disorder” shown), the data and the result page generated by the meta-analysis web script.

Table 1. Top 20 of a total of 57 meta-analyses as of June 2012, sorted according to \(P\)-value. The “ES” column displays the effect sizes.

Table 1 lists part of the mass meta-analytic results, sorted according to the \(P\)-value of the effect size. Disregarding the non-neuroimaging results, we see that the meta-analyses on hippocampal and lateral ventricle volumes in first-episode schizophrenia and on hippocampal volumes in major depressive disorder reach the highest significance. The original published meta-analyses [2, 8] also mentioned these highly significant results prominently.

Fig. 5. Results from mass meta-analyses shown in a L’Abbé-like plot, constructed by calling the web script multiple times. Each dot corresponds to a meta-analysis, showing uncertainty as a function of effect size, with the size of each dot determined by the number of studies in the meta-analysis. The line indicates 0.05 significance. To avoid overlapping text, labels appear only for a subset of the meta-analyses and only with the primary topic.

Fig. 6. Screenshot from the online mass meta-analysis with data from structural neuroimaging studies of obsessive-compulsive disorder in a table and a L’Abbé-like effect-uncertainty plot. Each row in the table and each dot in the chart correspond to a meta-analysis, with the size of the dots controlled by the number of subjects in each meta-analysis and the color determined by Cochran’s homogeneity test.

Whereas an offline script generated Fig. 5, an online web script has generated Fig. 6 with a mass meta-analysis of obsessive-compulsive disorder neuroimaging studies. This script can read a single wiki page with templates that describe meta-analytic CSV data, direct the meta-analytic web script to analyze all that data and return the results in a machine-readable format. The mass meta-analytic web script then formats the results in a sortable table and plots them in the L’Abbé-like effect-uncertainty plot, presently using the bubble chart from Google Chart Tools. It takes around 10 s to download and compute the 25 meta-analyses represented on the page for obsessive-compulsive disorder neuroimaging studies. In this case the meta-analysis for the pituitary gland has the largest magnitude of effect size and is thus placed at the extreme right in the plot. As redness indicates heterogeneity among the included studies, the plot also shows that this particular region has the highest heterogeneity compared with the other 24 meta-analyses. The forest plot in Fig. 3 also displays this heterogeneity.

5 Discussion

By using MediaWiki in our present system we exploit the template facility to capture structured information and use free-form wikitext for annotation of and commentary on the individual scientific papers, as in the semantic academic annotation wikis AcaWiki and WikiPapers. We can also use the pages of the wiki as a simple means to keep track of the status of the papers considered for the meta-analysis: potentially eligible, eligible, partially entered and fully entered.

The Internet Brain Volume Database (IBVD) also records structural neuroimaging data and presents the data online with visualization [20]. It does not entirely meet our needs, as the database does not match associated patient and control groups. Nevertheless, the data recorded in IBVD and in our system have many overlapping fields, and on the pages for papers in the Brede Wiki we enter the IBVD paper identifier, so we can make deep links at the bibliographic level.

Why not Semantic MediaWiki? Semantic MediaWiki (SMW) can query text and numerical data, but it has not had the ability to make complex computations. The Semantic Result Formats extension includes average, sum, product and count result formats, enabling simple computations on a series of numerical values, but these are insufficient for the kind of computations we require. The data for meta-analysis form an n-ary data record (mean, standard deviation, number of subjects, labels), so either individual SMW pages should store each data record, or we should invoke the n-ary functionality of the Semantic Internal Objects SMW extension, the SMW record type or the recently introduced subobject SMW functionality. We have not investigated whether these tools provide convenient means for representing our data. The Brede Wiki can export its ontologies, defined in MediaWiki templates, to SKOS. Our future research can consider RDFication of the CSV information through the SCOVO format [21].

Some methodologies for systematic reviews require two independent data extractions, followed by consolidation and resolution of discrepancies, to counter the data-extraction errors that sometimes appear in meta-analyses [22]. Our simple system does not directly support such a process.

We wrote the web service in Python, where NumPy makes vector computation available and SciPy provides the statistical methods necessary for the computations. In a future PHP implementation the script could integrate more closely with the wiki as either a MediaWiki or an SMW extension. Furthermore, developments in MediaWiki during 2012, with support for the Lua programming language and with Wikidata [23] for structured data, have relevance for our system. With the flexibility of Lua, a wiki user can write templates that format CSV information into MediaWiki tables and perform numerical computations on the data, possibly to such an extent that the table-rendering MediaWiki extension and the meta-analytic web script become unnecessary.

6 Conclusion

A wiki built from standard components provides an inexpensive means to manage meta-analytic data in a collaborative environment. The general framework allows not only the meta-analysis of neuroimaging-derived data but also has the potential for managing and analyzing data from many other domains. The system gives an overview of the important effects and uncertainties across numerous meta-analyses that multiple wiki users may extend collaboratively.

We regard the environment as an open science system that makes methods and data immediately available in human- and machine-readable form.