Towards a Systematic Screening Tool for Quality Assurance and Semiautomatic Fraud Detection for Images in the Life Sciences
The quality and authenticity of images are essential for data presentation, especially in the life sciences. Questionable images may often be a first indicator of questionable results, too. Therefore, a tool that uses mathematical methods to detect suspicious images in large image archives can be a helpful instrument for improving quality assurance in publications. As a first step towards a systematic screening tool, especially for journal editors and other staff members responsible for quality assurance, such as laboratory supervisors, we propose a basic classification of image manipulation. Based on this classification, we developed and explored some simple algorithms to detect copied areas in images. Using an artificial image and two examples of previously published modified images, we apply quantitative methods such as pixel-wise comparison, a nearest neighbor algorithm, and a variance algorithm to detect copied-and-pasted areas or duplicated images. We show that our algorithms are able to detect some simple types of image alteration, such as copying and pasting background areas. The variance algorithm detects not only identical, but also very similar areas that differ only in brightness. Further types could, in principle, be implemented in a standardized scanning routine. We detected the copied areas in a proven case of image manipulation in Germany and showed the similarity of two images in a retracted paper from the Kato labs, which has been widely discussed on sites such as PubPeer and Retraction Watch.
Keywords: Digital image · Ethics · Manipulation · Image processing · Fraud detection
Pictures and images play a key role in the documentation and presentation of results in the life sciences. In cases of fraud, images have often been the key to identifying manipulation and falsification in a scientific work. As a survey by the US Office of Research Integrity (ORI) showed more than ten years ago, not only has the incidence of allegations involving questionable images increased, but so has their share relative to other ORI cases (Krueger 2002). Images were also a central issue in cases that garnered broader media attention, such as the Hwang clone fraud case in Korea, Germany's biggest cancer research fraud case, involving Herrmann, Mertelsmann, and Brach, or the case of former oral cancer research star Jon Sudbø in Norway. In the Hwang case, which is considered “one of the highest profile events in South Korea’s history” (Logan et al. 2010: 172), results such as DNA fingerprinting analyses and photographs of cells in a Science article from 2004 were fabricated (Kakuk 2009: 548). In the German case, 94 publications were found to contain falsified or suspicious data, including many cases of recycling the same images in different contexts and publications, or copying and pasting within a certain image (Couzin and Unger 2006: 39; Abbott and Schwarz 2002). In one of the fraudulent publications from Norway in the prestigious New England Journal of Medicine, one of the paper’s images of mouth lesions was found to be a magnified version of another image in the same article (Couzin and Schirber 2006; for an overview of fraud in oncology: Schraub and Ayed 2010).
Leaving aside such individual and often spectacular cases that have been uncovered, the total number of image manipulations in submitted scientific papers remains unknown and can only be estimated, e.g. by online surveys among scientists. According to such a survey by Martinson et al., 0.3% of 3247 scientists admitted to having “cooked” or falsified research data themselves. About 15% said that they had previously engaged in behaviors such as “dropping observations or data points”, and 4.7% admitted to reusing data in two or more publications (Martinson et al. 2005) (the survey did not explicitly ask about image manipulation). Recently, based on a visual (“by eye”) screening of 20,621 papers in 40 scientific journals, a group of US researchers estimated the prevalence of the specific case of inappropriate image duplication at 3.8%, with an increasing tendency over the past decade (Bik et al. 2016). This is in line with the observation that the number of retractions in the biomedical literature has increased in the last few years, in many cases due to manipulated images (Krueger 2012). As efficient and systematic screening for image manipulation is not yet available, it can only be assumed that the technical (software) possibilities of image editing may have increased the probability of image manipulation.
Spectacular cases of fraud that receive broader media attention refuel the debate on how such manipulation could have been avoided and who will be responsible for better quality control in the future. Journal editors have recognized this problem and organized themselves in the Committee on Publication Ethics (COPE, http://publicationethics.org). However, journals that systematically check images, such as the Journal of Cell Biology, are rather regarded as an exception (Couzin 2006b). Responsible editors usually point out reviewers’ and editors’ limited possibilities, as did Donald Kennedy, former editor-in-chief at Science: “Peer review cannot detect [fraud] if it is artfully done. (…) And the reported falsifications in the Hwang paper—image manipulation and fake DNA data—are not the sort that reviewers can easily spot” (Couzin 2006a). Concerning the above-mentioned case of oncologist Jon Sudbø, Richard Horton, editor of The Lancet, claimed: “This is all so similar to the Hwang thing that we have just been through. (…) Peer review is a great system for detecting badly done research, but if you have an investigator determined to fabricate an entire study, it is not possible to pick it up” (Butler 2006). Even clearly fabricated papers have a good chance of being accepted, as John Bohannon showed in an experiment with open-access journals (Bohannon 2013).
These statements still seem to be true today, at least for more sophisticated manipulations that are undetectable by the naked eye, or for manipulations obviously violating established guidelines such as Nature’s guidelines for “Image integrity and standards” (Nature 2016). Such guidelines provide some orientation as to the degree to which an image may still be regarded as authentic after electronic corrections to brightness, contrast, etc. To our knowledge, there is still a lack of widespread and standardized screening methods for reviewers or editors to routinely verify the authenticity of a submitted scientific image. In principle, such screening tools would be useful for everyone involved in the process of quality control. However, journal editors in particular should be able to choose from a variety of different methods, because falsifiers, who also have access to any given screening tool on the free market, will eventually learn to mask their manipulations and render them undetectable by that specific tool.
At least in cases where an image has already been labeled as suspicious, institutions such as the ORI offer some tools (called “forensic droplets”) for the examination of “questionable” scientific images (http://ori.hhs.gov/droplets). These tools yield processed images but do not provide a measurable or easily comparable result across images (such as rankings or a probability of manipulation). They therefore seem most suitable for data that is already questionable, and may be of some help in the daily routines of editors and reviewers. Some software, like Rigour1 (http://www.suprocktech.com), offers batch processing of images to detect manipulated areas. In this work, we explore and discuss a general procedure and basic statistical algorithms as a first step towards a possible automatic routine control of scientific images in the life sciences and, prospectively, beyond.
Types of Image Manipulation
From a mathematical point of view, according to which images are nothing but a matrix of pixels with different values, the type of image that may have been manipulated (blots, electrophoretic gels, etc.) is secondary. More important are image characteristics such as color (homogeneous or heterogeneous values inside the matrix), resolution (size of the matrix), etc., which are used to scan for suspicious images. In our approach, we consider images to be data sets that can be systematically scanned for manipulation. Our main goal is to search for similar areas. Therefore, our methods require images without large monochromatic areas, in which everything looks similar. Typically, large monochrome areas are in themselves indicative of manipulation or inappropriate post-processing of images (Cromey 2010). On the other hand, large areas of “noisy background” in which copied areas can be searched for are extremely valuable. Outside the background areas, the signal of the image information (for example, dark points on a light background) is usually much stronger than the signal coming from a manipulation, making the latter undetectable. Here, we suggest some basic algorithms to detect image manipulation.
A journal’s integrity standards typically define image alteration and manipulation from the author’s perspective, and usually do not offer a general and explicit distinction between fraudulent and non-fraudulent (but still unacceptable) image manipulation. For example, Nature’s standards for image integrity (http://www.nature.com/authors/policies/image.html) advise avoiding tools like Adobe Photoshop’s® cloning and healing tools, which alter single areas of an image in a nonlinear way. Global linear transformations (like changes in brightness and contrast) are allowed to a certain degree if they are necessary and mentioned in the description. Other authors distinguish in their digital imaging guidelines between “usually acceptable” (e.g., simple adjustments to the entire image), “questionable” (e.g., manipulations that are specific to one area of an image and are not performed on other areas) and “very questionable” (e.g., cloning or copying objects into a digital image, from other parts of the same image or from a different image) (Cromey 2010). However, the degree to which such a transformation is still acceptable, and whether a description of the image treatment is sufficient, can only be decided on a case-by-case basis.2
Type 1: Manipulation by deleting unwanted data information (for example, using Photoshop's cloning tool)
Type 2: Duplication by reusing images in different papers or contexts
Type 3: Manipulation by adding information/data points.
Procedures to Detect Images with Added Information
If parts of the information in a given image are added to the original version, the copied-and-pasted area leaves characteristic edges at the border of the copied region. Therefore, it is necessary to spot visible or hidden edges around important image data (e.g. bands in Western Blots) to detect cut points and, in a second step, the origin of the copied information (see Fig. 1, type 3). One problem in detecting suspicious edges is lossy image compression. Most published images use the JPEG format, which employs a lossy compression algorithm based on 8 × 8 pixel blocks (ISO/IEC 10918-1:1993). When we look at edges, the first step is to discriminate edges caused by compression from edges caused by manipulation. Manipulation of type 3 is also difficult to process automatically because the signal from the added data is typically much stronger than the signal from the edges of copied areas. As our goal is to outline the first steps towards a tool for use by journal editors and/or reviewers as a possible screening method for incoming images, this paper will focus on the first two types of possible manipulations (deletion or duplication of areas). Searching for edges requires other types of algorithms, which are not the subject of this work. One way to prevent manipulation type 3 in the future would be for journals to accept only uncompressed image data at submission for quality checks.
Procedures to Detect Deleted Information and Duplicated Images
At first glance, searching for deleted information in a given image seems paradoxical: how does one look for something that is no longer there? Typically, the deleted information has been replaced by background noise. This can be done by copying and pasting another part of the image in a way that hides the unwanted area (compare the specification step in the flow chart in Fig. 1). Since we cannot search for the deleted data, we must search for the origin of the copied background. In principle, it is possible to detect deleted information by searching for edges, but the above-mentioned problem of compressed images applies here, too.
One proposed method to detect areas with data deletion is to search for background regions which differ from their direct neighborhood in the image, e.g. by changes in luminance or color. An alternative is to search for similar areas, which are indicative of data manipulation by copying and pasting. In this work, we considered data deletion by replacement with background. In a second step, we examined a related problem: finding identical images or identical details.
One strategy to match a copied region to its new environment is changing contrast. After such a change, the copied area is no longer identical with its original. For that reason, we also need an algorithm to detect regions that are similar, although modified.
We provide three different algorithms to detect copied areas in the background. In this section, we first describe the data pre-processing, followed by the three algorithms. As a last step, we present a tool to summarize the results. The algorithms are all part of a newly developed R (R Core Team 2015) software package, FraudDetTools, which is available from the authors. The package contains a selection of functions written in R. All algorithms work with one or two different images. The package has two core functions: the function readImage collects the pre-processing steps; nN9Var provides the different comparison algorithms. In addition, some functions for outputting results and some sample data are also part of FraudDetTools.
Depending on the origin of the data to be analyzed, some pre-processing steps are necessary. Images can easily be read in JPEG or PNG format. When isolating parts of an image, or an image from a bigger figure, the data must be handled with care to prevent data alteration. Since formats like JPEG use lossy compression, intermediate results have to be saved in lossless formats like PNG to avoid data loss. To analyze the images, we transform them into an image matrix. Our package includes the function readImage, which uses the two R packages jpeg and png (Urbanek 2013a, b) to create these image matrices and additionally transforms color images into grayscale ones. The image matrix is the basis for all following analyses. Every entry represents one pixel of the original image, with values ranging from 0 to 1. For a typical 8-bit image, there are 256 possible values. For monochrome areas in the picture, a second image matrix has to be created, in which the values corresponding to the monochrome areas are changed to a new, unique value to prevent false-positive matches. Typically, monochrome areas are white (1) or black (0). Even after this preparation, the variance algorithm does not work for images that include monochrome areas.
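The pre-processing steps above can be sketched as follows. This is a minimal Python illustration of the described procedure, not the FraudDetTools implementation (which is written in R); the function names and the grayscale weights are assumptions for illustration.

```python
import numpy as np

def read_image_matrix(rgb):
    """Convert an RGB image array (H x W x 3, values in [0, 1]) into a
    grayscale image matrix, as described in the text. The luminosity
    weights below are a common convention, assumed here for illustration."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def mask_monochrome(gray, white_sentinel, black_sentinel):
    """Create a second image matrix in which pure white (1) and pure
    black (0) pixels are replaced by unique out-of-range sentinel values,
    so large monochrome areas cannot produce false-positive matches."""
    white = gray >= 1.0
    black = gray <= 0.0
    masked = gray.copy()
    masked[white] = white_sentinel
    masked[black] = black_sentinel
    return masked
```

When an image is compared against a copy of itself, each copy would receive its own sentinel pair (e.g. -1/-2 and -3/-4), so their monochrome pixels never count as identical during the shift comparison.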
The two images (two different images, or the original image and a (pre-processed) copy) are compared at every possible shift. Parts of the image that do not overlap wrap around and are compared with the pixels on the other side of the image: e.g., a shift by one line causes the first line of the first image to be compared with the last line of the second image. If images of different sizes are compared, only the range of the smaller image is used. This procedure is the same for all three algorithms.
For a pixel-wise comparison, we count the number of identical superimposed pixels. The nearest neighbor algorithm counts identical 3 × 3 pixel blocks. The variance algorithm computes the variances in every 3 × 3 pixel block and accumulates them for the whole image. All algorithms create a result matrix which contains the results for every shift. The index of the matrix rows and columns indicates a shift by this number of rows and columns. For 3 × 3 pixel blocks, the entry of the result is at the position of the top left pixel.
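The three comparison criteria can be sketched as follows. This is an illustrative Python reading of the description above, not the R package's code; in particular, interpreting the variance criterion as the variance of the pixel-wise difference within each 3 × 3 block (which is zero when two areas differ only by a constant brightness offset) is our assumption.

```python
import numpy as np

def pixelwise_shift_counts(a, b):
    """For every cyclic shift of image b against image a, count the
    identical superimposed pixels. np.roll implements the wrap-around
    described in the text. Returns a result matrix indexed by
    (row shift, column shift)."""
    rows, cols = a.shape
    result = np.zeros((rows, cols), dtype=int)
    for dr in range(rows):
        for dc in range(cols):
            shifted = np.roll(b, shift=(dr, dc), axis=(0, 1))
            result[dr, dc] = np.count_nonzero(a == shifted)
    return result

def identical_blocks(eq):
    """Given a boolean pixel-equality map, count the positions where a
    full 3 x 3 block is identical (the nearest neighbor criterion).
    The count is attributed to the block's top-left pixel."""
    inner = (eq[:-2, :-2] & eq[:-2, 1:-1] & eq[:-2, 2:] &
             eq[1:-1, :-2] & eq[1:-1, 1:-1] & eq[1:-1, 2:] &
             eq[2:, :-2] & eq[2:, 1:-1] & eq[2:, 2:])
    return int(inner.sum())

def variance_shift_sum(a, b):
    """For every cyclic shift, accumulate the variance of the pixel-wise
    difference over all 3 x 3 blocks. A copied area whose brightness was
    shifted by a constant gives a constant difference and hence zero
    block variance (assumed reading of the variance criterion)."""
    rows, cols = a.shape
    result = np.zeros((rows, cols))
    for dr in range(rows):
        for dc in range(cols):
            diff = a - np.roll(b, shift=(dr, dc), axis=(0, 1))
            blocks = np.lib.stride_tricks.sliding_window_view(diff, (3, 3))
            result[dr, dc] = blocks.var(axis=(2, 3)).sum()
    return result
```

Shifts with unusually high pixel-wise or nearest neighbor counts, or unusually low variance sums, relative to the bulk of shifts, are the candidates worth inspecting.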
Localization of Similar Areas
The result matrices of the three algorithms show, for every shift, the number of identical/similar pixels/neighborhoods or the sum of neighborhood variances, respectively. An additional approach provides localization matrices, which are implemented for the nearest neighbor and the variance algorithm. Every entry counts, over all shifts, the number of identical nearest neighbor areas or of variances below the cut-point, respectively. Thus, localization matrices help find areas with a large number of identities in an image; see the visualization in Fig. 5c.
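A localization matrix of this kind can be sketched for the nearest neighbor criterion as follows (again an illustrative Python reading, not the FraudDetTools implementation):

```python
import numpy as np

def localization_matrix(a, b):
    """For each 3 x 3 block position (indexed by its top-left pixel),
    count over all cyclic shifts how often the block in a is identical
    to the superimposed block in the shifted b. High counts localize
    areas that match many other regions of the image."""
    rows, cols = a.shape
    loc = np.zeros((rows - 2, cols - 2), dtype=int)
    for dr in range(rows):
        for dc in range(cols):
            if a is b and dr == 0 and dc == 0:
                continue  # skip the trivial zero shift of an image against itself
            eq = (a == np.roll(b, shift=(dr, dc), axis=(0, 1)))
            loc += (eq[:-2, :-2] & eq[:-2, 1:-1] & eq[:-2, 2:] &
                    eq[1:-1, :-2] & eq[1:-1, 1:-1] & eq[1:-1, 2:] &
                    eq[2:, :-2] & eq[2:, 1:-1] & eq[2:, 2:])
    return loc
```

In an image with background noise, blocks inside a copied-and-pasted region accumulate counts from the shift that aligns copy and original, while honest noise yields counts near zero.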
To test our algorithms, we used three different types of data: a test image designed for such procedures and two real manipulation cases. The first real data example is a simple copy-and-paste manipulation of type 1; the second is a more difficult manipulation of type 2 that includes some data alteration. Although the images are manipulated, reproducing them is necessary to show the results of our algorithms.
Example 1: ORI Test Image
First, we explored the algorithms on a test image from the ORI (http://ori.dhhs.gov/) consisting of weak background noise. This image was designed by the ORI to test new routines for finding copied areas. We applied all three algorithms to the whole image to find the copied regions. As one would expect for 250,000 pixels and 256 shades of grey, there are many identical pixels: for the ORI test image, every shift has at least 15,020 identical pixel pairs (pixel-wise comparison). If we count an identity only when the entire 3 × 3 neighborhood (the pixel itself and its 8 neighbors) is identical (nearest neighbor), the number of identities per shift lies between 0 and 2187. We are interested in shifts containing a larger number of identical pixel pairs (or 3 × 3 areas) relative to most of the other shifts, to avoid random matches. The absolute number of identities is secondary.
Using our algorithms, we found different types of similar areas in the ORI test image. This example shows that our algorithms work on test data. Next, we examined their applicability to a real-life example that had already been identified as a manipulation.
Example 2: Copied Areas
The nearest neighbor algorithm only finds simple copy-and-paste shifts. Nevertheless, it seems to be suitable for practical image analysis. We demonstrated this using an established case with a simple copy-and-paste manipulation as an example. In 1998, Noé and Breer published the paper “Functional and Molecular Characterization of Individual Olfactory Neurons” (Noé and Breer 1998). Five years later, the German Research Foundation (DFG) ascertained that two figures in this publication were manipulated (DFG 2003). According to this report, the authors had replaced the primer bands of the electrophoresis gels with background. We applied our algorithms to one electrophoresis gel from Fig. 6b in the cited paper. In the original image from the research paper, our algorithms cannot detect the copied areas because of low image quality. The image quality of the corrigendum is much better than the image in the original paper. On the image data that was extracted from the corrigendum, the algorithms work very well.
This example has shown that our algorithms are able to retrace previously identified manipulations. In this case, they also provide additional information (in comparison to the naked eye) about the origin of the copied areas.
Example 3: Detecting Duplicated Images
The third example consists of two images from a letter in Nature Cell Biology (Suzawa et al. 2003).
There are many ways to manipulate and reuse images. Developing a screening tool to detect such manipulation requires a systematic classification. Our proposed typology of three types of image manipulation may be regarded as a first and useful step towards a screening procedure beyond graphical output. With the presented algorithms, we can detect identical areas, large areas which include more identical pixels than expected, and identical areas whose image values are shifted by a constant. However, the detection algorithms cover only a small range of possible manipulations. Our ultimate goal is to create an automated procedure for quality assurance, which will require extending the algorithms and making them sensitive to rotated and scaled images. At this point, the pixel-wise comparison and nearest neighbor algorithms only detect exactly identical pixels and 3 × 3 areas, respectively. The nearest neighbor algorithm is more sensitive to small copied areas, whereas the pixel-wise algorithm cannot detect such signals due to the high number of randomly identical pixels. Changes in scale or image quality (e.g., JPEG compression) render manipulations undetectable to the algorithms. In the original image from the discussed Noé/Breer paper, our algorithms are unable to detect the copied areas because of low quality and changes caused by image compression. The original images from the Kato paper vary minimally in size, which also causes the algorithms to fail. However, the tools are a useful addition to the range of existing screening methods and could lead to a monitoring system that looks for “outliers” in a collection of images.
Our examples show that it is necessary to use images of good quality. Some publishers, like the Nature Publishing Group, follow the good practice of handling raw data: “In fact, our journals have plans to make this data available to readers, and we expect this measure to increase the overall quality and integrity of the scientific record” (Retraction Blues 2013). Such data is important for discovering manipulation. Publication of high-quality (raw) data gives scientists the chance to test images using their own procedures, which, of course, is no substitute for a careful image check by the journals.
This is in line with the conclusions derived from our examples. Although more pixels cause longer runtimes for the algorithms, more detail increases the chance of detecting duplicated areas. Lossy image compression should be avoided to ensure correct data representation. The algorithms are too slow to search for duplications in big image archives, but other, more powerful algorithms do exist. However, it is possible to compare all images within a given paper, and for cases like Sudbø or Herrmann/Mertelsmann/Brach introduced earlier, this is also useful. The algorithms can be part of a quality control routine to avoid duplicating images by mistake. The duplicated images in the recent Tachibana cloning paper cannot be detected at this stage due to incompatibility and changes in scale, but an improved algorithm should be able to manage this type of duplication.
In summary, we can state that all three algorithms are helpful tools for scanning suspicious images. As a next step, they must be supplemented by algorithms that work for rescaled and rotated images. Furthermore, a faster implementation is desirable to address the runtime problem. In addition to existing approaches (expert eye and Photoshop procedures), our procedure can generally be used to automatically check large image archives and filter out suspicious images for a precise expert check. To increase the level of automation, filtering of unusual results (outliers) is possible.
We manually monitored the retractions appearing on “Retraction Watch” for six months, which led us to the assumption that most undetected image manipulation could be avoided if publishers/editors implemented a routine check for the described manipulation types. Including the features of our and other algorithms, the next step could be to create a classifier that helps scan for suspicious images. Up to now, the algorithms have been tested on examples and on original data from known cases of fraud. For statistical inference, it would be preferable to simulate and model types of image manipulation. The use of algorithms calls for a check of the algorithms themselves: since it is not appropriate to blindly trust a screening tool, we have to investigate the precision and recall of our tools (Rossner 2008).
The goal of this study was to develop a systematic approach to classifying different kinds of image manipulation in a suitable form, which can be handled with the basic algorithms we have developed. The proposed classification may also be a means to sharpen awareness of how images should be treated in scientific teaching. As a next step towards using the tool in practice, a quality check via a double-blind controlled trial, as recommended by one of our reviewers, is indispensable. However, a set of algorithms that detects suspicious images will have to be continuously extended, because image manipulators will continue to find new methods as well.
Finally, we must point out that an automated scan for suspicious images does not imply an automated judgment. The final decision should always be made by human experts to avoid false positives, but comparison algorithms should support the discussion by providing an initial quality check. Once an algorithm detects a suspicious image, further investigation, such as the procedure described in the COPE flowcharts (publicationethics.org/resources/flowcharts) for fabricated data, will be necessary.
Rigour is closed-source software; no public information on its methods is available. However, tutorials suggest that the program's output consists of processed images.
Further questions, such as a more general distinction between fraudulent and non-fraudulent, but still unacceptable, treatment of images, touch on the broad topic of authenticity, which cannot be discussed in detail in this article. However, many questions in this context have already been raised in classical works such as Walter Benjamin’s “The Work of Art in the Age of Mechanical Reproduction” and seem more important than ever in the digital age.
LK and KI designed the algorithms, LK implemented the algorithms, HW contributed the sociological and historical framework, LK, HW and KI developed the systematic approach and wrote the paper.
- DFG. German Research Foundation (2003). Rüge für Heinz Breer und Johannes Noé. Press release. http://www.dfg.de/service/presse/pressemitteilungen/2003/pressemitteilung_nr_48/index.html. Accessed October 7, 2015.
- Krueger, J. (2012). What do retractions tell us? ORI Newsletter, 21, 1–6.
- Nature. (2016). Guide to Publication policies of the Nature journals. http://www.nature.com/authors/gta.pdf. Accessed September 8, 2016.
- R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/.
- Retraction Blues. (2013). Nature Medicine, 19(12), 1547–1548.
- Urbanek, S. (2013a). jpeg: Read and write JPEG images. R package version 0.1-6. http://CRAN.R-project.org/package=jpeg.
- Urbanek, S. (2013b). png: Read and write PNG images. R package version 0.1-5. http://CRAN.R-project.org/package=png.
- Wormer, H. (1999, September 3). Fingerabdrücke einer Fälschung. Süddeutsche Zeitung, 55, 11.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.