Assessing microscope image focus quality with deep learning
Large image datasets acquired on automated microscopes typically contain some fraction of low quality, out-of-focus images, despite the use of hardware autofocus systems. Identifying these images with high accuracy using automated image analysis is important for obtaining a clean, unbiased image dataset. Complicating this task is the fact that image focus quality is only well-defined in foreground regions of images, and as a result, most previous approaches can only compute the relative difference in quality between two or more images, rather than an absolute measure of quality.
We present a deep neural network model capable of predicting an absolute measure of image focus on a single image in isolation, without any user-specified parameters. The model operates at the image-patch level and also outputs a measure of prediction certainty, enabling interpretable predictions. The model was trained on only 384 in-focus Hoechst (nuclei) stain images of U2OS cells, which were synthetically defocused to one of 11 absolute defocus levels during training. The trained model generalizes to previously unseen real Hoechst stain images, identifying the absolute image focus to within one defocus level (approximately a 3 pixel difference in blur diameter) with 95% accuracy. On a simpler binary in/out-of-focus classification task, the trained model outperforms previous approaches on both Hoechst and Phalloidin (actin) stain images (F-scores of 0.89 and 0.86, respectively, versus 0.84 and 0.83), despite having been shown only Hoechst stain images during training. Lastly, we observe qualitatively that the model generalizes to two additional stains, Hoechst and Tubulin, of an unseen cell type (Human MCF-7) acquired on a different instrument.
Our deep neural network enables classification of out-of-focus microscope images with both higher accuracy and greater precision than previous approaches, via interpretable patch-level focus and certainty predictions. The use of synthetically defocused images obviates the need for a manually annotated training dataset. The model also generalizes to different image and cell types. The framework for model training and image prediction is available as a free software library, and the pre-trained model is available for immediate use in Fiji (ImageJ) and CellProfiler.
Keywords: Image analysis, Deep learning, Machine learning, Focus, Defocus, Image quality, Open-source, ImageJ, CellProfiler
Abbreviations
DNN: Deep neural network
Fiji: Fiji is just ImageJ
PLLS: Power log-log slope
Acquiring high quality optical microscopy images reliably can be a challenge for biologists, since individual images can be noisy, poorly exposed, out-of-focus, vignetted or unevenly illuminated, or contain dust artifacts. These types of image degradation may occur in only a small fraction of a dataset too large to survey manually, especially in high-content screening applications.
One specific aspect of image quality, focus, is particularly challenging to assess in microscopy images. As described in Bray et al., the task of selecting the best-focus image given a focal z-stack of multiple images of the same sample has been previously explored. For a related but different task, Bray et al. evaluated the performance of several focus metrics operating on a set of single-z-depth images (not focal z-stacks), rated by a human as either in or out-of-focus, and identified the power log-log slope (PLLS) metric as the best at this task. The PLLS metric is computed by plotting the one-dimensional power spectral density of a given image as a function of frequency on a log-log scale and fitting a line to the resulting plot; the slope of that line (a single scalar) is the PLLS metric for that image. As described in Bray et al., this value is always negative, and is lower in images where defocus blur removes high frequencies. Separating in-focus from out-of-focus images in a dataset using the PLLS metric requires a user-selected threshold, making it difficult to interpret the absolute value of the metric on any given image. This requirement of a threshold, likely different for each image channel, precludes the possibility of online automated focus quality analysis during image acquisition. Automatic identification of the absolute focus quality of a single image in isolation, without any user-supplied, dataset-specific threshold, has remained an unsolved problem.
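The PLLS computation described above can be sketched in a few lines of NumPy. The radial-binning and frequency-range choices here are ours, not necessarily those of the implementation evaluated in Bray et al.:

```python
import numpy as np

def power_log_log_slope(image):
    """Fit a line to log(power) vs. log(frequency); return its slope.

    The slope is always negative for natural images, and more negative
    for blurrier images, since defocus removes high frequencies.
    """
    image = np.asarray(image, dtype=np.float64)
    # 2D power spectrum, shifted so zero frequency is at the center.
    psd2d = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = psd2d.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices(psd2d.shape)
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)
    # Radially average the 2D spectrum into a 1D power spectral density.
    radial_sum = np.bincount(r.ravel(), weights=psd2d.ravel())
    radial_count = np.maximum(np.bincount(r.ravel()), 1)
    psd1d = radial_sum / radial_count
    # Skip the DC term and fit a line on log-log axes.
    freqs = np.arange(1, min(cy, cx))
    slope, _ = np.polyfit(np.log(freqs), np.log(psd1d[freqs]), 1)
    return slope
```

Note that the scalar this returns still requires a dataset-specific threshold to decide "in focus" versus "out of focus", which is the limitation motivating the approach in this paper.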
Recent advances in deep learning have enabled neural networks to achieve human-level accuracy on certain image classification tasks. Such deep learning approaches, which require minimal human input in the form of hand-engineered features or hand-picked thresholds, have recently been applied to microscopy images of cells as well [5, 6, 7, 8, 9]. Though the automatic detection of low quality images in photographic applications has been explored, microscope images differ from consumer photographic images in several important ways. Most microscope images are shift and rotation invariant, and have varying offset (black-level) and pixel gain, photon noise, and a larger (up to 16-bit) dynamic range. In fluorescence microscopy, just one of many microscopy imaging modalities, an image may correspond to one of many possible fluorescent markers, each labeling a specific morphological feature. Finally, with high resolution microscopy, the much narrower depth-of-field makes it more challenging to achieve correct focus, and typical microscope hardware autofocus systems determine focus based on a reference depth which only roughly correlates with the desired focus depth.
To more precisely identify absolute image focus quality issues across image datasets of any size, including single images in isolation, we have trained a deep neural network model to classify microscope images into one of several physically-relatable absolute levels of defocus. Our work here includes several contributions toward more precise and accurate automatic assessment of microscope focus quality. First, we frame the prediction problem as an ordered multi-class classification task (as opposed to a regression, as in prior work) on image patches, enabling the expression of prediction uncertainty in image patches with no cells or objects, as well as a visualization of focus quality within each image. We then show that a deep neural network trained on synthetically defocused fluorescence images of U2OS cells with Hoechst stain can generalize and classify real out-of-focus images of both that same stain and an unseen stain, Phalloidin, with higher accuracy than the previous state-of-the-art PLLS approach. The combination of these two contributions enables the novel ability to predict absolute image focus quality within a single image in isolation. Lastly, we show qualitative results on how our model predictions generalize to an unseen cell type, Human MCF-7 cells.
We started with a dataset of images consisting of focal stacks (containing both in-focus and multiple out-of-focus images) of U2OS cancer cells with Hoechst stain from Bray et al., which we later used to train and evaluate the model's predictive capabilities. These microscope image datasets have several notable properties: the image focus across a given image can vary but is typically locally consistent; many regions of images consist of just the (typically dark) background, for which there exists no notion of focus quality; and the visible image blur scales approximately linearly with distance from the true focal plane. With these considerations, we sought to train a model that could identify, on a small 84 × 84 image patch (several times the area of a typical cell), both the severity of the image blur and whether the image blur is even well-defined (e.g. whether the image patch is just background).
We set aside half of the images (split by site within a well) for evaluation only, and created a training image dataset by taking the 384 most in-focus images (the image within each focal stack with the largest standard deviation across all image pixels) of the U2OS cancer cells with Hoechst stain from the image set BBBC006v1 from the Broad Bioimage Benchmark Collection. This dataset consists of 32 images of each field of view with 2 μm z-spacing, 696 × 520 image size, 2× binning and 20× magnification. We then synthetically defocused the in-focus images by applying a convolution with a defocus point spread function evaluated by varying z in 2 μm increments.
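The synthetic-defocus step can be illustrated with a simplified stand-in. The paper convolves with a physical point spread function evaluated at 2 μm z-steps; the sketch below substitutes a Gaussian PSF whose width grows with the discrete defocus level, and the level-to-width mapping (`sigma_per_level`) is an assumption for illustration, not the paper's optical model:

```python
import numpy as np

def defocus(image, level, sigma_per_level=1.5):
    """Blur `image` to one of the discrete defocus levels (0 = in focus).

    Stand-in for the physical defocus PSF: a normalized Gaussian kernel
    whose standard deviation grows linearly with the defocus level.
    """
    image = np.asarray(image, dtype=np.float64)
    if level == 0:
        return image
    sigma = sigma_per_level * level
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()
    psf = np.outer(g, g)
    # Circular convolution via FFT keeps the output the same size.
    out = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf, s=image.shape))
    return np.real(out)

# An 11-class training set can then be built from in-focus images alone:
# examples = [(defocus(img, level), level) for level in range(11)]
```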
Both the most probable class and the prediction certainty can be visualized for each image patch as a colored border, with the hue indicating the predicted class (defocus level) and the lightness denoting the certainty, as shown in Fig. 2b.
The whole-image predicted probability distribution is taken to be the certainty-weighted average of the distributions predicted for the individual patches. The whole-image aggregate certainty is computed from the entropy of that probability distribution. The mean certainty, the average of the individual patch certainties, is plotted against the aggregate certainty in Fig. 2c for each image in the BBBC021 dataset, allowing the identification of several interesting regimes, shown in Fig. 2d (from top to bottom): images with high patch certainty and consistency, images where individual patch certainty is high but the patch predictions are inconsistent, images with only a few high certainty patches, and images containing nothing. Importantly, this dataset differed from the training dataset in that it consisted of single z-depth images of Human MCF-7 cells acquired with 1280 × 1024 image size, 1× binning, 20× magnification and 0.45 NA.
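The patch-to-image aggregation above can be sketched as follows. The exact certainty definition is not spelled out here, so this sketch assumes certainty is one minus the distribution's entropy normalized by its maximum possible value (log of the number of classes); that choice is ours:

```python
import numpy as np

def certainty(probs):
    """Assumed certainty: 1.0 for a one-hot prediction, 0.0 for uniform."""
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs), axis=-1)
    return 1.0 - entropy / np.log(probs.shape[-1])

def aggregate(patch_probs):
    """Certainty-weighted average of per-patch class distributions."""
    patch_probs = np.asarray(patch_probs, dtype=np.float64)  # (patches, 11)
    weights = certainty(patch_probs)                         # (patches,)
    # Background patches (near-uniform predictions) get near-zero weight,
    # so they barely influence the whole-image prediction.
    image_probs = weights @ patch_probs / (weights.sum() + 1e-12)
    return image_probs, certainty(image_probs)
```

This weighting is what lets the model ignore empty background patches, where focus is undefined, when summarizing a whole image.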
To be more precise, a deep neural network was trained on the following image classification task. Given training examples of 16-bit 84 × 84 pixel input image patches and the corresponding degree of defocus (one of 11 discrete classes or defocus levels ordered from least to most defocused), the model predicts the probability distribution over those classes. The model (Fig. 2a) consists of a convolutional layer with 32 filters of size 5 × 5, a 2 × 2 max pool, a convolutional layer with 64 filters of size 5 × 5, a 2 × 2 max pool, a fully connected layer with 1024 units, a dropout layer with probability 0.5, and finally a fully connected layer with 11 units, one for each of the defocus levels.
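The layer sequence above can be sketched with the Keras API (the original was implemented directly in TensorFlow; the ReLU activations and "same" padding below are assumptions not specified in the text):

```python
import tensorflow as tf

def build_model(num_classes=11):
    """Sketch of the patch classifier: conv/pool x2, dense, dropout, softmax."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(84, 84, 1)),                 # 84x84 patch
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```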
To correctly penalize model errors on the ordered class categories, the model was trained using a ranked probability score loss function instead of cross-entropy loss, for 1 million steps (about one day), using 64 replicas and a learning rate of 5e-6 with the Adam optimizer. In addition, the model was trained with an augmented training dataset generated by applying a random gain and offset, log-uniform in (0.2, 5.0) and (1, 1000) respectively, to each image. We found the data augmentation important (see Results section) for training a model that generalizes to new images spanning the large range of both foreground and background intensities within the 16-bit image dynamic range. The model was implemented and trained with TensorFlow.
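A minimal sketch of a ranked probability score for the 11 ordered defocus classes: it compares the cumulative distributions of prediction and target, so a prediction several levels off costs more than a near miss, unlike cross-entropy, which treats all wrong classes equally. The normalization by the number of class boundaries is a common convention and not necessarily the paper's exact formulation:

```python
import numpy as np

def rps_loss(probs, label, num_classes=11):
    """Squared difference between predicted and true CDFs, averaged over
    the num_classes - 1 class boundaries. Zero iff the prediction puts
    all mass on the true class."""
    target = np.zeros(num_classes)
    target[label] = 1.0
    cdf_pred = np.cumsum(probs)
    cdf_true = np.cumsum(target)
    return np.sum((cdf_pred - cdf_true) ** 2) / (num_classes - 1)
```

For example, with true defocus level 0, a one-hot prediction at level 5 incurs five times the loss of a one-hot prediction at level 1, which is exactly the ordering-aware penalty the ordered classes require.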
To assess whether this increase in accuracy might be attributed to our use of a deep neural network model or to our framework for aggregating independent quality assessments on smaller image patches, we implemented the PLLS approach within our image patch framework. We first tried to reproduce the previously reported 0.84 F-score using PLLS on whole images; due to possible differences in the sampling of test and validation images, we observed an F-score of 0.82 instead of 0.84. We then evaluated PLLS within our image patch framework and observed an F-score of 0.60, suggesting the deep neural network model itself is responsible for the improved accuracy.
To evaluate the generalization of the model to images representing a novel stain with a qualitatively different appearance from the Hoechst stain training images, the same evaluation procedure from Bray et al. was applied to the Phalloidin (actin) stain images shown in Fig. 3b, yielding an F-score of 0.86, an improvement over the 0.83 achieved by the PLLS approach reported in Bray et al.
Prediction time on each new 2048 × 2048 image was 1.1 s, compared with 0.9 s for PLLS, for single-threaded Python implementations of each method, and scales linearly with image pixel count.
Absolute defocus identification
To assess the model’s ability to generalize on images with different brightnesses and background offsets, we conducted a test with the same held out ground truth images, except each image was additionally augmented with a gain and offset. The resulting confusion matrix across all 11 classes or defocus levels, for each applied gain or offset, is shown in Fig. 4. When trained without data augmentation (first row), the model appears to be biased in predicting an image to be more defocused if the image has a higher background offset. In contrast, with data augmentation, the model predictions do not appear to be biased by the image and background brightness.
Rather than train the model on focal stacks of defocused images acquired by a real microscope or on manually labeled images, we trained the model on synthetically defocused versions of real images. This enabled the use of known ground truth images for training the model to identify absolute focus quality rather than relative measures of quality. Another advantage of this approach is that we can generate the large number of training examples required for deep learning using only an in-focus image dataset, and that the model may be more robust to overfitting on nuisance factors in the experimental data. However, the success of this approach depends on the extent to which the image simulation accurately represents the real physical image formation. We took advantage of the well-known behavior of light propagation in a microscope to achieve this. We note that in certain applications, the use of a more complex optical model may yield even better results.
Our analysis demonstrated the importance of using data augmentation to train a model to handle the large range of possible foreground and background intensities. Not only does our learning-based approach enable prediction of an absolute measure of image focus quality on a single image, but it also requires no user-specified parameters for preprocessing the input images.
Possible future work includes training the model to predict on even more varied input images, including those spanning multiple spatial scales, additional imaging modalities such as brightfield, and additional cell types, stains and phenotypes. These extensions might be implemented through a combination of a more accurate image simulator and a more diverse and representative dataset of in-focus real training images. In particular, additional image datasets would enable a more comprehensive assessment of model generalization beyond what has been presented here, and, along with an improved assessment methodology, would allow for a better comparison of the methods presented in Fig. 3, including statistical significance of accuracy gains, which we did not assess. Optimizing the network size or input image patch dimensions, or explicitly modeling background image patches where focus is undefined, might improve accuracy further. Lastly, the current model specializes in determining focus quality, but additional measures of image quality could be explored as additional prediction tasks, with simulated data for training.
A deep learning model was trained on synthetically defocused versions of real in-focus microscope images. The model is able to predict an absolute measure of image focus on a single image in isolation, without any user-specified parameters, and operates at the image-patch level, enabling interpretable predictions along with measures of prediction uncertainty. Out-of-focus images are identified more accurately compared with previous approaches, and the model generalizes to different image and cell types. The software for training the model and making predictions is open source, and the pre-trained model is available for download and use in both Fiji (ImageJ) [19, 20] and CellProfiler.
We thank Lusann Yang for reviewing the software, Claire McQuin and Allen Goodman for assistance with CellProfiler integration, Michael Frumkin for supporting the project and Anne Carpenter and Kevin Eliceiri for helpful discussions.
Financial support for this work came from Google, NIH U54 HG008105 (SF), R01 NS083390 (SF), and the Taube/Koret Center for Neurodegeneration Research (SF).
Availability of data and materials
Project name: Microscope Image Quality
Fiji (ImageJ) plugin home page: https://imagej.net/Microscope_Focus_Quality
CellProfiler module home page: https://github.com/CellProfiler/CellProfiler-plugins/wiki/Measure-Image-Focus
Source code: https://github.com/google/microscopeimagequality
Programming Language: Python.
Operating system(s): Platform independent
Other requirements: TensorFlow 1.0 or higher
License: Apache 2.0
The datasets analysed during the current study are available in the Broad Bioimage Benchmark Collection repository, https://data.broadinstitute.org/bbbc/image_sets.html [2, 12].
SJY implemented software, experiments and wrote paper with comments from all other authors. MBe conceived of the project and approach, and, along with DMA and MBa, contributed to experiment design. DMA helped with requirements definition and usage understanding, MBa provided image data (not shown), and CR reviewed software. JH integrated the CellProfiler module and CTR and AS implemented the Fiji (ImageJ) plugin. AN, EC and SH provided helpful discussions, and PN and SF supervised all aspects of the project. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 3. Bray M-A, Carpenter A; Imaging Platform, Broad Institute of MIT and Harvard. Advanced assay development guidelines for image-based high content screening and analysis. In: Sittampalam GS, Coussens NP, Brimacombe K, Grossman A, Arkin M, Auld D, Austin C, Baell J, Bejcek B, Chung TDY, Dahlin JL, Devanaryan V, Foley TL, Glicksman M, Hall MD, Hass JV, Inglese J, Iversen PW, Kahl SD, Kales SC, Lal-Nag M, Li Z, McGee J, McManus O, Riss T, Trask OJ Jr, Weidner JR, Xia M, Xu X, editors. Assay Guidance Manual. Bethesda (MD): Eli Lilly & Company and the National Center for Advancing Translational Sciences; 2012.
- 4. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 1–9.
- 5. Christiansen E, Yang S, Ando D, Javaherian A, Skibinski G, Lipnick S, Mount E, O'Neil A, Shah K, Lee A, Goyal P, Fedus W, Poplin R, Esteva A, Berndl M, Rubin L, Nelson P, Finkbeiner S. In silico labeling: predicting fluorescent labels in unlabeled images. Cell. 2018; in press.
- 6. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow P-M, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. bioRxiv. 2017:142760.
- 7. Ando DM, McLean C, Berndl M. Improving phenotypic measurements in high-content imaging screens. bioRxiv. 2017:161422.
- 11. Huang F, Hartwich TMP, Rivera-Molina FE, Lin Y, Duim WC, Long JJ, Uchil PD, Myers JR, Baird MA, Mothes W, Davidson MW, Toomre D, Bewersdorf J. Video-rate nanoscopy using sCMOS camera-specific single-molecule localization algorithms. Nat Methods. 2013;10:653–8.
- 13. Born M, Wolf E. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Cambridge: CUP Archive; 2000.
- 16. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR); 2015. arXiv:1412.6980 [cs.LG].
- 17. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: large-scale machine learning on heterogeneous distributed systems. 2016. arXiv:1603.04467 [cs.DC].
- 20. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez J-Y, White DJ, Hartenstein V, Eliceiri K, Tomancak P, Cardona A. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9:676–82.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.