Spot the match – wildlife photo-identification using information theory
Effective approaches for the management and conservation of wildlife populations require a sound knowledge of population demographics, and this is often only possible through mark-recapture studies. We applied an automated spot-recognition program (I3S) for matching natural markings of wildlife that is based on a novel information-theoretic approach to incorporate matching uncertainty. Using a photo-identification database of whale sharks (Rhincodon typus) as an example case, the information criterion (IC) algorithm we developed resulted in a parsimonious ranking of potential matches of individuals in an image library. Automated matches were compared to manual-matching results to test the performance of the software and algorithm.
Validation of matched and non-matched images provided a threshold IC weight (approximately 0.2) below which match certainty was not assured. Most images tested were assigned correctly; however, scores for the by-eye comparison were lower than expected, possibly due to the low sample size. Increasing the horizontal angle of sharks in images reduced matching likelihood considerably. There was a negative linear relationship between the number of matching spot pairs and matching score, but this relationship disappeared when using the IC algorithm.
The software and use of easily applied information-theoretic scores of match parsimony provide a reliable and freely available method for individual identification of wildlife, with wide applications and the potential to improve mark-recapture studies without resorting to invasive marking techniques.
Keywords: Spot Pattern, Whale Shark, Horizontal Angle, Natural Marking, Matching Uncertainty
Effective approaches for the management and conservation of wildlife populations require a sound knowledge of population demographics . For many species, such information is provided by studies that recognize individual animals so that their fate can be followed through time, thus allowing for the estimation of demographic rates like survival . Individual recognition may be achieved either by applying an artificial mark to an animal or by using an animal's natural markings . The former technique is pervasive in ecological studies addressing questions from the purely theoretical [e.g., ] to the highly applied , and it has been used on both marine and terrestrial species of vastly different sizes [e.g., [6, 7]].
Applying artificial marks to wildlife can, however, alter natural behaviour and reduce individual performance [e.g., ]. The marking process itself may be disruptive  due to the necessity of handling and restraining for mark application . The loss of marks over time  and the non-reporting of retrieved marks  can also compromise the estimation of demographic parameters. Additionally, there are often a host of ethical and welfare issues that can arise from the application of permanent or temporary marks [13, 14].
To address some of these problems, the identification of individual animals from their natural markings has become a major tool for the study of some animal populations , and has been applied to an equally wide range of animals from badgers  to whales [17, 18]. One of the more popular techniques of recording the natural markings of an animal is photo-identification as this allows storage of photos in a library for subsequent cross-matching and generation of capture-history matrices [17, 19]. These libraries can be examined manually to develop a suite of individual matches ; however, as the number of photos in a library increases beyond a person's capacity to process the suite of candidate matches manually, the development of faster, automated techniques to compare new photographs to those previously obtained is required [20, 21]. Several automated matching algorithms have been trialled with some success [e.g., [20, 22, 23, 24, 25, 26]], but these are generally highly technical, specialized and target a particular taxon or unique morphological feature of the species in question (e.g., dorsal fin shape and markings in cetaceans). Furthermore, uncertainty in the matching algorithms themselves has never been contextualized within a multi-model inferential framework , and so subjective manual matching is still required to assess reliability .
An example taxon that lends itself well to the development and application of a generalist algorithm for photo matching is the world's largest fish – the whale shark (Rhincodon typus). This species has been the recent subject of several photo-identification studies [e.g., [19, 20, 29]], some of which have already provided valuable information on population size, structure  and demography  under the supported assertion that the spot and stripe patterns of animals are individually unique and temporally stable . The initial assessment of the demography of one population (Ningaloo Reef, Western Australia)  has been complicated by the addition of many hundreds of photographs taken during analogous research programmes in other parts of Australia, Belize, USA, Philippines and Mexico , and elsewhere (Djibouti, Seychelles and Mozambique). Consequently, the number of photographs available has exceeded the number that can be reliably matched by eye, thereby necessitating an automated system of matching. One such system has been developed from an algorithm originally designed for stellar pattern recognition, and is currently being employed by the ECOCEAN whale shark database . This system has great potential; however, the procedure for entering and matching patterns is complex, and neither the algorithm nor results are publicly available. Therefore, a simple, yet reliable algorithm accessible to the public is needed to incorporate effectively a large number of photographs from a wide range of researchers, tourist operators and private organizations. Such a software package has recently been developed and is known as Interactive Individual Identification Software (I3S) [31, 32].
I3S (Interactive Individual Identification Software) matching validation
Assessing 'by-eye' matches using I3S
Of the 33 individuals re-sighted between years in the database used by Meekan et al. , 10 individuals could not be matched with I3S because their images were not amenable to I3S fingerprinting (absence of reference points) or their match was not present in the database. This was because the Meekan et al.  study also used images from a separate database and included scar-identified individuals that were not available for photographic matching using I3S. Thus, we could only re-assess 23 of these by-eye matches that included 13 LS matches and 16 RS matches (58 images total).
There was an exponential decline of median ER1 with increasing angle (Fig. 6b). Median ER1 ranged from 69.16 (± 52.24) for images of subjects at 10°, to 1.56 (± 2.81) for images of subjects at 40°. The distribution of ER1 for images of subjects at 30° approached that for non-matching pairs, and the distribution of ER1 for images of subjects at 40° overlapped the ER1 distribution for non-matching pairs.
Number of spot pairs
Consistent, non-intrusive and ethically acceptable methods of mark-recapture are essential for estimating reliable demographic rates for wildlife populations, particularly for threatened species [29, 33]. Photo-identification has become a widely accepted method of mark-recapture that has been empirically tested over a broad range of species [e.g., [16, 17, 34]]. Despite the advantages of this technique, there is the potential for large photographic databases to compromise the reliability of matches made by eye, which can subsequently jeopardize reliable estimates of population demographics. This problem has been largely overcome for several species by computer-aided image-matching algorithms that match various unique features of individuals [20, 28, 35, 36, 37]. However, most of these programs have limited applications, may be complex to operate, or are not freely available.
Software inaccessibility and the corresponding isolation of potentially useful photographic datasets will likely compromise parameter estimation and lead to higher uncertainty for calculated vital rates. For example, centralized photographic catalogues are common in the field of cetacean research, with new photographs from observers being compared to those previously obtained and the results sent to collaborators worldwide . This type of data sharing for large, long-lived and wide-ranging species is an essential component of effective population management. Open-source matching software coupled with matching algorithms exploiting the power of information theory will make this process more efficient and less prone to error. Our main objective was to provide a procedure for incorporating full matching uncertainty into the photo-identification process using a freely available and simple software package. Despite the relatively low number of photographs with which we tested our approach, the performance of the system is satisfactory from the perspective of estimating reliable demographic information for a host of wildlife species.
Our assessment of a simple, freely available spot pattern-matching software package coupled with an information-theoretic incorporation of matching uncertainty was particularly effective for whale sharks given that their natural spot patterns were ideally suited for assessment using the I3S program. Validation of I3S matches using the Information Criterion algorithm provided a threshold w1 for known matched pairs of approximately 0.2, below which w1 for non-matched pairs fell. Known matched pairs not matched by I3S, or that were matched with low (i.e., <0.2) w1, likely resulted from poor clarity or high angles of yaw. This emphasizes the need to select images of the highest quality for matching purposes . The validation process is necessary with most computer-aided matching algorithms because this alleviates much of the subjectivity associated with the final stage of matching. In the case of whale sharks, the 0.2 threshold proved to be a robust and conservative measure of certainty, but the particular value of the threshold will likely vary among species. Nonetheless, in the absence of validation data we suggest that using this threshold value is a good first approximation.
The validation stage of photographic matching can be further confirmed by using genetic tagging to identify individuals , and this approach is proliferating in mark-recapture studies. Genetic tagging also has the advantage of providing additional individual- and population-level information (e.g., genetic diversity, parent-offspring relationships, etc.) . Because whale sharks are highly photographed and tissue sampling may be difficult, it is unlikely that genetic tagging will replace photographic identification in the near future, even though genetic information will provide further validation of photographic matching success.
The open-source program I3S  was effective at confirming past matches made by eye in the majority of instances. Images that were successfully confirmed using our Information Criterion algorithm received relatively low w1 and ER1 overall, most likely as a result of a considerably smaller sample size than that used for validation. I3S was also a useful tool for identifying image matches that were assigned incorrectly (i.e., both false positives and false negatives). When matching whale shark patterns by eye, the observer generally does not focus on the spot pattern per se; rather, attention is usually paid to the intricate lines and whirls (see Fig. 1a) on the flank of the shark. As such, I3S provides an unbiased method of matching natural markings that is relatively immune to user subjectivity.
We found strong evidence that horizontal angle of subjects within images affects the ability of the I3S algorithm to make reliable matches. As the horizontal angle of subjects in images increases, the matching likelihood decreases. Angles of yaw up to 30° compromise the matching process even though many of these images were still matched correctly. Conversely, images with angles of yaw ≥40° will more than likely be incorrectly assigned. Due to the linear algorithm used by I3S to match spot patterns it is important to use only those photos with as little contortion of the reference area as possible. Likewise, the number of spots annotated in fingerprints can also potentially affect the I3S matching process. The higher the number of spot pairs matched, the lower the I3S score and hence, the higher the matching certainty. This corroborates similar findings from a study of Carcharias taurus  and emphasizes the benefit of using information-theoretic measures of matching parsimony because the updated algorithm takes relative match uncertainty into account.
The number of suitable images from our database for use in I3S was considerably reduced due to the absence of reference points, poor image quality and oblique angles of subjects in many images. The rejection rate is inflated particularly by the use of photographs taken without the explicit aim of photographic matching because many are derived from ecotourism operations. However, the efficiency and reliability of matching with I3S more than compensated for the reduced sample size. The number and size of images in an I3S database can potentially slow down the program's operating speed; therefore, it is ideal to scale down the size of photographs and only include the best image of a particular animal. In addition to horizontal angle, roll and pitch of sharks in images may affect the matching process. Pitch seems likely to be only a minor problem because digital photos can be rotated so that the animal is aligned with the horizontal. We had few images of the same individual at varying angles of roll, so we were unable to examine this potential problem.
The application of I3S to any animal with a unique, stable spot pattern holds particular promise for mark-recapture studies. The program is particularly well suited to organisms that have minimal contortion in the desired reference area and have spots that are relatively homogeneous in diameter and size. Large, irregular spots may cause problems during fingerprinting because the centre of the spot may vary according to the user's preference. For example, a species with a spot pattern that may not be well suited to I3S is the manta ray (Manta birostris) due to its large, sparsely spaced and irregular ventral spot patterns . However, other species of ray such as the white spotted eagle ray (Aetobatus narinari) have evenly spaced and relatively homogeneous spot patterns on the dorsal surface that would lend themselves more readily to the fingerprinting process. Other organisms that are potentially suitable candidates include: felids, some cetaceans, many birds, amphibians and reptiles, and other elasmobranchs.
The benefits of non-intrusive mark-recapture studies are numerous, not only in terms of animal welfare, but also from a logistical perspective. The software availability and applicability of I3S for a wide range of animals will enable researchers to store and match images for mark-recapture purposes, thus hopefully contributing to robust and more precise estimates of key life history parameters. Reliable, effective photo-identification for animals with stable, natural markings is now possible for anyone armed with a digital camera.
Whale shark photo library
The library contains 797 photos taken by researchers and tour operators during the months of March–July from 1992–2006 at Ningaloo Reef (22° 50′ S, 113° 40′ E), Western Australia. The method of image capture varied over time, so that still, video and digital images were all included in the library. A 'by-eye' comparison of 581 images in this photo library was originally completed (this total excludes several images collected in the 2001 season, as well as all photos collected between 2003 and 2006). During analysis, photos were sorted into quality classes on the basis of clarity, angle, distinctiveness, partial image and overall quality . More details of the manual matching procedure are provided in reference .
Matching software and fingerprint creation
The software we used to generate potential image matches was originally designed to match natural variation in spot patterns of grey nurse sharks (Carcharias taurus – also known as the "ragged-tooth" in South Africa and the "sand tiger" shark in North America) . This software – Interactive Individual Identification Software (I3S) – creates 'fingerprint' files and matches individuals by comparing particular areas demonstrating consistent spot patterns. We chose to examine the area on the flank directly behind the 5th gill slit as the most appropriate for the individual identification of whale sharks. This decision was based on spot consistency identified in previous studies and due to the ease with which photographers can view this area [19, 20]. The positioning of spots in this area was also less likely to be distorted due to undulation of the caudal fin, which may affect the software's matching success.
At least three reference points are required by I3S to construct a fingerprint ; we chose the most easily identifiable and consistent reference points visible in flank photographs: 1) the top of the 5th gill slit, 2) the point on the flank corresponding to the posterior point of the pectoral fin and 3) the bottom of the 5th gill slit (Fig. 1a). The requirement of all three reference points to be visible in the photograph for a fingerprint to be created meant that not all 797 photos could be used. As such, we could compare 433 (54%) of the original photographs, of which 212 were of the left side (LS) and 221 were of the right side (RS) of the shark.
In this updated database, images were matched by an operator highlighting spots within the reference area on a computer screen. Three initial reference points for each image were entered (Fig. 1a), followed by the manual addition of a digital point to the centre of the most obvious spots within the reference frame. Using a search function, the software compares the new fingerprint file against all other fingerprint files in the database by using a two-dimensional linear algorithm, whose score is simply the sum of the distances between spot pairs divided by the square of the number of spot pairs . The candidate image with the minimum overall score (ranging from 0 [perfect match] to a value <1) is the most likely match. The program also lists the next 49 most likely image matches, which it ranks in decreasing order of likelihood. A search result output text file provides a list of the 50 matches, the spot pairs compared, and a matching score for each. We then imported the I3S text output into R  for further analysis [see Additional file 1].
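The score described above can be illustrated with a minimal Python sketch. It assumes the spots have already been mapped into a common coordinate frame using the three reference points and that the pairing of spots is given; I3S performs both steps internally and they are not reproduced here. The function name `spot_score` and the coordinates are hypothetical.

```python
import math

def spot_score(pairs):
    """Sum of Euclidean distances between paired spots, divided by n squared.

    `pairs` is a list of ((x1, y1), (x2, y2)) tuples, one per matched spot pair.
    This mirrors the verbal description in the text, not the I3S source code.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one spot pair is required")
    total = sum(math.dist(a, b) for a, b in pairs)
    return total / n ** 2

# Hypothetical normalised coordinates: (spot in new image, spot in library image)
pairs = [
    ((0.10, 0.20), (0.10, 0.23)),
    ((0.40, 0.50), (0.44, 0.50)),
    ((0.70, 0.30), (0.70, 0.30)),
]
score = spot_score(pairs)
print(f"score = {score:.5f}")
```

Because the denominator is the square of the number of pairs, a candidate with more matched pairs tends to receive a lower score for the same average distance, which is the dependence the information criterion step is meant to correct for.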
Information criterion algorithm
To provide a measure of match parsimony based on the philosophy of information theory and to compare possible image matches in a multi-model inferential framework , we modified the match score in the following manner: (1) we first back-transformed the spot-averaged sum of distances to a residual sum of distances, which was simply the spot score (SS) multiplied by the square of the number of matching spots (n); (2) we then created an information criterion (IC) analogous to the Akaike and Bayesian Information Criteria [43, 44]:
where k = an assumed number of parameters under a simple linear model (set to 1 for all models) and n' = 100/n that accounts for the fact that an increasing number of spots automatically leads to a higher SS (the 100 multiplier scales the term to be >1); (3) finally, we calculated the IC weight (w) as:
where ΔIC = IC - IC_min for the ith image (ith 'model') from 1 through m (where m = 49). We also calculated the information-theoretic evidence ratio (ER)  for each matched image relative to the top-ranked image based on the w to provide a likelihood ratio of match performance. Here, ER1 is the w of the top-ranked matched photograph divided by the next most highly ranked photograph's w, ER2 is the w of the top-ranked match divided by the w of the third-best match, and so on. Therefore, ER1 provides a likelihood ratio for the match of the top-ranked photograph relative to the next most highly ranked photograph.
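The weighting and evidence-ratio steps can be sketched in Python as follows. The weight formula is not legible in this copy of the text, so the sketch assumes the standard Akaike-weight form, exp(-0.5 ΔIC) normalised over all candidates; the IC formula itself is not reconstructed, and the IC values below are hypothetical inputs. All function names are illustrative.

```python
import math

def residual_sum(ss, n):
    """Step 1 in the text: spot score (SS) times the square of the spot count n."""
    return ss * n ** 2

def ic_weights(ic_values):
    """Assumed Akaike-style weights: exp(-0.5 * delta), normalised to sum to 1."""
    ic_min = min(ic_values)
    rel = [math.exp(-0.5 * (ic - ic_min)) for ic in ic_values]
    total = sum(rel)
    return [r / total for r in rel]

def evidence_ratios(weights):
    """ER1, ER2, ...: the top-ranked weight divided by each lower-ranked weight."""
    ranked = sorted(weights, reverse=True)
    return [ranked[0] / w for w in ranked[1:]]

ic = [10.0, 12.0, 14.0, 20.0]  # hypothetical IC values for four candidate images
w = ic_weights(ic)
er = evidence_ratios(w)
print([round(x, 3) for x in w], [round(x, 2) for x in er])
```

Under this assumed form, a ΔIC of 2 between the two best candidates gives ER1 = e, so the top-ranked image is about 2.7 times more likely to be the match than the runner-up.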
To establish the ability of the w_i and ER indices to assign reliable matching, we endeavoured to establish a threshold value of w1 and ER1 below which matching uncertainty was too high to match photographs reliably. We therefore validated the approach by applying our algorithms to a sample of 200 images; 25 known matched pairs (i.e., matched by eye) from both the LS and RS databases (100 images total), and 25 non-matched pairs from both LS and RS databases (100 images total). The LS and RS images were analyzed separately, using text outputs from I3S that report the candidate matching image names, I3S matching scores and the number of spot pairs matched. A match was considered successful if the corresponding image was ranked at the top of the list of potential matches (i.e., number 1 of 50).
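One way to inspect a candidate threshold against validation data is sketched below in Python. It simply counts how many top-ranked weights (w1) from known matched and non-matched pairs fall on each side of a cut-off. The w1 values are invented for illustration and the function name is hypothetical; this is not the procedure used to derive the reported threshold of approximately 0.2.

```python
def summarise_validation(matched_w1, nonmatched_w1, threshold):
    """Count top-ranked weights on each side of a candidate threshold."""
    return {
        "matched_at_or_above": sum(w >= threshold for w in matched_w1),
        "matched_below": sum(w < threshold for w in matched_w1),
        "nonmatched_at_or_above": sum(w >= threshold for w in nonmatched_w1),
        "nonmatched_below": sum(w < threshold for w in nonmatched_w1),
    }

matched = [0.95, 0.61, 0.33, 0.18]     # hypothetical w1 for known matched pairs
nonmatched = [0.05, 0.11, 0.16, 0.09]  # hypothetical w1 for non-matched pairs
summary = summarise_validation(matched, nonmatched, threshold=0.2)
print(summary)
```

A useful threshold leaves few or no non-matched pairs at or above the cut-off while keeping most known matches above it.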
Assessing 'by-eye' matches using I3S
Thirty-three individual sharks were re-sighted inter-annually during the manual 'by-eye' analysis of the raw photo library. Of any two by-eye matched images, one of the pair was entered into either the LS or RS database and searched. A match using I3S was considered successful if the by-eye matched images were ranked as the most likely match (as with the validation test) and confirmed using the IC algorithm.
Horizontal angle (yaw)
Footage of 10 different sharks (5 LS and 5 RS) was used to capture sequences of five images of each shark, where subjects were on varying horizontal angles (0°, 10°, 20°, 30° and 40° – Fig. 2). The angles of yaw were estimated using Screen Protractor™ software. Fingerprints were created for each image with 20 spots annotated per fingerprint. The 10° images were searched against the 0° images and 10 non-matching images. This process was repeated, substituting images where subjects were on angles of 20°, 30° and 40° for both LS and RS image sequences. Five random, non-matching pairs were also searched against 0° and 10° images, and then repeated for 20°, 30°, and 40° images. This allowed for a comparison between matching and non-matching pairs while testing for the effects of horizontal angle in images. Results were analyzed using the IC algorithm applied to the match validation and by-eye comparison tests.
Number of spot pairs
Fifty known-matching pairs were compared to one another in I3S. Of these matching pairs, only those successfully confirmed during validation of I3S matches were included in this test. I3S scores were compared against the number of spot pairs matched. The w1 for each image was also compared against the number of spot pairs matched by the I3S algorithm. A complementary log-log transformation (clog-log) was applied to normalize the distribution of I3S scores and w1, and a log10 transformation was used to normalize the distribution of spot pairs. We tested for a linear relationship between the transformed variables using least-squares regression and information-theoretic evidence ratios. Goodness-of-fit was assessed using the least-squares R² value.
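The transformations and regression described above can be sketched in plain Python. The complementary log-log transform is log(-log(1 - p)) for 0 < p < 1. The data below are hypothetical, and the evidence-ratio comparison mentioned in the text is not included in this sketch.

```python
import math

def cloglog(p):
    """Complementary log-log transform, defined for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))

def least_squares(x, y):
    """Return slope, intercept and R^2 for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

spot_pairs = [8, 12, 16, 20, 25]         # hypothetical numbers of matched spot pairs
scores = [0.30, 0.22, 0.15, 0.11, 0.08]  # hypothetical I3S scores in (0, 1)
x = [math.log10(n) for n in spot_pairs]
y = [cloglog(s) for s in scores]
slope, intercept, r2 = least_squares(x, y)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

The same function can be applied with w1 in place of the I3S score to check whether the dependence on the number of spot pairs remains after the information criterion step.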
We acknowledge the support of the whale shark ecotourism industry based in Exmouth and Coral Bay (Western Australia), the Natural Heritage Trust (NHT) Marine Species Recovery Protection fund administered by the Department of Environment and Heritage (Australia), Hubbs-SeaWorld Research Institute, BHP Billiton Petroleum, Woodside Energy, the U.S. NOAA Ocean Exploration Program, the Whale Shark Research Fund administered by the Western Australia Department of Environment and Conservation (DEC), the Australian Institute of Marine Science, NOAA Fisheries and CSIRO Marine and Atmospheric Research. We particularly thank E. Wilson, C. Simpson, J. Cary, R. Mau and B. Fitzpatrick of DEC, and the logistical support and advice of C. McLean, M. Press, A. Richards, I. Field, S. Quasnichka, J. Polovina, B. Stewart, K. Wertz, T. Maxwell, J. Stevens, S. Wilson and J. Taylor, as well as assistance with I3S by Jurgen den Hartog and Renate Reijns (I3S developers). This research was reviewed and approved by the Charles Darwin University Animal Ethics Committee, the Institutional Animal Care and Use Committee of Hubbs-SeaWorld Research Institute and the animal ethics committee of DEC. We thank D. Lohman, G. Taylor, D. Bickford and J. Kirwan for supplying images.
- 1. Caughley G, Gunn A: Conservation Biology in Theory and Practice. 1996, Cambridge, MA, Blackwell Science.
- 3. Whitehead H, Christal J, Tyack PL: Studying cetacean social structure in space and time. Cetacean Societies: Field Studies of Dolphins and Whales. Edited by: Mann J, Connor RC, Tyack PL and Whitehead H. 2000, Chicago and London, University of Chicago Press, 65-86.
- 10. Ogutu JO, Piepho HP, Dublin HT, Reid RS, Bhola N: Application of mark-recapture methods to lions: satisfying assumptions by using covariates to explain heterogeneity. Journal of Zoology. 2006, 269: 161-174.
- 15. Stevick PT, Palsbøll PJ, Smith TD, Bravington MV, Hammond PS: Errors in identification using natural markings: rates, sources, and effects on capture–recapture estimates of abundance. Canadian Journal of Fisheries and Aquatic Sciences. 2001, 58: 1861-1870. 10.1139/cjfas-58-9-1861.
- 18. Sears R, Williamson JM, Wenszel FW, Berube M, Gendron D, Jones P: Photographic identification of the blue whale (Balaenoptera musculus) in the Gulf of St. Lawrence, Canada. Report of the International Whaling Commission. 1990, 335-342.
- 21. Mizroch SA, Beard JA, Lynde M: Computer-assisted photo-identification of humpback whales. Report of the International Whaling Commission. 1990, 63-70.
- 22. Wilkin DJ, Debure KR, Roberts ZW: Query by sketch in DARWIN: digital analysis to recognize whale images on a network. Storage and Retrieval for Image and Video Databases VII, Proceedings of the International Society for Optical Engineering (SPIE) Vol 3656. Edited by: Yeung MM, Yeo BL and Bouman CA. 1998, Bellingham, Washington, SPIE, 3656: 41-48.
- 23. Evans PGH: EUROPHLUKES Database Specifications Handbook. http://www.europhlukes.net. 2003.
- 24. Lapolla F: The Dolphin Project. http://thedolphinproject.org. 2005.
- 25. Urian K: Mid-Atlantic Bottlenose Dolphin Catalog. http://moray.ml.duke.edu/faculty/read/mabdc.html. 2005.
- 27. Burnham KP, Anderson DR: Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2002, New York, USA, Springer-Verlag, 488, 2nd edition.
- 29. Bradshaw CJA, Mollet HG, Meekan MG: Inferring population trends of the world's largest fish from mark-recapture estimates of survival. Journal of Animal Ecology. 2007, DOI: 10.1111/j.1365-2656.2007.01201.x.
- 30. CITES: CITES Appendix II nomination of the Whale Shark, Rhincodon typus. Proposal 12.35. 2002, Santiago, Chile, CITES Resolutions of the conference of the parties in effect after the 12th Meeting.
- 31. Van Tienhoven AM, Den Hartog JE, Reijns R, Peddemors VM: A computer-aided program for pattern-matching of natural marks of the spotted raggedtooth shark Carcharias taurus (Rafinesque, 1810). Journal of Applied Ecology. 2007, in press.
- 32. Interactive Individual Identification Software (I3S). [http://www.reijns.com/i3s]
- 36. Kehtarnavaz N, Peddigari V, Chandan C, Syed W, Hillman G, Wursig B: Photo-identification of humpback and gray whales using affine moment invariants. Image Analysis, Proceedings. 2003, 2749: 109-116.
- 39. Agler BA: Testing the reliability of photographic identification of individual fin whales (Balaenoptera physalus). Report of the International Whaling Commission. 1992, 42: 731-737.
- 42. Last PR, Stevens JD: Sharks and Rays of Australia. 1994, CSIRO.
- 43. R Core Development Team: R: A Language and Environment for Statistical Computing. 2004, Vienna, Austria, R Foundation for Statistical Computing.
- 44. Akaike H: Information theory as an extension of the maximum likelihood principle. Proceedings of the Second International Symposium on Information Theory. Edited by: Petrov BN and Csaki F. 1973, Budapest, Hungary, 267-281.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.