Natural Computing, Volume 8, Issue 1, pp 101–120

Observer-invariant histopathology using genetics-based machine learning

  • Xavier Llorà
  • Anusha Priya
  • Rohit Bhargava


Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US men and is a leading cause of cancer-related death. Advances in Fourier transform infrared spectroscopic imaging now provide very large data sets describing both the structural and the local chemical properties of cells within prostate tissue. Our long-term goal is to unite spectroscopic imaging data with computer-aided diagnosis (CADx) to provide a new approach to pathology that automates the recognition of cancer in complex tissue. The first step toward such CADx tools is a mechanism for automatically learning to classify tissue types, a key step in the diagnosis process. Here we demonstrate that genetics-based machine learning (GBML) can be used to tackle this problem. Doing so efficiently, however, requires scalable GBML implementations that can process very large data sets. In this paper, we propose and validate an efficient GBML technique, NAX, based on an incremental genetics-based rule learner. NAX exploits massive parallelism via the message passing interface (MPI) and efficient rule matching using hardware-implemented operations. Results demonstrate that NAX performs prostate tissue classification efficiently, making a compelling case for GBML implementations as efficient and powerful tools for biomedical image processing.
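The abstract attributes NAX's efficiency to two ingredients: spreading the work over many processes with MPI and fast rule matching. The sketch below illustrates the general pattern under stated assumptions; it is not the authors' implementation. It assumes, hypothetically, that a rule is a conjunction of per-attribute intervals plus a class label (the kind of representation XCS-style systems use for continuous inputs; the abstract does not specify NAX's exact encoding), that the data set is partitioned across processes, and that each process counts matches locally before the counts are summed. The names `IntervalRule`, `evaluate_chunk`, `partition`, and `evaluate_rule` are illustrative only, and plain Python loops stand in for both the MPI processes and the SSE2 vector comparisons.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical data layout: each example is (attribute vector, class label).
Example = Tuple[Sequence[float], int]


@dataclass(frozen=True)
class IntervalRule:
    """Hypothetical rule: one [low, high] interval per attribute plus a class label."""
    lows: Tuple[float, ...]
    highs: Tuple[float, ...]
    label: int

    def matches(self, x: Sequence[float]) -> bool:
        # The rule matches when every attribute falls inside its interval.
        # Vector instructions could test several of these comparisons at
        # once; here they are evaluated one at a time.
        return all(lo <= xi <= hi for lo, xi, hi in zip(self.lows, x, self.highs))


def evaluate_chunk(rule: IntervalRule, chunk: Sequence[Example]) -> Tuple[int, int]:
    """Return (matched, correct) counts for one partition of the data."""
    matched = correct = 0
    for x, y in chunk:
        if rule.matches(x):
            matched += 1
            if y == rule.label:
                correct += 1
    return matched, correct


def partition(data: Sequence[Example], n_workers: int) -> List[Sequence[Example]]:
    """Split the data set into n_workers round-robin chunks."""
    return [data[i::n_workers] for i in range(n_workers)]


def evaluate_rule(rule: IntervalRule, data: Sequence[Example],
                  n_workers: int = 4) -> Tuple[int, int, float]:
    # Each chunk stands in for the slice of data held by one process;
    # summing the per-chunk counts plays the role of a sum reduction.
    partial = [evaluate_chunk(rule, chunk) for chunk in partition(data, n_workers)]
    matched = sum(m for m, _ in partial)
    correct = sum(c for _, c in partial)
    accuracy = correct / matched if matched else 0.0
    return matched, correct, accuracy


# Toy two-attribute data set (made up for illustration).
data = [((0.2, 0.8), 1), ((0.4, 0.6), 1), ((0.9, 0.1), 0),
        ((0.3, 0.7), 0), ((0.5, 0.5), 1)]
rule = IntervalRule(lows=(0.0, 0.5), highs=(0.5, 1.0), label=1)
print(evaluate_rule(rule, data, n_workers=2))  # → (4, 3, 0.75)
```

Because each chunk's counts are independent, the totals do not depend on how many workers the data is split across. In an actual MPI program, each process would evaluate its local slice and a sum reduction (for example `MPI_Allreduce` with `MPI_SUM`) would combine the counts, so only a few integers per rule need to cross the network.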


Keywords: Observer-invariant histopathology; Genetics-based machine learning; Learning Classifier Systems; Hardware acceleration; Vector instruction; SSE2; MPI; Massive parallelism



We would like to thank David E. Goldberg for his continual support and encouragement, and for giving us access to the IlliGAL resources. Thanks also to Kumara Sastry for hallway discussions and to the Automated Learning Group and the Data-Intensive Technologies and Applications at the National Center for Supercomputing Applications for hosting this joint collaboration.

This work was sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant FA9550-06-1-0370, the National Science Foundation under grant IIS-02-09199, and the National Institutes of Health. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research, the National Science Foundation, or the US Government.

Rohit Bhargava would like to acknowledge collaborators over the years, especially Dr. Stephen M. Hewitt and Dr. Ira W. Levin of the National Institutes of Health, for numerous useful discussions and guidance. Funding for this work was provided in part by the University of Illinois Research Board and by the Department of Defense Prostate Cancer Research Program. This work was also funded in part by the National Center for Supercomputing Applications and the University of Illinois, under the auspices of the NCSA/UIUC faculty fellows program.



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, USA
  2. Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, USA
  3. Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, USA
