Skip to main content

Data Mining the Yeast Genome in a Lazy Functional Language

  • Conference paper
  • First Online:
Practical Aspects of Declarative Languages (PADL 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2562))

Included in the following conference series:

Abstract

Critics of lazy functional languages contend that the languages are only suitable for toy problems and are not used for real systems. We present an application (PolyFARM) for distributed data mining in relational bioinformatics data, written in the lazy functional language Haskell. We describe the problem we wished to solve, the reasons we chose Haskell and relate our experiences. Laziness did cause many problems in controlling heap space usage, but these were solved by a variety of methods. The many advantages of writing software in Haskell outweighed these problems. These included clear expression of algorithms, good support for data structures, abstraction, modularity and generalisation leading to fast prototyping and code reuse, parsing tools, profiling tools, language features such as strong typing and referential transparency, and the support of an enthusiastic Haskell community. PolyFARM is currently in use mining data from the Saccharomyces cerevisiae genome and is freely available for non-commercial use at http://www.aber.ac.uk/compsci/Research/bio/dss/polyfarm/.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco (1999)

    Google Scholar 

  2. Mannila, H.: Methods and problems in data mining. In: International Conference on Database Theory. (1997)

    Google Scholar 

  3. Muggleton, S., ed.: Inductive Logic Programming. Academic Press (1992)

    Google Scholar 

  4. Wrobel, S., Džeroski, S.: The ILP description learning problem: Towards a general model-level definition of data mining in ILP. In: FGML-95 Annual Workshop of the GI Special Interest Group Machine Learning (GI FG 1.1.3). (1995)

    Google Scholar 

  5. King, R., Muggleton, S., Srinivasen, A., Sternberg, M.: Structure-activity relationships derived by machine learning: The use of atoms and their bond connectives to predict mutagenicity by inductive logic programming. Proc. Nat. Acad. Sci. USA 93 (1996) 438–442

    Google Scholar 

  6. Goffeau, A., Barrell., B., Bussey, H., Davis, R., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J., Jacq, C., Johnston, M., Louis, E., Mewes, H., Murakami, Y., Philippsen, P., Tettelin, H., Oliver, S.: Life with 6000 genes. Science 274 (1996) 563–7

    Article  Google Scholar 

  7. King, R., Karwath, A., Clare, A., Dehaspe, L.: Genome scale prediction of protein functional class from sequence using data mining. In: KDD 2000. (2000)

    Google Scholar 

  8. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: 20th International Conference on Very Large Databases (VLDB 94). (1994) Expanded version: IBM Research Report RJ9839, June 1994.

    Google Scholar 

  9. Dehaspe, L., De Raedt, L.: Mining association rules in multiple relations. In: 7th International Workshop on Inductive Logic Programming. (1997)

    Google Scholar 

  10. Utgo., P.: Shift of bias for inductive concept learning. In Michalski, R., Carbonell, J., Mitchell, T., eds.: Machine Learning: An Artificial Intelligence Approach, Volume II. Morgan Kaufmann (1986)

    Google Scholar 

  11. Park, J.S., Chen, M., Yu, P.: Effcient parallel data mining for assocation rules. In: CIKM’ 95. (1995)

    Google Scholar 

  12. Agrawal, R., Shafer, J.: Parallel mining of assocation rules. IEEE Trans. on Knowledge and Data Engineering 8(6) (1996) 962–969

    Article  Google Scholar 

  13. Cheung, D., Ng, V., Fu, A., Fu, Y.: Effcient mining of assocation rules in distributed databases. IEEE Trans. on Knowledge and Data Engineering 8(6) (1996) 911–922

    Article  Google Scholar 

  14. Han, E., Karypis, G., Kumar, V.: Scalable parallel data mining for assocation rules. In: SIGMOD’ 97. (1997)

    Google Scholar 

  15. Parthasrathy, S., Zaki, M., Ogihara, M., Li, W.: Parallel data mining for association rules on shared-memory systems. Knowledge and Information Systems 3(1) (2001) 1–29

    Article  Google Scholar 

  16. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, Vol. 1 and 2. Computer Science Press, Rockville, Md. (1988)

    Google Scholar 

  17. Dehaspe, L.: Frequent Pattern Discovery in First Order Logic. PhD thesis, Department of Computer Science, Katholieke Universiteit Leuven (1998)

    Google Scholar 

  18. Wallace, M., Runciman, C.: The bits between the lambdas: Binary data in a lazy functional language. In: Proceedings of the International Symposium on Memory Management. (1998)

    Google Scholar 

  19. Blockeel, H., Dehaspe, L., Demoen, B., Janssens, G., Ramon, J., Vandecasteele, H.: Improving the effciency of Inductive Logic Programming through the use of query packs. Journal of Arti.cial Intelligence Research 16 (2002) 135–166

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Clare, A., King, R.D. (2003). Data Mining the Yeast Genome in a Lazy Functional Language. In: Dahl, V., Wadler, P. (eds) Practical Aspects of Declarative Languages. PADL 2003. Lecture Notes in Computer Science, vol 2562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36388-2_4

Download citation

  • DOI: https://doi.org/10.1007/3-540-36388-2_4

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00389-2

  • Online ISBN: 978-3-540-36388-0

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics