Data Mining the Yeast Genome in a Lazy Functional Language

Clare, Amanda; King, Ross D.

doi:10.1007/3-540-36388-2_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2562)

Included in the following conference series:

International Symposium on Practical Aspects of Declarative Languages

Abstract

Critics of lazy functional languages contend that these languages are suitable only for toy problems and are not used for real systems. We present PolyFARM, an application for distributed data mining in relational bioinformatics data, written in the lazy functional language Haskell. We describe the problem we wished to solve, explain why we chose Haskell, and relate our experiences. Laziness did cause many problems in controlling heap space usage, but these were solved by a variety of methods. The many advantages of writing software in Haskell outweighed these problems. They included: clear expression of algorithms; good support for data structures; abstraction, modularity and generalisation, leading to fast prototyping and code reuse; parsing and profiling tools; language features such as strong typing and referential transparency; and the support of an enthusiastic Haskell community. PolyFARM is currently in use mining data from the Saccharomyces cerevisiae genome and is freely available for non-commercial use at http://www.aber.ac.uk/compsci/Research/bio/dss/polyfarm/.
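
The abstract names heap space as the main cost of laziness but does not list the remedies used. The following is a minimal, generic Haskell sketch of one common kind of fix, forcing accumulators with foldl' and bang patterns; it is not taken from PolyFARM, and the function names sumLazy, sumStrict and meanStrict are hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Illustrative only, not PolyFARM code.
-- A lazy left fold can build a chain of unevaluated thunks
-- (((0 + x1) + x2) + ...) that is collapsed only at the end,
-- so heap use can grow with the length of the input.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator to weak head normal form at each step,
-- so the running total stays a single evaluated Int.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- With a tuple accumulator, foldl' alone only forces the pair
-- constructor, not its fields. The bang patterns in 'step' force
-- both the running sum and the running count on every iteration.
meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (!s, !n) x = (s + x, n + 1)

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])  -- prints 500000500000
  print (meanStrict [1 .. 1000000]) -- prints 500000.5
```

With optimisation enabled, GHC's strictness analysis can sometimes make a simple lazy fold strict on its own, so the difference tends to matter most in unoptimised builds or with compound accumulators such as the tuple above.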

Author information

Authors and Affiliations

Computational Biology Group, Department of Computer Science, University of Wales Aberystwyth, SY23 3DB, Penglais, Aberystwyth, UK
Amanda Clare & Ross D. King

Editor information

Editors and Affiliations

Computer Science Department Logic and Functional Programming Group, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby B.C., Canada
Veronica Dahl
Avaya Labs, 233 Mount Airy Road, Basking Ridge, 07920, NJ, USA
Philip Wadler

About this paper

Cite this paper

Clare, A., King, R.D. (2003). Data Mining the Yeast Genome in a Lazy Functional Language. In: Dahl, V., Wadler, P. (eds) Practical Aspects of Declarative Languages. PADL 2003. Lecture Notes in Computer Science, vol 2562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36388-2_4

DOI: https://doi.org/10.1007/3-540-36388-2_4
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00389-2
Online ISBN: 978-3-540-36388-0
eBook Packages: Springer Book Archive
