Abstract
Critics of lazy functional languages contend that the languages are only suitable for toy problems and are not used for real systems. We present an application (PolyFARM) for distributed data mining in relational bioinformatics data, written in the lazy functional language Haskell. We describe the problem we wished to solve, explain why we chose Haskell, and relate our experiences. Laziness did cause many problems in controlling heap space usage, but these were solved by a variety of methods. The many advantages of writing software in Haskell outweighed these problems. These advantages included clear expression of algorithms; good support for data structures; abstraction, modularity and generalisation, leading to fast prototyping and code reuse; parsing tools; profiling tools; language features such as strong typing and referential transparency; and the support of an enthusiastic Haskell community. PolyFARM is currently in use mining data from the Saccharomyces cerevisiae genome and is freely available for non-commercial use at http://www.aber.ac.uk/compsci/Research/bio/dss/polyfarm/.
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Clare, A., King, R.D. (2003). Data Mining the Yeast Genome in a Lazy Functional Language. In: Dahl, V., Wadler, P. (eds) Practical Aspects of Declarative Languages. PADL 2003. Lecture Notes in Computer Science, vol 2562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36388-2_4
DOI: https://doi.org/10.1007/3-540-36388-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00389-2
Online ISBN: 978-3-540-36388-0
eBook Packages: Springer Book Archive