Bioinformatics Adventures in Database Research

  • Jinyan Li
  • See-Kiong Ng 
  • Limsoon Wong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2572)


Informatics has helped launch molecular biology into the genomic era. It appears certain that informatics will remain a major contributor to molecular biology in the post-genome era. We discuss here data integration and data mining in bioinformatics, as well as the role that database theory has played in these topics. We also describe laboratory information management systems (LIMS) as a third key topic in bioinformatics where advances in database systems and theory can be highly relevant.


Keywords: Query Language, Database Research, Sequence Segment, Translation Initiation Site, Laboratory Information Management System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jinyan Li (1)
  • See-Kiong Ng (1)
  • Limsoon Wong (1)

  1. Laboratories for Information Technology, Singapore
