Data Mining in Crystallography

  • D. W. M. Hofmann
  • Liudmila N. Kuleshova

Part of the Structure and Bonding book series (STRUCTURE, volume 134)

Table of contents

  1. Front Matter
    Pages i-xii
  2. Joannis Apostolakis
    Pages 1-35
  3. Christian Buchsbaum, Sabine Höhler-Schlimm, Silke Rehme
    Pages 37-58
  4. Krishna Rajan
    Pages 59-87
  5. Detlef W. M. Hofmann
    Pages 89-134
  6. Haitao Cheng, Taner Z. Sen, Robert L. Jernigan, Andrzej Kloczkowski
    Pages 135-167
  7. Back Matter
    Pages 169-172

About this book

Introduction

Humans have been “manually” extracting patterns from data for centuries, but the increasing volume of data in modern times has called for more automatic approaches. Early methods of identifying patterns in data include Bayes’ theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have increased data collection and storage. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. Data mining has been developed as the tool for extracting hidden patterns from data, by using computing power and applying new techniques and methodologies for knowledge discovery. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1980s). Data mining commonly involves four classes of tasks:

  • Classification: Arranges the data into predefined groups. For example, an e-mail program might attempt to classify an e-mail as legitimate or spam. Common algorithms include nearest neighbor, the naive Bayes classifier and neural networks.
  • Clustering: Is like classification, but the groups are not predefined, so the algorithm will try to group similar items together.
  • Regression: Attempts to find a function which models the data with the least error. A common method is to use genetic programming.
  • Association rule learning: Searches for relationships between variables. For example, a supermarket might gather data on what each customer buys.
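
As a minimal sketch of the classification task described above, the following Python snippet implements a one-nearest-neighbor classifier for the e-mail example. The two features (number of links, number of exclamation marks), the toy training data and the function name `nearest_neighbor` are hypothetical and chosen only for illustration; they are not taken from the book.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: (links in message, exclamation marks) -> class
train = [
    ((0, 0), "legitimate"),
    ((1, 0), "legitimate"),
    ((7, 5), "spam"),
    ((9, 8), "spam"),
]

print(nearest_neighbor(train, (8, 6)))  # closest to (7, 5)
print(nearest_neighbor(train, (0, 1)))  # closest to (0, 0)
```

The groups ("legitimate", "spam") are fixed in advance, which is what distinguishes classification from clustering, where the algorithm has to discover the groups itself.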

Keywords

Data Basis, Protein Structure, Secondary structure, clustering, crystallography, data analysis, data mining, knowledge discovery, neural networks

Editors and affiliations

  • D. W. M. Hofmann (1)
  • Liudmila N. Kuleshova (2)

  1. FlexCryst, Uttenreuth, Germany
  2. Research & Development in Sardinia, Center for Advanced Studies, Pula, Italy

Bibliographic information

  • DOI https://doi.org/10.1007/978-3-642-04759-6
  • Copyright Information Springer-Verlag Berlin Heidelberg 2010
  • Publisher Name Springer, Berlin, Heidelberg
  • eBook Packages Chemistry and Materials Science
  • Print ISBN 978-3-642-04758-9
  • Online ISBN 978-3-642-04759-6
  • Series Print ISSN 0081-5993
  • Series Online ISSN 1616-8550