Data mining (DM) is a powerful information technology (IT) tool in today's competitive business world, especially as human society has entered the Big Data era. From an academic point of view, it lies at the intersection of human intervention, machine learning, mathematical modeling and databases. In recent years, data mining applications have become an important business strategy for most companies that want to attract new customers and retain existing ones. Using mathematical techniques, such as neural networks, decision trees, mathematical programming, fuzzy logic and statistics, data mining software can help a company discover previously unknown, valid, and actionable information from large and varied sources (either databases or open data sources like the internet) for crucial business decisions. The algorithms of the mathematical models are implemented in computer languages such as C++, JAVA, structured query language (SQL), online analytical processing (OLAP) and R. The process of data mining can be categorized as selecting, transforming, mining, and interpreting data. The ultimate goal of data mining is to find knowledge from data to support users' decisions. Therefore, data mining is strongly related to knowledge and knowledge management.

According to Wikipedia, knowledge is a familiarity with someone or something. Knowledge comprises specific facts, information, descriptions, or skills acquired through experience or education. Generally, knowledge can be divided into "implicit" knowledge (hard to transfer) and "explicit" knowledge (easy to transfer). Knowledge Management (KM) refers to strategies and practices that enable an individual or an organization to find, transmit, and expand knowledge. How to incorporate human knowledge into the data mining process has posed challenging research problems over the last 30 years, as data mining became an important knowledge discovery mechanism.

This chapter reviews the trends of research on data mining and knowledge management as preliminary findings for intelligent knowledge, the key contribution of this book. In Sect. 1.1, the fundamental concepts of data mining are briefly outlined, while Sect. 1.2 provides a high-level description of knowledge management, mainly from a personal point of view. Section 1.3 summarizes three popular existing research directions on how to use human knowledge in the process of data mining: (1) knowledge used for data preprocessing, (2) knowledge for post data mining, and (3) domain-driven data mining.

1.1 Data Mining

The history of data mining can be traced back more than 200 years, to when people first used statistics to solve real-life problems. In the area of statistics, Bayes' Theorem has played a key role in the development of probability theory and statistical applications. It was Richard Price (1723–1791), the famous statistician, who edited Bayes' Theorem after Thomas Bayes' death (Bayes and Price 1763). Richard Price is one of the scientists who initiated the use of statistics in analyzing social and economic datasets. In 1783, Price published the "Northampton table", which collected observations for calculating the probability of the duration of human life in England. In this work, Price presented the observations as tables with rows for records and columns for attributes as the basis of statistical analysis. Such tables are now commonly used in data mining as multi-dimensional tables. Therefore, from a historical point of view, the multi-dimensional table should be called the "Richard Price Table", and Price can be honored as a father of data analysis, later called data mining. Since the 1950s, as computing technology has gradually been adopted in commercial applications, many corporations have developed databases to store and analyze collected datasets. The mathematical tools employed to handle datasets evolved from statistics to methods of artificial intelligence, including neural networks and decision trees. In the 1990s, the database community started using the term "data mining", which is interchangeable with the term "Knowledge Discovery in Databases" (KDD) (Fayyad et al. 1996). Data mining has now become the common technology of data analysis at the intersection of human intervention, machine learning, mathematical modeling and databases.

There are different definitions of data mining across different disciplines. For data analysts, data mining discovers the hidden patterns of data from a large-scale data warehouse by precise mathematical means. For practitioners, data mining refers to knowledge discovery from the large quantities of data stored in computers. Generally speaking, data mining is a computing and analytical process of finding knowledge from data by using statistics, artificial intelligence, and/or various mathematical methods.

Since the 1990s, mining useful information or discovering knowledge from large databases has been a key research topic (Agrawal et al. 1993; Chen et al. 1996; Pass 1997). Given a database containing various records, there are a number of challenging technical and research problems regarding data mining. These problems can be discussed in terms of the data mining process and its methodology, respectively.

From the aspect of the process, data mining consists of four stages: (1) selecting, (2) transforming, (3) mining, and (4) interpreting. A database contains various data, but not all of it relates to the data mining goal (business objective). Therefore, the relevant data has to be identified and selected first. Data selection identifies the available data in the database and then extracts a subset of the available data as the data of interest for further analysis. Note that the selected variables may contain both quantitative and qualitative data. The quantitative data can be readily represented by some sort of probability distribution, while the qualitative data can first be numericalized and then described by frequency distributions. The selection criteria vary with the business objective of the data mining. Data transformation converts the selected data into the mined data through certain mathematical (analytical data) models. This type of model building is not only a technical task, but also an art (see the following discussion). In general, the considerations in model building include the timing of data processing, a simple and standard format, aggregating capability, and so on. Short data processing time reduces the total computation time of data mining substantially. A simple and standard format creates an environment for information sharing across different computer systems. Aggregating capability empowers the model to combine many variables into a few key variables without losing useful information. In the mining stage, the transformed data is mined using data mining algorithms. These algorithms, developed from analytical models, are usually implemented in computer languages such as C++, JAVA, SQL, OLAP and/or R. Finally, data interpretation provides the analysis of the mined data with respect to the data mining tasks and goals. This stage is very critical: it assimilates knowledge from different mined data. The situation is similar to assembling a puzzle. The mined data are like puzzle pieces; how to put them together for a business purpose depends on the business analysts and decision makers (such as managers or CEOs). A poor interpretation analysis may miss useful information, while a good analysis can provide a comprehensive picture for effective decision making.
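As a minimal, illustrative sketch of the selecting and transforming stages (not any specific system described here), the following Python/pandas snippet selects the variables of interest and numericalizes a qualitative attribute into a table ready for mining; the column names and data are invented for illustration.

```python
# A minimal sketch of the selecting and transforming stages:
# pick the variables of interest, then numericalize the
# qualitative attribute so mining algorithms can process it.
import pandas as pd

raw = pd.DataFrame({
    "customer": [1, 2, 3, 4],
    "income": [50.0, 72.5, 61.0, 55.0],      # quantitative variable
    "region": ["N", "S", "N", "W"],           # qualitative variable
    "notes": ["", "call back", "", ""],       # unrelated to the mining goal
})

selected = raw[["income", "region"]]                         # selecting
mined_input = pd.get_dummies(selected, columns=["region"])   # transforming
print(mined_input)    # a multi-dimensional table ready for the mining stage
```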

From the aspect of methodology, data mining can be achieved by Association, Classification, Clustering, Predictions, Sequential Patterns, and Similar Time Sequences (Cabena et al. 1998). In Association, the influence of some item in a data transaction on other items in the same transaction is detected and used to recognize patterns in the selected data. For example, if a customer purchases a laptop PC (X), then he or she also buys a mouse (Y) in 60 % of cases, and this pattern occurs in 5.6 % of laptop PC purchases. An association rule in this situation can be stated as "X implies Y, where 60 % is the confidence factor and 5.6 % is the support factor". When the confidence factor and support factor are represented by the linguistic variables "high" and "low", respectively (Jang et al. 1997), the association rule can be written in a fuzzy logic form: "X implies Y is high, where the support factor is low". In the case of many qualitative variables, fuzzy association is a necessary and promising technique in data mining.
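For illustration, the following minimal Python sketch computes the support and confidence of a rule "laptop implies mouse" from a handful of invented transactions (so the resulting numbers differ from the 60 %/5.6 % example above).

```python
# Toy computation of confidence and support for the rule X -> Y,
# where X = laptop and Y = mouse; the transactions are made up.
transactions = [
    {"laptop", "mouse"}, {"laptop", "mouse", "bag"}, {"laptop"},
    {"mouse"}, {"laptop", "mouse"}, {"bag"},
]
n_total = len(transactions)
n_laptop = sum("laptop" in t for t in transactions)
n_both = sum({"laptop", "mouse"} <= t for t in transactions)

confidence = n_both / n_laptop   # P(mouse | laptop)
support = n_both / n_total       # P(laptop and mouse)
print(f"confidence = {confidence:.0%}, support = {support:.1%}")
```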

In Classification, the methods intend to learn functions that map each item of the selected data into one of a set of predefined classes. Given a set of predefined classes, a number of attributes, and a "learning (or training) set", the classification methods can automatically predict the classes of data beyond the learning set. Two key research problems related to classification results are the evaluation of misclassification and the power of prediction. Mathematical techniques that are often used to construct classification methods are binary decision trees, neural networks, linear programming, and statistics. With binary decision trees, a tree induction model with a "Yes–No" format can be built to split data into different classes according to their attributes. The misclassification rate can be measured by either statistical estimation (Breiman et al. 1984) or information entropy (Quinlan 1986). However, tree induction may not produce an optimal classification, so its prediction power is limited. With neural networks, a neural induction model can be built on a structure of nodes and weighted edges. In this approach, the attributes become the input layer while the classes associated with the data form the output layer. Between the input and output layers there are a number of hidden layers that determine the accuracy of the classification. Although the neural induction model yields better results in many cases of data mining, the computational complexity of the hidden layers (since the connections are nonlinear) can make this method difficult to implement for data mining with a large set of attributes. In linear programming approaches, the classification problem is viewed as a linear program with multiple objectives (Freed and Glover 1981; Shi and Yu 1989). Given a set of classes and a set of attribute variables, one can define a boundary value (or variables) separating the classes. Each class is then represented by a group of constraints with respect to the boundary in the linear program. The objective function can minimize the overlapping rate of the classes and maximize the distance between them (Shi 1998); a minimal sketch of this idea follows this paragraph. The linear programming approach results in an optimal classification. It is also easy to construct and effective in separating multi-class problems. However, its computation time may exceed that of statistical approaches. Various statistical methods, such as linear discriminant analysis, quadratic discriminant analysis, and logistic discriminant analysis, are very popular and commonly used in real business classifications. Even though statistical software has been well developed to handle large amounts of data, the statistical approaches have a disadvantage in efficiently separating multi-class problems, in which pairwise comparisons (i.e., one class vs. the rest of the classes) have to be adopted.
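As a concrete illustration of the linear programming approach, here is a minimal sketch in the spirit of the overlap-minimizing models cited above (not the exact formulations of Freed and Glover 1981 or Shi 1998): it fixes the boundary at b = 1 to avoid a trivial solution and minimizes the total overlap of two classes. The data and the use of scipy.optimize.linprog are illustrative assumptions.

```python
# Sketch of two-class LP classification: find weights w so that
# class G scores above the boundary b = 1 and class B below it,
# with nonnegative overlaps alpha_i absorbing violations.
import numpy as np
from scipy.optimize import linprog

X_good = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0]])   # class G records
X_bad = np.array([[0.5, 1.0], [1.0, 0.5], [0.2, 0.8]])    # class B records
n_g, n_b, p = len(X_good), len(X_bad), X_good.shape[1]
n = n_g + n_b

# Variables: w (p weights) followed by alpha (n overlaps); minimize sum(alpha).
c = np.concatenate([np.zeros(p), np.ones(n)])

# Class G: w.x_i >= 1 - alpha_i  rewritten as  -w.x_i - alpha_i <= -1
A_g = np.hstack([-X_good, -np.eye(n)[:n_g]])
# Class B: w.x_i <= 1 + alpha_i  rewritten as   w.x_i - alpha_i <=  1
A_b = np.hstack([X_bad, -np.eye(n)[n_g:]])

res = linprog(c,
              A_ub=np.vstack([A_g, A_b]),
              b_ub=np.concatenate([-np.ones(n_g), np.ones(n_b)]),
              bounds=[(None, None)] * p + [(0, None)] * n)
print("weights:", res.x[:p])   # score w.x against the boundary value 1
```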

Clustering analysis uses a procedure to group initially ungrouped data according to criteria of similarity in the selected data. Although Clustering does not require a learning set, it shares common methodological ground with Classification. In other words, most of the mathematical models mentioned above for Classification can be applied to Clustering analysis. Predictions are related to regression techniques. The key idea of Prediction analysis is to discover the relationship between the dependent and independent variables, as well as the relationships among the independent variables (one vs. another; one vs. the rest; and so on). For example, if sales is an independent variable, then profit may be a dependent variable. Using historical data on both sales and profit, either linear or nonlinear regression techniques can produce a fitted regression curve that can be used for future profit prediction (see the sketch after this paragraph). Sequential Patterns analysis seeks recurring patterns in data transactions over a business period. These patterns can be used by business analysts to study the impact of the pattern over the period. The mathematical models behind Sequential Patterns are logic rules, fuzzy logic, and the like. As an extension of Sequential Patterns, Similar Time Sequences are applied to discover sequences similar to a known sequence over past and current business periods. Through the data mining stage, several similar sequences can be studied to project the future trend of transaction development. This approach is useful for databases with time-series characteristics.
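As a toy illustration of the sales-to-profit example above, the following sketch fits a linear regression to historical (sales, profit) pairs with numpy and uses the fitted line for prediction; all figures are invented.

```python
# Fit profit = slope * sales + intercept on historical data,
# then predict profit for a future sales figure.
import numpy as np

sales = np.array([10.0, 12.0, 15.0, 18.0, 20.0])   # historical sales
profit = np.array([2.1, 2.6, 3.3, 4.0, 4.4])       # historical profit

slope, intercept = np.polyfit(sales, profit, deg=1)
future_sales = 25.0
print("predicted profit:", slope * future_sales + intercept)
```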

1.2 Knowledge Management

Even before data mining, knowledge management was another field with far-reaching impacts on human society. Collecting and disseminating knowledge has been an important social activity of human beings for thousands of years. In Western culture, the Library of Alexandria in Egypt (ca. 200 B.C.) collected more than 500,000 works and handwritten copies. The Bible also contains knowledge and wisdom in addition to its religious content. In Chinese culture, the Lun Yu (Analects of Confucius), the Tao Te Ching of Lao Tsu, and The Art of War of Sun Tzu have influenced human beings for generations. All of them have served knowledge-sharing functions.

The concepts of modern knowledge management started in the twentieth century, and the theory of knowledge management has gradually been formulated over the last 30 years. Knowledge Management can be regarded as an interdisciplinary business methodology with the organization as its focus (Awad and Ghaziri 2004). In the management context, knowledge can be represented as (1) a state of mind; (2) an object; (3) a process; (4) access to information; and (5) a capacity. Furthermore, knowledge can be classified as tacit (or implicit) and explicit (Alavi 2000; Alavi and Leidner 2001). For a corporation, the tasks of knowledge management inside the organization consist of knowledge innovation, knowledge sharing, knowledge transformation and knowledge dissemination. Since explicit knowledge may be converted into different digital forms via systematic and automatic means, such as information technology, the development of knowledge management naturally relates to applications of information technology, including data mining techniques. The basic relationships between knowledge management and data mining are shown in Fig. 1.1. Data can be a fact about an event or a record of a transaction. Information is data that has been processed in some way. Knowledge can be useful information; it changes with the individual, time and situation (see Chap. 2 for definitions).

Fig. 1.1 Relationship of Data, Information and Knowledge

Fig. 1.2 Data Mining and Knowledge Management

Although data mining and knowledge management have developed independently as two distinct fields in the academic community, data mining techniques have been playing a key role in the development of corporate knowledge management systems. In terms of supporting business decision making, their general relationship is demonstrated in Fig. 1.2. Figure 1.3 further shows how they can interact with business intelligence in a corporate decision support system (Awad and Ghaziri 2004).

Fig. 1.3 Data Mining, Business Intelligence and Knowledge Management

1.3 Knowledge Management Versus Data Mining

Data mining is a target-oriented knowledge discovery process. Given a business objective, the analysts first have to translate it into a digital representation that can hopefully be discovered among the hidden patterns resulting from data mining. This knowledge can be considered the target knowledge, and the purpose of data mining is to discover it. We note that, in order to find the target knowledge in the process of using and analyzing the available data, the analysts have to apply other related knowledge at the different working stages. Researchers have extensively studied how to incorporate knowledge into the data mining process in pursuit of the target knowledge. This section briefly reviews the following approaches, which differ from the proposed intelligent knowledge.

1.3.1 Knowledge Used for Data Preprocessing

In terms of the data mining process, the four stages mentioned in Sect. 1.1 can be regrouped into three categories: (1) data preprocessing, which encloses the selecting and transforming stages; (2) mining; and (3) post-mining analysis, which is interpreting. Data preprocessing is not only important, but also tedious due to the variety of tasks that have to be carried out, such as data selection, data cleaning, data fusion across different data sources (especially in the case of Big Data, where semi-structured and unstructured data come together with traditionally structured data), data normalization, etc. The purpose of data preprocessing is to transform the dataset into a multi-dimensional table or pseudo multi-dimensional table that can be processed by available data mining algorithms. There are a number of technologies to deal with the components of data preprocessing. However, an open research problem is how to choose or employ an appropriate technique or method for a given data set so as to reach a better trade-off between processing time and quality.
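For illustration, here is a minimal pandas sketch of two of the preprocessing tasks named above, cleaning (imputing missing values) and normalization (min-max scaling); the column names and data are invented.

```python
# Clean missing values by mean imputation, then min-max normalize
# each column to [0, 1] so the table is ready for mining algorithms.
import pandas as pd

df = pd.DataFrame({"age": [25, None, 33, 41],
                   "income": [50.0, 72.5, None, 61.0]})

df = df.fillna(df.mean(numeric_only=True))     # cleaning: impute missing values
df = (df - df.min()) / (df.max() - df.min())   # normalization: scale to [0, 1]
print(df)
```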

In the current literature, either direct human knowledge (e.g., the experience of data analysts) or a knowledge agent (e.g., computer software) may be used to both save data preprocessing time and maintain quality. The automated intelligent agent Eliza (Weizenbaum 1966) is one of the earlier knowledge agents: it performed natural language processing to ask users questions and used the answers to create subsequent questions. Such an agent can be applied to guide analysts who may lack an understanding of the data through the processing tasks. Recently, some researchers have implemented well-known methods to design particular knowledge-based agents for data preprocessing. For example, Othman et al. (2009) applied rough sets to construct a knowledge-based agent method for creating the data preprocessing agent's knowledge. This method first creates the preprocessing agent's Profile Data and then uses rough set modeling to build the agent's knowledge for evaluating known data processing techniques over different data sets. Particular Profile Data include the number of records, number of attributes, number of nominal attributes, number of ordinal attributes, number of continuous attributes, number of discrete attributes, number of classes and the type of the class attribute. These metadata form the structure of a multi-dimensional table that serves as a guide map for effective data preprocessing.
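As a minimal sketch of computing this kind of Profile Data for a dataset (an illustration only, not the method of Othman et al. 2009), the following derives record and attribute counts by type from a pandas DataFrame; the column names, class column, and dtype-based heuristics are assumptions, and ordinal attributes are omitted since they cannot be inferred from dtypes alone.

```python
# Derive simple Profile Data (metadata) describing a dataset.
import pandas as pd

def profile_data(df: pd.DataFrame, class_col: str) -> dict:
    features = df.drop(columns=[class_col])
    numeric = features.select_dtypes(include="number")
    return {
        "n_records": len(df),
        "n_attributes": features.shape[1],
        "n_nominal": features.select_dtypes(include="object").shape[1],
        "n_continuous": numeric.select_dtypes(include="float").shape[1],
        "n_discrete": numeric.select_dtypes(include="int").shape[1],
        "n_classes": df[class_col].nunique(),
        "class_type": str(df[class_col].dtype),
    }

df = pd.DataFrame({"age": [25, 40, 33], "income": [50.0, 72.5, 61.0],
                   "region": ["N", "S", "N"], "label": ["good", "bad", "good"]})
print(profile_data(df, "label"))
```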

1.3.2 Knowledge for Post Data Mining

Deriving knowledge from the results of data mining (called the interpreting stage in this chapter) has been crucial for the whole process of data mining. Experts of data mining agree that data mining provides "hidden patterns", which may not be regarded as "knowledge", although they are later called "rough knowledge" in this book. The basic reason is that knowledge changes not only with individuals, but also with situations: what is knowledge to one person may not be knowledge to another, and what is knowledge to someone today may not be tomorrow. Therefore, conducting post data mining analysis to help users identify knowledge from the results of data mining has drawn a great deal of research interest. The existing research findings, however, mostly concern how to develop automatic algorithms to find knowledge within the computing domain, which differs from the main topic of intelligent knowledge in this book. A number of particular methods design such algorithms for deriving knowledge in post data mining.

A general approach in post data mining is to define measurements of "interestingness" on the results of data mining that can identify results of strong interest, such as "highly ranked rules" or "high degrees of correlation", for the end users as their knowledge (for instance, see Shekar and Natarajan 2004). Based on interestingness, model evaluation in data mining is supposed to identify the genuinely interesting knowledge models, while knowledge representation uses visualization and other techniques to present knowledge to users after mining (Guillet and Hamilton 2007). Interestingness can be divided into objective measures and subjective measures. Objective measures are mainly based on the statistical strength or attributes of the models found, while subjective measures derive from the users' beliefs or expectations (Mcgarry 2005).

There is no unified view about how interestingness should be used. Smyth and Goodman (1992) proposed a J-Measure function that quantifies the information contained in rules. Toivonen et al. (1995) used rule covers, which partition mined association rule sets based on their consequents, as an interestingness device. Piatetsky-Shapiro et al. (1997) studied rule measurement based on the independence of events. Aggarwal and Yu (1998) explored a collection of intensity measures using the idea of "greater than expected" to find meaningful association rules. Tan et al. (2002) investigated the correlation coefficient as an interestingness measure. Geng and Hamilton (2007) summarized nine criteria of most researchers' concern and 38 common objective measurement methods. Although these methods differ in form, they all address one or several criteria for measuring interestingness. In addition, many researchers think a good interestingness measure should include generality and reliability considerations (Klosgen 1996; Piatetsky et al. 1997; Gray and Orlowska 1998; Lavrac et al. 1999; Yao 1999; Tan et al. 2002). Note that objective measurement methods are based on the original data, without any additional knowledge about the data. Most of these measurement methods are based on probability, statistics or information theory, expressing correlation and distribution in strict formulas and rules. Their mathematical nature makes them easy to analyze and compare, but these methods do not take the detailed application context, such as decision-making objectives and the users' background knowledge and preferences, into account (Geng and Hamilton 2007).
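As a toy illustration of such objective measures, the following sketch computes the lift of a rule X implies Y (one way to express the "greater than expected" idea) and the phi correlation coefficient; the transactions and item names are invented.

```python
# Objective interestingness of the rule X -> Y on toy transactions:
# lift = P(XY) / (P(X)P(Y)); phi is the binary correlation coefficient.
import math

transactions = [{"laptop", "mouse"}, {"laptop"}, {"mouse", "pad"},
                {"laptop", "mouse", "pad"}, {"pad"}]
n = len(transactions)

def prob(items):
    return sum(items <= t for t in transactions) / n

p_x, p_y, p_xy = prob({"laptop"}), prob({"mouse"}), prob({"laptop", "mouse"})

lift = p_xy / (p_x * p_y)   # > 1 means "greater than expected"
phi = (p_xy - p_x * p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
print(f"lift = {lift:.2f}, phi = {phi:.2f}")
```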

In the aspect of subjective interestingness, Klemettinen et al. (1994) studied rule templates with which users can define the types of rules that are valuable, addressing the problem of discriminating the value of rules. Silberschatz and Tuzhilin (1996) used a belief system to measure unexpectedness. Kamber and Shinghal (1996) proposed necessity and sufficiency to evaluate the degree of interest of characteristic rules and discriminant rules. Liu et al. (1997) proposed identifying rules of user interest through the method of user expectations. Yao et al. (2004) proposed a utility mining model to find the rules of greatest utility to users. Note that subjective measures take users as well as data into account. In the definition of a subjective measure, the domain and background knowledge of users are expressed as beliefs or expectations. However, expressing users' knowledge through a subjective measure is not an easy task. Since the effectiveness of a subjective measure depends on users' background knowledge, users who have more experience with a data mining process may use it more efficiently than others.

Because these two kinds of measurement methods have their own advantages and disadvantages, combinations of objective and subjective measures have emerged (Geng and Hamilton 2007). Freitas (1999) even suggested that an objective measure can be used as a first-level filter to select models of potential interest, with a subjective measure then used for second-level screening. In this way, knowledge that users find genuinely interesting can be formed.

While there are a number of research papers contributing to the interestingness of associations, few can be found on the interestingness of classification, except for using the accuracy rate to measure the results of classification algorithms. This approach lacks interaction with users. Arias et al. (2005) constructed a framework for evaluating classification results in audio indexing. Rachkovskij (2001) constructed DataGen to generate datasets used to evaluate classification results.

Clustering results are commonly evaluated by two criteria: one is to maximize intra-class similarity and the other is to minimize inter-class similarity. Dunn (1974) proposed an index for discovering compact and well-separated clusters based on these criteria (a minimal sketch of this idea is given at the end of this subsection). The existing data mining research on model evaluation and knowledge representation indicates that, in order to find knowledge for specific users from the results of data mining, more advanced measurements that combine the preferences of users should be developed, in conjunction with concepts of knowledge management. A variety of methods have been proposed along these lines. For example, Zhang et al. (2003) studied a post data mining method that transfers infrequent itemsets to frequent itemsets, implicitly using the concept of an "interestingness" measure to describe the knowledge in the results of data mining. Gibert et al. (2013) demonstrated a tool bridging logistic regression and the visual profile's assessment grid methods for identifying decision support (knowledge) in medical diagnosis problems. Yang et al. (2007) considered how to convert decision tree results into the users' knowledge, which may not only keep the favorable results as desired, but also change unfavorable ones into favorable ones in post data mining analysis. These findings are close to the concept of intelligent knowledge proposed in this book. They, however, did not develop a systematic view of how to address the scientific issues in using human knowledge to distinguish hidden patterns for decision support.
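For illustration, here is a minimal sketch of a Dunn-style index under the assumption of Euclidean distance: the ratio of the smallest between-cluster distance to the largest within-cluster diameter, so that larger values indicate compact, well-separated clusters. The clusters are toy data, and this is a simplified form rather than Dunn's (1974) exact definition.

```python
# Dunn-style index: min inter-cluster separation / max intra-cluster diameter.
import itertools
import math

clusters = [[(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
            [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]]

def diameter(c):                       # largest within-cluster distance
    return max(math.dist(a, b) for a, b in itertools.combinations(c, 2))

def separation(c1, c2):                # smallest between-cluster distance
    return min(math.dist(a, b) for a in c1 for b in c2)

dunn = (min(separation(c1, c2)
            for c1, c2 in itertools.combinations(clusters, 2))
        / max(diameter(c) for c in clusters))
print(f"Dunn index = {dunn:.2f}")
```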

1.3.3 Domain Driven Data Mining

There has been a conceptual research approach called "domain driven data mining", which considers multiple aspects of incorporating human knowledge into the process of data mining (see Cao et al. 2006, 2010; Cao and Zhang 2007). This approach argues that knowledge discovered by an algorithm-dominated data mining process is generally not interesting to business needs. In order to identify knowledge for taking effective actions in real-world applications, data mining, conceptually speaking, should involve domain intelligence in the process. The modified data mining process has six characteristics: (i) problem understanding has to reflect domain specification and domain intelligence, (ii) data mining is subject to a constraint-based context, (iii) in-depth patterns can result in knowledge, (iv) data mining is a closed-loop iterative refinement process, (v) discovered knowledge should be actionable in business, and (vi) a human-machine-cooperative infrastructure should be embedded in the mining process (Cao and Zhang 2007).

Although this line of research provided a macro-level framework for how important a role human (here called domain) knowledge can play in the process of data mining to help identify actionable decision support for interested users, it did not show the theoretical foundation of how to combine domain knowledge with data mining in an abstract form. Such a foundation could guide analysts in constructing an automatic approach (an algorithm, associated with any known data mining algorithm, that can be embedded in the data mining process) whenever the domain knowledge is quantitatively represented. One of the goals of this book is to address this open research problem.

1.3.4 Data Mining and Knowledge Management

There are some cross-field studies between data mining and knowledge management in the literature. For example, Anand et al. (1996) proposed that the prior knowledge of the users and previously discovered knowledge should be jointly considered in discovering new knowledge. Piatetsky-Shapiro and Matheus (1992) explored how domain knowledge can be used in initial discovery and restrictive searching. Yoon and Kerschberg (1993) discussed the coordination of new and old knowledge from the viewpoint of the concurrent evolution of knowledge and databases. However, there is no systematic study or concrete theoretical foundation for this cross-field research.

Management issues, such as expert systems and decision support systems, have also been discussed by some data mining scholars. Fayyad et al. (1996) described knowledge discovery projects based on knowledge obtained through data mining. Cauvin et al. (1996) studied knowledge expression based on data mining. Lee and Stolfo (2000) constructed an intrusion detection system based on data mining. Polese et al. (2002) established a system based on data mining to support tactical decision-making. Nemati et al. (2002) constructed a knowledge warehouse integrating knowledge management, decision support, artificial intelligence and data mining technology. Hou et al. (2005) studied an intelligent knowledge management model, which is different from what we discuss in this book.

We observe that the above research on knowledge generated from data mining (which we later call rough knowledge) has attracted academic and user attention, and in particular, model evaluation has been investigated; it is, however, not fully adequate for the study proposed in this book, for the following reasons. First, the current research concentrates on model evaluation and pays more attention to the mining of association rules, especially objective measures. As discussed above, objective measurement methods are based on the original data, without any additional knowledge about the data. Most of these measurement methods are based on probability, statistics or information theory, expressing correlation and distribution in strict formulas and rules; they can hardly be combined with expertise. Second, the application of domain knowledge is supposed to relate to the research on actionable knowledge that we will discuss later, but it should not be concentrated in the data processing stage as in the current studies. The current studies favor technical factors over non-technical factors, such as scenario, expertise, and user preferences. Third, the current studies show that there is no framework of knowledge management technology that adequately supports the analysis of the original knowledge generated from data mining, which to some extent means that the way to incorporate knowledge derived from data mining into knowledge management remains unexplored. Finally, the current work lacks a systematic theoretical study from the perspective of knowledge discovered from data at the organizational level. The following chapter will address the above problems.