Abstract
The rapid and constant change in information technologies (IT), organizational forms, and social structures is challenging our existing theories of the impact of IT on organizations and society. A basic problem for researchers is how to generate testable hypotheses about a given area of research. However, new IT offers opportunities for information processing and problem solving that could extend researchers' capacity to generate hypotheses and to systematically explore the limitations of any theory. The idea of using IT to support IS research is not new. In this chapter, we explore and illustrate how data mining techniques could be applied to assist researchers in systematic theory testing and development.
Notes
- 1.
Science: Conjectures and Refutations, in Popper, K. Conjectures and Refutations, 2002 Edition, pp. 70–71.
- 2.
A discussion of the validation of the new instrument is beyond the scope of this chapter. However, in validating the instrument, we used principal axis factoring with varimax rotation and Kaiser normalization. The reliabilities (alphas) for the groups of items were: System Quality, 0.9603; Documentation, 0.9337; Ease of Use, 0.9266; Performance, 0.927; Utilization, 0.438; System Reliability, 0.8860; and Authorization, 0.7618.
References
Benbasat I, Zmud R (1999) Empirical research in information systems: the practice of relevance. MIS Q 23(1):3–16
Brusic V, Zeleznikow J (1999) Knowledge discovery and data mining in biological databases. Knowl Eng Rev 14:257–277
Chalmers AF (1994) What is this thing called science? 3rd edn. Hackett Publishing
Dipert R (1995) Peirce’s underestimated role in the history of logic. In: Ketner KL (ed) Peirce and contemporary thought. Fordham University Press, New York
Doll WJ, Torkzadeh G (1988) The measurement of end-user computing satisfaction. MIS Q 12(2):259–274
Etezadi-Amoli J, Farhoomand AF (1996) A structural model of end user computing satisfaction and user performance. Inf Manage 30(2):65–73
Fann KT (1970) Peirce’s theory of abduction. Martinus Nijhoff, The Hague
Grimes TR (1990) Truth, content, and the hypothetico-deductive method. Philos Sci 57:514–522
Goodhue DL, Thompson RL (1995) Task-technology fit and individual performance. MIS Q 19(2):213–236
Hanson NR (1961) Is there a logic of discovery? In: Feigl H, Maxwell G (eds) Current issues in the philosophy of science. Holt, Rinehart and Winston, pp 20–35
Harman G (1965) Inference to the best explanation. Philos Rev 74:88–95
Hintikka J (1968) The varieties of information and scientific explanation. In: van Rootselaar B, Staal JF (eds) Logic, methodology and philosophy of science III. North Holland, pp 151–171
Hintikka J (1997) The place of CS Peirce in the history of logical theory. In: Lingua Universalis vs Calculus Ratiocinator, selected papers 2. Kluwer, pp 140–161
Kim H, Koehler G (1995) Theory and practice of decision tree induction. Omega 23(6):637–652
Ko M, Osei-Bryson K-M (2004a) Exploring the relationship between information technology investments and firm performance productivity using regression splines analysis. Inf Manage 42:1–13
Ko M, Osei-Bryson K-M (2004b) Using regression splines to assess the impact of information technology investments on productivity in the health care industry. Inf Syst J 14(1):43–63
Lee C, Irizarry K (2001) The GeneMine system for genome/proteome annotation and collaborative data mining. IBM Syst J 40(2):592–603
Niiniluoto I (1993) Peirce’s theory of statistical explanation. In: Moore EC (ed) Charles S Peirce and the philosophy of science. The University of Alabama Press, Tuscaloosa, pp 186–207
Niiniluoto I (1999) Defending abduction. Philos Sci 66(Suppl):S436–S451
Palys TS (2003) Research decisions: quantitative and qualitative perspectives, 3rd edn. Nelson, Scarborough
Popper KR (1957) The aim of science. Ratio 1
Popper KR (1963) Conjectures and refutations: the growth of scientific knowledge. Routledge and Kegan Paul, London, UK
Popper KR (1968) The logic of scientific discovery. Harper Torchbooks, New York
Putnam H (1982) Peirce the logician. Historia Math 9:290–301
Quine WV (1995) Peirce’s logic. In: Ketner KL (ed) Peirce and contemporary thought. Fordham University Press, New York, pp 23–31
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Tursman R (1987) Peirce’s theory of scientific discovery. Indiana University Press, Bloomington
Acknowledgments
Some of the material in this chapter previously appeared in the paper “Using Decision Tree Modelling to Support Peircian Abduction in IS Research: A Systematic Approach for Generating and Evaluating Hypotheses for Systematic Theory Development,” Information Systems Journal 21:5, 407–440 (2011).
Appendices
Appendix A
1.1 A Procedure for Implementing the Methodology
Step 1: Preparation (Researcher)
1. Identify Dependent Variables: Identify the dependent variables (e.g., Performance).
2. Identify Possible Mediator Variables: Identify possible mediator variables (e.g., Utilization).
3. Identify Independent Variables: Identify the independent variables (e.g., System Quality).
4. Specify the Discretization Method: An example of such a method is presented in Table 1.
5. Identify the Target Events for the Dependent and Mediator Variables: For each dependent variable, identify the target events that may be of interest (e.g., Performance is High ≡ the value of Performance is in the interval [5.5, 7]). Do the same for the mediator variable(s).
6. Discretize All Ordinal Variables: Discretize each ordinal variable using the specified discretization method.
7. Specify DT Generation Parameters: Specify relevant values for the data partitioning parameters (e.g., the distribution of cases across the training, validation, and test datasets; stratification variables) and for the DT induction parameters (e.g., splitting method options, minimum number of observations per leaf).
8. Specify Thresholds: Specify α, the significance level for statistical testing of the hypotheses, and τ₀, the threshold for p₀. A Strong Single Rule Hypothesis will not be generated for any rule for which the highest p₀ supported by the data is below τ₀.
9. Specify the Set of Decision Rules for Abducting and Evaluating Sibling Rules Hypotheses: A candidate Sibling Rules Hypothesis is formulated based on the relative frequency distribution of the target event across the set of sibling rules associated with the predictor variable. Appendix B provides an example of a Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses.
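Items 4 to 6 (discretization and target events) can be sketched in Python as follows. This is a minimal sketch assuming a 7-point scale: only the High interval [5.5, 7] comes from the text, while the Low/Medium cut point at 3.5 is a hypothetical stand-in for the method in Table 1, which is not reproduced here.

```python
from typing import List

# Hypothetical cut points for an assumed 1-7 scale. Only the High bin
# [5.5, 7] is taken from the text; the Low/Medium boundary is an assumption.
CUT_POINTS = [(3.5, "Low"), (5.5, "Medium")]

def discretize(value: float) -> str:
    """Map an ordinal score to a bin label using the assumed cut points."""
    if not 1.0 <= value <= 7.0:
        raise ValueError(f"score {value} outside the assumed 1-7 range")
    for upper, label in CUT_POINTS:
        if value < upper:
            return label
    return "High"  # value lies in [5.5, 7]

def target_event(values: List[float], level: str = "High") -> List[bool]:
    """Flag the cases in which the discretized variable equals the target level."""
    return [discretize(v) == level for v in values]

performance = [6.2, 5.5, 4.0, 2.1, 7.0]
print([discretize(v) for v in performance])  # ['High', 'High', 'Medium', 'Low', 'High']
print(target_event(performance))             # [True, True, False, False, True]
```

Rejecting out-of-range scores, rather than clamping them, keeps data-entry errors from silently landing in the Low or High bins.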
Step 2: Generate Hypotheses for Dependent Variables (Automatic)
For each dependent variable:
Substep 2a: Generate DTs for the given Dependent Variable: Generate a set of DTs using the discretized dataset and the combinations of DT generation parameter values specified in Step 1.
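Substep 2a relies on DT induction, which in practice would come from a DT tool. As a deliberately reduced stand-in, the sketch below performs a single multiway split of a node on one discretized predictor. That is enough to show where sibling rules and their target-event relative frequencies come from. The function name, data, and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def sibling_rules(cases: List[Dict[str, str]], predictor: str, target: str,
                  target_level: str = "High") -> Dict[str, Tuple[int, int, float]]:
    """Split cases on every level of `predictor` (one multiway DT split).

    Each level defines a sibling rule 'IF predictor = level'. For each rule,
    return (cases with the target event, cases covered, relative frequency).
    """
    hits: Counter = Counter()
    totals: Counter = Counter()
    for case in cases:
        level = case[predictor]
        totals[level] += 1
        if case[target] == target_level:
            hits[level] += 1
    return {level: (hits[level], n, hits[level] / n) for level, n in totals.items()}

# Illustrative, made-up discretized cases (not data from the chapter).
cases = ([{"SystemQuality": "Low", "Performance": "High"}] * 2
         + [{"SystemQuality": "Low", "Performance": "Medium"}] * 8
         + [{"SystemQuality": "High", "Performance": "High"}] * 7
         + [{"SystemQuality": "High", "Performance": "Low"}] * 3)
for level, (x, n, freq) in sibling_rules(cases, "SystemQuality", "Performance").items():
    print(f"IF SystemQuality = {level}: {x}/{n} High Performance ({freq:.2f})")
```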
Substep 2b: Abduct and Evaluate Sibling Rules Hypotheses for the given Dependent Variable: For each DT, a Sibling Rules Hypothesis is abducted for a set of sibling rules if the relationship among their relative frequencies of the target event is included in the Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses. Each abducted Sibling Rules Hypothesis is evaluated indirectly by subjecting its associated set of surrogate hypotheses to statistical testing. If all of the surrogate hypotheses are accepted, there is good reason to believe that the Sibling Rules Hypothesis would not be rejected, and so it is not rejected.
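The text does not specify which statistical test is applied to the surrogate hypotheses in Substep 2b. The sketch below makes two assumptions. First, the candidate is an increasing Sibling Rules Hypothesis. Second, each surrogate hypothesis "p_k < p_(k+1)" for adjacent siblings, ordered from the lowest to the highest predictor level, is accepted when a one-sided pooled two-proportion z-test rejects equality at level α. Both the test and the function names are illustrative choices.

```python
import math
from typing import List, Tuple

def one_sided_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """P-value of a pooled two-proportion z-test of H1: p2 > p1."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both groups all-0 or all-1: no evidence of a difference
    z = (x2 / n2 - x1 / n1) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability

def increasing_trend_supported(siblings: List[Tuple[int, int]], alpha: float = 0.05) -> bool:
    """Accept the Sibling Rules Hypothesis only if every surrogate
    hypothesis p_k < p_(k+1) is supported at level alpha.

    `siblings` holds (target-event count, case count) per ordered predictor level.
    """
    return all(one_sided_p(*siblings[k], *siblings[k + 1]) < alpha
               for k in range(len(siblings) - 1))

# Made-up counts for siblings Low, Medium, High of one predictor.
print(increasing_trend_supported([(10, 50), (20, 50), (35, 50)]))  # True
```

Requiring every adjacent comparison to pass mirrors the "all surrogate hypotheses accepted" condition in the text. A researcher may still want to adjust α for the number of comparisons.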
Substep 2c: Abduct Strong Single Rule Hypotheses for the given Dependent Variable: For each sibling rule that differs statistically from each of its sibling rules at significance level α, determine whether it is a strong rule by testing the surrogate hypothesis p₀ ≥ τ₀. For each such surrogate hypothesis that is accepted, use the given rule to abduct a corresponding Strong Single Rule Hypothesis.
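For Substep 2c, one plausible reading of "the highest p₀ supported by the data" is the exact one-sided lower confidence bound on the probability of the target event among the cases covered by a rule. Under that assumption, the surrogate test of p₀ ≥ τ₀ can be sketched as follows. The Clopper-Pearson bound and the function names are assumed choices, not necessarily the authors' procedure.

```python
import math

def upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def highest_supported_p0(x: int, n: int, alpha: float = 0.05) -> float:
    """Largest p0 for which 'p > p0' is still supported at level alpha.

    This is the exact (Clopper-Pearson) one-sided lower confidence bound,
    found by bisection because P(X >= x | p) increases with p.
    """
    if x == 0:
        return 0.0
    lo, hi = 0.0, x / n  # tail(lo) < alpha <= tail(hi) for alpha < 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(x, n, mid) < alpha:
            lo = mid  # x target events would be too surprising at p = mid
        else:
            hi = mid
    return lo

def is_strong_rule(x: int, n: int, tau0: float, alpha: float = 0.05) -> bool:
    """Surrogate test of p0 >= tau0 for one rule (leaf) of a DT."""
    return highest_supported_p0(x, n, alpha) >= tau0

# Made-up leaf: 45 of the 50 covered cases show the target event.
print(round(highest_supported_p0(45, 50), 3), is_strong_rule(45, 50, tau0=0.75))
```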
Step 3: Identify Mediator Variables (Automatic)
Examine the set of supported hypotheses from Step 2 to determine whether any potential mediator variable is included in at least one of the abducted hypotheses for one of the dependent variables.
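Step 3 reduces to a set filter once each supported hypothesis is represented by its predictor (condition) variable and its target variable. The (predictor, target) pair representation and the variable names below are illustrative assumptions.

```python
from typing import List, Set, Tuple

# A supported hypothesis reduced to a (predictor, target) pair (assumed representation).
Hypothesis = Tuple[str, str]

def mediators_to_model(hypotheses: List[Hypothesis], dependents: Set[str],
                       candidate_mediators: Set[str]) -> Set[str]:
    """Return the candidate mediators that appear as the predictor in at least
    one supported hypothesis whose target is a dependent variable."""
    return {pred for pred, target in hypotheses
            if target in dependents and pred in candidate_mediators}

supported = [("SystemQuality", "Performance"), ("Utilization", "Performance"),
             ("EaseOfUse", "Performance")]
print(mediators_to_model(supported, {"Performance"}, {"Utilization"}))  # {'Utilization'}
```

An empty result corresponds to the case in which Step 4 is bypassed.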
Step 4: Generate Hypotheses for Mediator Variables (Automatic)
Step 4 is executed for each mediator variable that was included in a condition event of a hypothesis for one of the dependent variables. If there is no such mediator variable, this step is bypassed.
For each mediator variable:
Substep 4a: Generate DTs for Predicting the given Mediator Variable. Similar to Substep 2a.
Substep 4b: Abduct Sibling Rules Hypotheses for the given Mediator Variable. Similar to Substep 2b.
Substep 4c: Abduct Strong Single Rule Hypotheses for the given Mediator Variable. Similar to Substep 2c.
Step 5: Abduction of Theoretical Model (Automatic)
Generate a theoretical model by integrating the set of causal links between predictor and target variables that are associated with the abducted directional and Strong Single Rule Hypotheses.
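Step 5 can be pictured as merging every (predictor, target) link from the abducted hypotheses into one directed graph. In this sketch the link list is illustrative, not a result from the chapter. Duplicate links collapse into a single edge, and enumerating paths exposes mediated links such as one that runs through Utilization.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def build_model(links: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Merge (predictor, target) links from all abducted hypotheses into an
    adjacency map; duplicate links collapse into a single edge."""
    model: Dict[str, Set[str]] = defaultdict(set)
    for predictor, target in links:
        model[predictor].add(target)
    return dict(model)

def paths(model: Dict[str, Set[str]], start: str, end: str) -> List[List[str]]:
    """Enumerate all directed paths from start to end (the model is assumed acyclic)."""
    if start == end:
        return [[end]]
    return [[start] + rest
            for nxt in sorted(model.get(start, ()))
            for rest in paths(model, nxt, end)]

links = [("SystemQuality", "Utilization"), ("Utilization", "Performance"),
         ("SystemQuality", "Performance"), ("SystemQuality", "Performance")]
model = build_model(links)
for p in paths(model, "SystemQuality", "Performance"):
    print(" -> ".join(p))
```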
Appendix B
2.1 Examples of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses
Assuming a discretized target variable with 3 bins, Table 9 could be used to formulate and test a candidate Sibling Rules Hypothesis of the form: Predictor variable X has { Impact Type } on Target variable Y.
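Table 9 is not reproduced in this section, so the rules below are a hypothetical example rather than its contents. The siblings are ordered from the lowest to the highest level of predictor X. A strictly increasing relative frequency of the target event yields a positive impact type, and a strictly decreasing one yields a negative impact type. Any other pattern produces no candidate hypothesis.

```python
from typing import List, Optional

def candidate_impact(rel_freqs: List[float]) -> Optional[str]:
    """Map the target-event relative frequencies of ordered sibling rules to an
    impact type (hypothetical decision rules, not the contents of Table 9)."""
    pairs = list(zip(rel_freqs, rel_freqs[1:]))
    if pairs and all(a < b for a, b in pairs):
        return "a positive impact"
    if pairs and all(a > b for a, b in pairs):
        return "a negative impact"
    return None  # no candidate Sibling Rules Hypothesis is formulated

def state(x: str, y: str, rel_freqs: List[float]) -> Optional[str]:
    """Fill the template 'Predictor variable X has {Impact Type} on Target variable Y'."""
    impact = candidate_impact(rel_freqs)
    return f"{x} has {impact} on {y}" if impact else None

print(state("System Quality", "Performance", [0.2, 0.4, 0.7]))
```

A candidate produced this way would still have to pass the surrogate statistical tests of Substep 2b before it is retained.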
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Osei-Bryson, KM., Ngwenyama, O. (2014). An Approach for Using Data Mining to Support Theory Development. In: Osei-Bryson, KM., Ngwenyama, O. (eds) Advances in Research Methods for Information Systems Research. Integrated Series in Information Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-9463-8_4
DOI: https://doi.org/10.1007/978-1-4614-9463-8_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-9462-1
Online ISBN: 978-1-4614-9463-8
eBook Packages: Business and Economics, Business and Management (R0)