An Approach for Using Data Mining to Support Theory Development

Chapter in: Advances in Research Methods for Information Systems Research

Part of the book series: Integrated Series in Information Systems (ISIS, volume 34)

Abstract

The rapid and constant change in information technologies (IT), organizational forms, and social structures is challenging our existing theories of the impact of IT on organizations and society. A basic problem for researchers is how to generate testable hypotheses about a given area of research. However, new IT offer opportunities for information processing and problem solving that could extend the capacity of researchers to generate hypotheses and to systematically explore the limitations of any theory. The idea of using IT to support IS research is not new. In this chapter, we explore and illustrate how data mining techniques could be applied to assist researchers in systematic theory testing and development.

Notes

  1. “Science: Conjectures and Refutations,” in Popper, K., Conjectures and Refutations, 2002 edition, pp. 70–71.

  2. A discussion of the validation of the new instrument is beyond the scope of this chapter. However, in validating the instrument, we used principal axis factoring and varimax rotation with Kaiser normalization. The reliabilities (alphas) for the groups of items were: System Quality, 0.9603; Documentation, 0.9337; Ease of Use, 0.9266; Performance, 0.927; Utilization, 0.438; System Reliability, 0.8860; and Authorization, 0.7618.
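
The note does not name the reliability coefficient; assuming the alphas are Cronbach's alpha, the usual internal-consistency statistic for a group of Likert items, each value follows from α = k/(k − 1) · (1 − Σσᵢ² / σ_T²), where k is the number of items, σᵢ² the variance of item i, and σ_T² the variance of respondents' total scores. A minimal Python sketch, with made-up scores rather than the study's data:

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a group of items (assumed to be the note's 'alpha').

    items: one list of scores per item, each covering the same respondents in
    the same order. Population variances are used throughout; the n/(n-1)
    correction would cancel in the ratio anyway.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("items must share the same respondents (at least two)")
    # Each respondent's total score across the k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    item_var_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up scores: two items answered by four respondents.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Values near 1 indicate that the items in a group vary together; a low value such as the one reported for Utilization indicates weak internal consistency for that group.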

References

  • Benbasat I, Zmud R (1999) Empirical research in information systems: the practice of relevance. MIS Q 23(1):3–16

  • Brusic V, Zeleznikow J (1999) Knowledge discovery and data mining in biological databases. Knowl Eng Rev 14:257–277

  • Chalmers AF (1994) What is this thing called science? 3rd edn. Hackett Publishing

  • Dipert R (1995) Peirce’s underestimated role in the history of logic. In: Ketner KL (ed) Peirce and contemporary thought. Fordham University Press, New York

  • Doll WJ, Torkzadeh G (1988) The measurement of end-user computing satisfaction. MIS Q 12(2):259–274

  • Etezadi-Amoli J, Farhoomand AF (1996) A structural model of end user computing satisfaction and user performance. Inf Manage 30(2):65–73

  • Fann KT (1970) Peirce’s theory of abduction. Martinus Nijhoff, Amsterdam

  • Goodhue DL, Thompson RL (1995) Task-technology fit and individual performance. MIS Q 19(2):213–236

  • Grimes TR (1990) Truth, content, and the hypothetico-deductive method. Philos Sci 57:514–522

  • Hanson NR (1961) Is there a logic of discovery? In: Feigl H, Maxwell G (eds) Current issues in the philosophy of science. Holt, Rinehart and Winston, pp 20–35

  • Harman G (1965) Inference to the best explanation. Philos Rev 74:88–95

  • Hintikka J (1968) The varieties of information and scientific explanation. In: van Rootselaar B, Staal JF (eds) Logic, methodology and philosophy of science III. North Holland, pp 151–171

  • Hintikka J (1997) The place of CS Peirce in the history of logical theory. In: Lingua Universalis vs Calculus Ratiocinator, selected papers 2. Kluwer, pp 140–161

  • Kim H, Koehler G (1995) Theory and practice of decision tree induction. Omega 23(6):637–652

  • Ko M, Osei-Bryson K-M (2004a) Exploring the relationship between information technology investments and firm performance productivity using regression splines analysis. Inf Manage 42:1–13

  • Ko M, Osei-Bryson K-M (2004b) Using regression splines to assess the impact of information technology investments on productivity in the health care industry. Inf Syst J 14(1):43–63

  • Lee C, Irizarry K (2001) The GeneMine system for genome/proteome annotation and collaborative data mining. IBM Syst J 40(2):592–603

  • Niiniluoto I (1993) Peirce’s theory of statistical explanation. In: Moore EC (ed) Charles S Peirce and the philosophy of science. The University of Alabama Press, Tuscaloosa, pp 186–207

  • Niiniluoto I (1999) Defending abduction. Proc Philos Sci 66:S436–S451

  • Palys TS (2003) Research decisions: quantitative and qualitative perspectives, 3rd edn. Nelson, Scarborough

  • Popper KR (1957) The aim of science. Ratio 1

  • Popper K (1963) Conjectures and refutations: the growth of scientific knowledge. Routledge and Kegan Paul, London, UK

  • Popper KR (1968) The logic of scientific discovery. Harper Torch Books, New York

  • Putnam H (1982) Peirce the logician. Historia Math 9:290–301

  • Quine WV (1995) Peirce’s logic. In: Ketner KL (ed) Peirce and contemporary thought. Fordham University Press, New York, pp 23–31

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

  • Tursman R (1987) Peirce’s theory of scientific discovery. Indiana University Press, Bloomington

Acknowledgments

Some of the material in this chapter previously appeared in the paper “Using Decision Tree Modelling to Support Peircian Abduction in IS Research: A Systematic Approach for Generating and Evaluating Hypotheses for Systematic Theory Development,” Information Systems Journal 21:5, 407–440 (2011).

Author information

Correspondence to Kweku-Muata Osei-Bryson.

Appendices

Appendix A

1.1 A Procedure for Implementing the Methodology

  1. Step 1: Preparation (Researcher)

    1. Identify Dependent Variables: Identify the dependent variables (e.g., Performance).

    2. Identify Possible Mediator Variables: Identify the possible mediator variables (e.g., Utilization).

    3. Identify Independent Variables: Identify the independent variables (e.g., System Quality).

    4. Specify the Discretization Method: An example of such a method is presented in Table 1.

    5. Identify the Target Events for the Dependent and Mediator Variables: For each dependent variable, identify the target events that may be of interest (e.g., Performance is High ≡ the value of Performance lies in the interval [5.5, 7]). Do the same for each mediator variable.

    6. Discretize All Ordinal Variables: Discretize each ordinal variable using the specified discretization method.

    7. Specify Decision Tree (DT) Generation Parameters: Specify relevant values for the data partitioning parameters (e.g., the distribution of cases across the training, validation, and test datasets; stratification variables) and for the DT induction parameters (e.g., splitting method options, minimum number of observations per leaf).

    8. Specify Thresholds: Specify $\alpha$, the significance level for statistical testing of the hypotheses, and $\tau_0$, the threshold for $p_0$. A Strong Single Rule Hypothesis will not be generated for any rule for which the highest $p_0$ supported by the data is below $\tau_0$.

    9. Specify the Set of Decision Rules for Abducting and Evaluating Sibling Rules Hypotheses: A candidate Sibling Rules Hypothesis is formulated from the relative frequency distribution of the target event across the set of sibling rules associated with a predictor variable. Appendix B provides an example of a Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses.

  2. Step 2: Generate Hypotheses for Dependent Variables (Automatic)

    • For each dependent variable:

      • Substep 2a: Generate DTs for the given Dependent Variable: Generate a set of DTs using the discretized dataset and the combination of DT generation parameter values that were specified in Step 1.

      • Substep 2b: Abduct and Evaluate Sibling Rules Hypotheses for the given Dependent Variable: In this substep, for each DT, a Sibling Rules Hypothesis will be abducted for each set of sibling rules if the relationship between the associated relative frequencies is included in the Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses. Each such abducted Sibling Rules Hypothesis is indirectly evaluated by subjecting the associated set of surrogate hypotheses to statistical testing. If they are all accepted, then there is good reason to believe that the Sibling Rules Hypothesis will not be rejected, and so it is not rejected.

      • Substep 2c: Abduct Strong Single Rule Hypotheses for the given Dependent Variable: For each sibling rule that is statistically different from each of its other sibling rules at significance level $\alpha$, determine whether it is a strong rule by testing the surrogate hypothesis $p_0 \geq \tau_0$. For each such surrogate hypothesis that is accepted, use the given rule to abduct a corresponding Strong Single Rule Hypothesis.

  3. Step 3: Identify Mediator Variables (Automatic)

    • Examine the set of supported hypotheses from Step 2 to determine whether any potential mediator variable is included in at least one of the abducted hypotheses for one of the dependent variables.

  4. Step 4: Generate Hypotheses for Mediator Variables (Automatic)

    • Step 4 is executed for each mediator variable that was included in a condition event of a hypothesis for one of the dependent variables. If there is no such mediator variable, then this step is bypassed.

    • For each mediator variable:

      • Substep 4a: Generate DTs for Predicting the given Mediator Variable. Similar to Substep 2a.

      • Substep 4b: Abduct Sibling Rules Hypotheses for the given Mediator Variable. Similar to Substep 2b.

      • Substep 4c: Abduct Strong Single Rule Hypotheses for the given Mediator Variable. Similar to Substep 2c.

  5. Step 5: Abduction of Theoretical Model (Automatic)

    • Generate a theoretical model by integrating the set of causal links between predictor and target variables that are associated with the abducted directional (Sibling Rules) and Strong Single Rule Hypotheses.
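
Items 4 to 6 of Step 1 (discretization and target events) can be sketched in Python as follows. Only the "High" interval [5.5, 7] comes from the example target event above; the 1 to 7 scale bounds, the 2.5 cut point between "Low" and "Medium", the dictionary keys, and the function names are assumptions for illustration, not the method presented in Table 1.

```python
from bisect import bisect_right

# Assumed 1-7 response scale. The "High" bin [5.5, 7] mirrors the example
# target event; the 2.5 Low/Medium cut point is a hypothetical choice.
SCALE_MIN, SCALE_MAX = 1.0, 7.0
CUT_POINTS = [2.5, 5.5]
LABELS = ["Low", "Medium", "High"]


def discretize(value):
    """Map a score to a bin label; bins are closed on the left."""
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"{value} is outside [{SCALE_MIN}, {SCALE_MAX}]")
    # bisect_right sends a value equal to a cut point to the upper bin,
    # so 5.5 falls in "High", matching the [5.5, 7] interval.
    return LABELS[bisect_right(CUT_POINTS, value)]


def discretize_rows(rows, ordinal_vars):
    """Return copies of the rows with each listed ordinal variable discretized."""
    discretized = []
    for row in rows:
        new_row = dict(row)
        for var in ordinal_vars:
            new_row[var] = discretize(row[var])
        discretized.append(new_row)
    return discretized


def is_target_event(row, variable, label="High"):
    """Target event check on a discretized row, e.g. Performance is High."""
    return row[variable] == label


# Hypothetical survey rows (variable names follow the examples in Step 1).
rows = [
    {"SystemQuality": 6.2, "Utilization": 4.0, "Performance": 5.5},
    {"SystemQuality": 2.0, "Utilization": 3.1, "Performance": 4.9},
]
for row in discretize_rows(rows, ["SystemQuality", "Utilization", "Performance"]):
    print(row, is_target_event(row, "Performance"))
```

Keeping the cut points in one list makes it easy to swap in whatever discretization method is specified in item 4 without touching the rest of the pipeline.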
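
Item 8 of Step 1 and Substep 2c both depend on the highest $p_0$ that the data support for a rule. This section does not state the test, so the Python sketch below assumes that $p_0$ is the probability of the target event among the cases covered by a rule, and reads "highest supported $p_0$" as a one-sided $(1 - \alpha)$ lower confidence bound on that proportion, computed with the normal approximation. A rule is then treated as strong when the bound reaches $\tau_0$. The function names and the choice of approximation are assumptions; an exact (Clopper-Pearson) or Wilson bound would behave better for small leaves or proportions near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def highest_supported_p0(successes, n, alpha):
    """One-sided (1 - alpha) lower confidence bound for a rule's target-event
    proportion, via the normal approximation (an assumed choice of test)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return max(0.0, p_hat - z * sqrt(p_hat * (1 - p_hat) / n))


def is_strong_rule(successes, n, alpha, tau0):
    """Accept the surrogate hypothesis p0 >= tau0 when the bound reaches tau0."""
    return highest_supported_p0(successes, n, alpha) >= tau0


# Hypothetical leaf: 90 of its 100 cases have Performance = High.
print(round(highest_supported_p0(90, 100, 0.05), 4))  # 0.8507
print(is_strong_rule(90, 100, alpha=0.05, tau0=0.8))  # True
print(is_strong_rule(90, 100, alpha=0.05, tau0=0.9))  # False
```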
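
Steps 3 and 5 amount to bookkeeping over the supported hypotheses. The Python sketch below assumes each hypothesis has already been reduced to a hypothetical (predictor, target, kind) triple, where kind records whether a Sibling Rules or a Strong Single Rule Hypothesis supports the link; the triples are invented from the example variables in Step 1, and the function names are illustrative only.

```python
from collections import defaultdict


def find_mediators(hypotheses, dependent_vars, candidate_mediators):
    """Step 3: candidate mediators that appear as the predictor in at least one
    supported hypothesis about a dependent variable."""
    return {
        predictor
        for predictor, target, _kind in hypotheses
        if target in dependent_vars and predictor in candidate_mediators
    }


def integrate_model(hypotheses):
    """Step 5: merge all predictor -> target links into one model, keeping the
    kinds of hypotheses that support each link."""
    links = defaultdict(set)
    for predictor, target, kind in hypotheses:
        links[(predictor, target)].add(kind)
    return dict(links)


def mediated_paths(links, mediators):
    """List X -> M -> Y chains in the integrated model that pass through a mediator."""
    return sorted(
        (x, m, y)
        for (x, m) in links
        if m in mediators
        for (m2, y) in links
        if m2 == m
    )


# Hypothetical supported hypotheses from Steps 2 and 4.
hypotheses = [
    ("System Quality", "Performance", "sibling"),
    ("Utilization", "Performance", "strong"),
    ("System Quality", "Utilization", "sibling"),  # from Step 4 (mediator as target)
    ("System Quality", "Performance", "strong"),
]
mediators = find_mediators(hypotheses, {"Performance"}, {"Utilization"})
model = integrate_model(hypotheses)
print(mediators)
print(mediated_paths(model, mediators))
```

Keying the model on (predictor, target) pairs means a link supported by both kinds of hypothesis appears once, with both kinds recorded.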

Appendix B

2.1 Examples of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses

Assuming a discretized target variable with three bins, Table 9 could be used to formulate and test a candidate Sibling Rules Hypothesis of the form: Predictor variable X has {Impact Type} on Target variable Y.

Table 9 Example of decision rules for formulating candidate Sibling Rules Hypotheses
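
The contents of Table 9 are not reproduced in this section, so the following Python sketch uses a hypothetical rule set of its own rather than the table's. It assumes the sibling rules come from one split on a discretized predictor, are ordered from its lowest to its highest bin, and each carries the number of cases showing the target event (e.g., Performance is High). A positive (or negative) impact is proposed only when every step up the predictor bins significantly raises (or lowers) that relative frequency; each step is treated as a surrogate hypothesis and checked with a one-sided pooled two-proportion z-test, which is an assumed choice of test.

```python
from math import sqrt
from statistics import NormalDist


def significantly_greater(s1, n1, s2, n2, alpha):
    """One-sided pooled two-proportion z-test of p1 > p2 (assumed surrogate test)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("each sibling rule must cover at least one case")
    pooled = (s1 + s2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False  # both proportions are 0 or both are 1: no difference
    z = (s1 / n1 - s2 / n2) / se
    return z > NormalDist().inv_cdf(1 - alpha)


def sibling_rules_hypothesis(predictor, target, siblings, alpha=0.05):
    """siblings: (predictor_bin, target_event_count, n) tuples ordered from the
    lowest to the highest predictor bin. Returns a hypothesis string, or None
    when these hypothetical decision rules abduct nothing."""
    steps = list(zip(siblings, siblings[1:]))
    if not steps:
        return None
    if all(significantly_greater(hi[1], hi[2], lo[1], lo[2], alpha) for lo, hi in steps):
        impact = "a positive impact"
    elif all(significantly_greater(lo[1], lo[2], hi[1], hi[2], alpha) for lo, hi in steps):
        impact = "a negative impact"
    else:
        return None
    return f"Predictor variable {predictor} has {impact} on Target variable {target}"


# Hypothetical leaves of a split on System Quality; counts are cases with Performance = High.
leaves = [("Low", 10, 100), ("Medium", 30, 100), ("High", 60, 100)]
print(sibling_rules_hypothesis("System Quality", "Performance", leaves))
```

A real rule set would also cover mixed patterns (for example, a rise followed by a plateau), which this sketch simply leaves unhypothesized.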

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Osei-Bryson, KM., Ngwenyama, O. (2014). An Approach for Using Data Mining to Support Theory Development. In: Osei-Bryson, KM., Ngwenyama, O. (eds) Advances in Research Methods for Information Systems Research. Integrated Series in Information Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-9463-8_4
