An Approach for Using Data Mining to Support Theory Development

Osei-Bryson, Kweku-Muata; Ngwenyama, Ojelanki

doi:10.1007/978-1-4614-9463-8_4

Part of the book series: Integrated Series in Information Systems (ISIS, volume 34)

Abstract

The rapid and constant change in information technologies (IT), organizational forms, and social structures is challenging our existing theories of the impact of IT on organizations and society. A basic problem for researchers is how to generate testable hypotheses about a given area of research. However, new IT offer opportunities for information processing and problem solving that could extend the capacity of researchers to generate hypotheses and to systematically explore the limitations of any theory. The idea of using IT to support IS research is not new. In this chapter, we explore and illustrate how data mining techniques could be applied to assist researchers in systematic theory testing and development.

Notes

1. Science: Conjectures and Refutations, in Popper, K., Conjectures and Refutations, 2002 edition, pp. 70–71.
2. A discussion of the validation of the new instrument is beyond the scope of this chapter. However, in validating the instrument, we used principal axis factoring and varimax rotation with Kaiser normalization. The reliability coefficients (alphas) for the groups of items were: System Quality, 0.9603; Documentation, 0.9337; Ease of Use, 0.9266; Performance, 0.927; Utilization, 0.438; System Reliability, 0.8860; and Authorization, 0.7618.
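Note 2 reports an alpha reliability coefficient for each group of items. Assuming these are Cronbach's alphas (the note does not name the coefficient), the value for a scale of k items is α = k/(k - 1) · (1 - Σσᵢ² / σ_T²), where σᵢ² is the variance of item i and σ_T² is the variance of respondents' summed scores. The Python sketch below computes it; the function name and the five-respondent data set are invented for illustration and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: k lists, each holding one item's scores for the same n respondents.
    Population variances are used throughout; the n vs. n - 1 choice cancels
    in the ratio as long as it is applied consistently.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item must have a score for each respondent")
    # Each respondent's summed score across the k items
    totals = [sum(item[r] for item in items) for r in range(n)]
    item_variance_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical 7-point responses: three items, five respondents
scale = [
    [6, 5, 7, 3, 4],
    [6, 4, 7, 2, 4],
    [5, 5, 6, 3, 5],
]
print(round(cronbach_alpha(scale), 3))  # prints 0.946
```

Items that move together, as in this toy data, push alpha toward 1; a value such as the 0.438 reported for Utilization signals much weaker internal consistency among that group's items.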

References

Benbasat I, Zmud R (1999) Empirical research in information systems: the practice of relevance. MIS Q 23(1):3–16
Brusic V, Zeleznikow J (1999) Knowledge discovery and data mining in biological databases. Knowl Eng Rev 14:257–277
Chalmers AF (1994) What is this thing called science? 3rd edn. Hackett Publishing
Dipert R (1995) Peirce’s underestimated role in the history of logic. In: Ketner K (ed) Peirce and contemporary thought. Fordham University Press, New York
Doll WJ, Torkzadeh G (1988) The measurement of end-user computing satisfaction. MIS Q 12(2):259–274
Etezadi-Amoli J, Farhoomand AF (1996) A structural model of end user computing satisfaction and user performance. Inf Manage 30(2):65–73
Fann KT (1970) Peirce’s theory of abduction. Martinus Nijhoff, Amsterdam
Grimes TR (1990) Truth, content, and the hypothetico-deductive method. Philos Sci 57:514–522
Goodhue DL, Thompson RL (1995) Task-technology fit and individual performance. MIS Q 19(2):213–236
Hanson NR (1961) Is there a logic of discovery? In: Feigl H, Maxwell G (eds) Current issues in the philosophy of science. Holt, Rinehart and Winston, pp 20–35
Harman G (1965) Inference to the best explanation. Philos Rev 74:88–95
Hintikka J (1968) The varieties of information and scientific explanation. In: van Rootselaar B, Staal JF (eds) Logic, methodology and philosophy of science III. North Holland, pp 151–171
Hintikka J (1997) The place of CS Peirce in the history of logical theory. In: Lingua Universalis vs Calculus Ratiocinator, selected papers 2. Kluwer, pp 140–161
Kim H, Koehler G (1995) Theory and practice of decision tree induction. Omega 23(6):637–652
Ko M, Osei-Bryson K-M (2004a) Exploring the relationship between information technology investments and firm performance productivity using regression splines analysis. Inf Manage 42:1–13
Ko M, Osei-Bryson K-M (2004b) Using regression splines to assess the impact of information technology investments on productivity in the health care industry. Inf Syst J 14(1):43–63
Lee C, Irizarry K (2001) The GeneMine system for genome/proteome annotation and collaborative data mining. IBM Syst J 40(2):592–603
Niiniluoto I (1993) Peirce’s theory of statistical explanation. In: Moore EC (ed) Charles S Peirce and the philosophy of science. The University of Alabama Press, Tuscaloosa, pp 186–207
Niiniluoto I (1999) Defending abduction. Proc Philos Sci 66:S436–S451
Palys TS (2003) Research decisions: quantitative and qualitative perspectives, 3rd edn. Nelson, Scarborough
Popper KR (1957) The aim of science. Ratio 1
Popper K (1963) Conjectures and refutations: the growth of scientific knowledge. Routledge and Kegan Paul, London, UK
Popper KR (1968) The logic of scientific discovery. Harper Torch Books, New York
Putnam H (1982) Peirce the logician. Historia Math 9:290–301
Quine WV (1995) Peirce’s Logic. In: Ketner KL (ed) Peirce and contemporary thought. Fordham, New York, pp 23–31
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Tursman R (1987) Peirce’s theory of scientific discovery. Indiana University Press, Bloomington

Acknowledgments

Some of the material in this chapter previously appeared in the paper “Using Decision Tree Modelling to Support Peircian Abduction in IS Research: A Systematic Approach for Generating and Evaluating Hypotheses for Systematic Theory Development,” Information Systems Journal 21:5, 407–440 (2011).

Author information

Authors and Affiliations

Department of Information Systems, Virginia Commonwealth University, 301 West Main Street, Richmond, VA, 23284, USA
Kweku-Muata Osei-Bryson
Ted Rogers School of Management, Ryerson University, 350 Victoria Street, Toronto, ON, M5B 2K3, Canada
Ojelanki Ngwenyama

Corresponding author

Correspondence to Kweku-Muata Osei-Bryson.

Editor information

Editors and Affiliations

Department of Information Systems, Virginia Commonwealth University, Richmond, Virginia, USA
Kweku-Muata Osei-Bryson
Department of Information Systems, Ryerson University, Toronto, Ontario, Canada
Ojelanki Ngwenyama

Appendices

Appendix A

1.1 A Procedure for Implementing the Methodology

Step 1: Preparation (Researcher)
1. Identify Dependent Variables: Identify the dependent variables (e.g., Performance).
2. Identify Possible Mediator Variables: Identify possible mediator variables (e.g., Utilization).
3. Identify Independent Variables: Identify the independent variables (e.g., System Quality).
4. Specify the Discretization Method: An example of such a method is presented in Table 1.
5. Identify the Target Events for the Dependent and Mediator Variables: For each dependent variable, identify the target events that may be of interest (e.g., Performance is High ≡ the value of Performance is in the [5.5–7] interval). Do the same for the mediator variable(s).
6. Discretize All Ordinal Variables: Discretize each ordinal variable using the specified discretization method.
7. Specify DT Generation Parameters: Specify relevant values for the data partitioning parameters (e.g., distribution of cases across the training, validation, and test datasets; stratification variables) and for the DT induction parameters (e.g., splitting method options, minimum observations per leaf).
8. Specify Thresholds: Specify α, the significance level for statistical testing of the hypotheses, and τ₀, the threshold for p₀. A Strong Single Rule Hypothesis will not be generated for any rule for which the highest p₀ supported by the data is below τ₀.
9. Specify the Set of Decision Rules for Abducting and Evaluating Sibling Rules Hypotheses: A candidate Sibling Rules Hypothesis is formulated based on the relative frequency distribution of the target event across the set of sibling rules associated with the predictor variable. Appendix B provides an example of a Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses.
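Items 4 to 6 of Step 1 can be illustrated with a small Python sketch. The chapter's own discretization method is given in Table 1, which is not reproduced here, so the cut points below are assumptions: only the High bin, [5.5–7], comes from the example in item 5, while the 2.5 boundary between Low and Medium is hypothetical, as are the names `BINS`, `discretize` and `is_target_event`.

```python
# Hypothetical bins for a 7-point scale. Only High = [5.5, 7] comes from the
# text; the Low/Medium boundary at 2.5 is an illustrative assumption.
BINS = [
    ("Low", 1.0, 2.5),
    ("Medium", 2.5, 5.5),
    ("High", 5.5, 7.0),
]


def discretize(value, bins=BINS):
    """Map an ordinal score (or scale average) to its bin label.

    Bins are half-open [lower, upper), except the last, which also
    includes its upper end so that the scale maximum is covered.
    """
    for i, (label, lower, upper) in enumerate(bins):
        is_last = i == len(bins) - 1
        if lower <= value < upper or (is_last and value == upper):
            return label
    raise ValueError(f"{value} lies outside the scale range")


def is_target_event(value, target="High"):
    """True when the value falls in the target bin, e.g. 'Performance is High'."""
    return discretize(value) == target


performance = [6.2, 4.0, 5.5, 7.0, 2.1]
print([discretize(v) for v in performance])
# prints ['High', 'Medium', 'High', 'High', 'Low']
print([is_target_event(v) for v in performance])
# prints [True, False, True, True, False]
```

Keeping the bins as data rather than hard-coded comparisons makes it easy to swap in whatever method Table 1 actually specifies.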
Step 2: Generate Hypotheses for Dependent Variables (Automatic)
- For each dependent variable:
  - Substep 2a: Generate DTs for the given Dependent Variable: Generate a set of DTs using the discretized dataset and the combinations of DT generation parameter values specified in Step 1.
  - Substep 2b: Abduct and Evaluate Sibling Rules Hypotheses for the given Dependent Variable: In this substep, for each DT, a Sibling Rules Hypothesis is abducted for each set of sibling rules if the relationship between the associated relative frequencies is included in the Set of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses. Each abducted Sibling Rules Hypothesis is evaluated indirectly by subjecting its associated set of surrogate hypotheses to statistical testing. If all of them are accepted, there is good reason to believe that the Sibling Rules Hypothesis would not be rejected, and so it is not rejected.
  - Substep 2c: Abduct Strong Single Rule Hypotheses for the given Dependent Variable: For each sibling rule that is statistically different from each of its other sibling rules at significance level α, determine whether it is a strong rule by testing the surrogate hypothesis p₀ ≥ τ₀. For each such surrogate hypothesis that is accepted, use the given rule to abduct a corresponding Strong Single Rule Hypothesis.
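Substeps 2b and 2c rest on two statistical checks: whether sibling rules differ in the relative frequency of the target event, and whether a rule is strong. The procedure states the strength check as the surrogate hypothesis p₀ ≥ τ₀ but does not say here which tests are used. The Python sketch below assumes one plausible operationalization: a pooled two-proportion z-test for differences between sibling rules, and a one-sided exact binomial test that rejects H0: p ≤ τ₀ when a leaf's target-event count is improbably high. The function names, rule labels and leaf counts are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_strong_rule(k, n, tau0, alpha=0.05):
    """One-sided exact binomial test of H0: p <= tau0 against H1: p > tau0.

    k: cases in the rule's leaf that show the target event; n: all cases in
    the leaf. True means H0 is rejected at level alpha, i.e. the data
    support a target-event probability above tau0.
    """
    return binom_upper_tail(k, n, tau0) <= alpha


def siblings_differ(k1, n1, k2, n2, alpha=0.05):
    """Two-sided pooled two-proportion z-test between two sibling rules."""
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False  # both leaves are all-target or all-non-target
    z = (k1 / n1 - k2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def strong_single_rules(siblings, tau0, alpha=0.05):
    """Rules that differ from every sibling AND pass the strength test.

    siblings: mapping rule label -> (k, n) for one set of sibling leaves.
    """
    strong = []
    for rule, (k, n) in siblings.items():
        distinct = all(
            siblings_differ(k, n, k2, n2, alpha)
            for other, (k2, n2) in siblings.items()
            if other != rule
        )
        if distinct and is_strong_rule(k, n, tau0, alpha):
            strong.append(rule)
    return strong


# Hypothetical leaf counts for the target event "Performance is High"
siblings = {
    "System Quality = High": (45, 50),
    "System Quality = Medium": (20, 50),
    "System Quality = Low": (5, 40),
}
print(strong_single_rules(siblings, tau0=0.7))
# prints ['System Quality = High']
```

An exact test is used for strength because DT leaves can be small; in the procedure above, the same check would be repeated for every set of sibling rules in every generated DT.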
Step 3: Identify Mediator Variables (Automatic)
- Examine the set of supported hypotheses from Step 2 to determine whether any potential mediator variable is included in at least one of the abducted hypotheses for one of the dependent variables.
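Step 3 amounts to a set lookup over the supported hypotheses. The Python sketch below assumes, purely for illustration, that each hypothesis is stored as its set of condition (predictor) variables plus a target variable; the `Hypothesis` class, the function name and the example hypotheses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A supported hypothesis: its condition (predictor) variables and target."""
    predictors: frozenset
    target: str


def mediators_to_model(hypotheses, dependent_vars, candidate_mediators):
    """Step 3: candidate mediators that appear in the condition of at least
    one supported hypothesis whose target is a dependent variable."""
    candidates = set(candidate_mediators)
    found = set()
    for h in hypotheses:
        if h.target in dependent_vars:
            found |= h.predictors & candidates
    return sorted(found)


# Hypothetical supported hypotheses from Step 2
supported = [
    Hypothesis(frozenset({"Utilization", "Ease of Use"}), "Performance"),
    Hypothesis(frozenset({"System Quality"}), "Performance"),
]
print(mediators_to_model(supported, {"Performance"}, ["Utilization"]))
# prints ['Utilization']
```

An empty result corresponds to the case in which Step 4 is bypassed.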
Step 4: Generate Hypotheses for Mediator Variables (Automatic)
- Step 4 is executed for each mediator variable that was included in a condition event of a hypothesis for one of the dependent variables. If there is no such mediator variable, then this step is bypassed.
- For each mediator variable:
  - Substep 4a: Generate DTs for Predicting the given Mediator Variable. Similar to Substep 2a.
  - Substep 4b: Abduct Sibling Rules Hypotheses for the given Mediator Variable. Similar to Substep 2b.
  - Substep 4c: Abduct Strong Single Rule Hypotheses for the given Mediator Variable. Similar to Substep 2c.
Step 5: Abduction of Theoretical Model (Automatic)
- Generate a theoretical model by integrating the set of causal links between predictor and target variables that are associated with the abducted directional and Single Rule Hypotheses.
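Step 5 can be pictured as merging the causal links implied by the abducted hypotheses into one directed graph, in which a variable that is both a target and a predictor acts as a mediator. The Python sketch below assumes each link is a (predictor, target) pair; the function names and example links are hypothetical and are not results from the chapter.

```python
from collections import defaultdict


def integrate_links(links):
    """Merge (predictor, target) links from all abducted hypotheses into one
    directed model: predictor -> sorted list of targets, duplicates removed."""
    model = defaultdict(set)
    for predictor, target in links:
        model[predictor].add(target)
    return {p: sorted(ts) for p, ts in sorted(model.items())}


def mediated_paths(model):
    """X -> M -> Y chains, i.e. links whose target is itself a predictor."""
    return [
        (x, m, y)
        for x, targets in model.items()
        for m in targets
        for y in model.get(m, [])
    ]


# Hypothetical links from dependent-variable and mediator hypotheses
links = [
    ("System Quality", "Performance"),
    ("Utilization", "Performance"),
    ("Ease of Use", "Utilization"),
    ("Documentation", "Utilization"),
    ("System Quality", "Performance"),  # same link found in another DT
]
model = integrate_links(links)
print(mediated_paths(model))
# prints [('Documentation', 'Utilization', 'Performance'), ('Ease of Use', 'Utilization', 'Performance')]
```

Using sets per predictor means a link abducted from several DTs appears only once in the resulting model.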

Appendix B

2.1 Examples of Decision Rules for Formulating and Evaluating Candidate Sibling Rules Hypotheses

Assuming a discretized target variable with 3 bins, Table 9 could be used to formulate and test a candidate Sibling Rules Hypothesis of the form: Predictor variable X has {Impact Type} on Target variable Y.

Table 9 Example of decision rules for formulating candidate Sibling Rules Hypotheses

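Table 9 itself is not reproduced here. To show the general shape of such decision rules, the Python sketch below encodes one hypothetical rule set: moving from a predictor's lowest to its highest sibling rule, the predictor is said to have a positive impact when the share of the target's High bin never falls and the share of its Low bin never rises, and a negative impact in the mirror-image case. These rules, the function names and the frequencies are illustrative assumptions, not the authors' Table 9; per Substep 2b, a candidate formulated this way would still need its surrogate hypotheses tested before it is retained.

```python
def _non_decreasing(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def _non_increasing(xs):
    return all(a >= b for a, b in zip(xs, xs[1:]))


def impact_type(distributions):
    """HYPOTHETICAL decision rules (not the chapter's Table 9).

    distributions: one (low, medium, high) relative-frequency tuple of the
    3-bin target per sibling rule, ordered from the lowest to the highest
    level of the predictor. Returns None when no monotonic pattern exists.
    """
    for d in distributions:
        if len(d) != 3 or abs(sum(d) - 1.0) > 1e-6:
            raise ValueError("each distribution needs 3 bins summing to 1")
    lows = [d[0] for d in distributions]
    highs = [d[2] for d in distributions]
    if (_non_decreasing(highs) and _non_increasing(lows)
            and (highs[-1] > highs[0] or lows[-1] < lows[0])):
        return "a positive impact"
    if (_non_increasing(highs) and _non_decreasing(lows)
            and (highs[-1] < highs[0] or lows[-1] > lows[0])):
        return "a negative impact"
    return None


def sibling_rules_hypothesis(x, y, distributions):
    """Fill the template 'Predictor variable X has {Impact Type} on Target variable Y'."""
    impact = impact_type(distributions)
    if impact is None:
        return None  # no candidate hypothesis is formulated
    return f"Predictor variable {x} has {impact} on Target variable {y}"


print(sibling_rules_hypothesis(
    "System Quality", "Performance",
    [(0.6, 0.3, 0.1), (0.3, 0.4, 0.3), (0.1, 0.3, 0.6)],
))
# prints Predictor variable System Quality has a positive impact on Target variable Performance
```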

About this chapter

Cite this chapter

Osei-Bryson, KM., Ngwenyama, O. (2014). An Approach for Using Data Mining to Support Theory Development. In: Osei-Bryson, KM., Ngwenyama, O. (eds) Advances in Research Methods for Information Systems Research. Integrated Series in Information Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-9463-8_4

DOI: https://doi.org/10.1007/978-1-4614-9463-8_4
Published: 26 November 2013
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-9462-1
Online ISBN: 978-1-4614-9463-8
eBook Packages: Business and Economics, Business and Management (R0)
