Abstract
Social science research has recently been subject to considerable criticism regarding the validity and power of empirical tests published in leading journals, and business scholarship is no exception. Transparency and replicability of empirical findings are essential to build a cumulative body of scholarly knowledge. Yet current practices are under increasing scrutiny as to whether they achieve these objectives. JIBS is therefore discussing and revising its editorial practices to enhance the validity of empirical research. In this editorial, we reflect on best practices with respect to conducting, reporting, and discussing the results of quantitative hypothesis-testing research, and we develop guidelines for authors to enhance the rigor of their empirical work. This will not only help readers to assess empirical evidence comprehensively, but also enable subsequent research to build a cumulative body of empirical knowledge.
Notes
1. In many disciplines contributing to international business research, conventional Type 1 error probabilities are p < 0.05 or 0.01. There are situations where a higher Type 1 error probability, such as p < 0.10, might be justified (Cascio and Zedeck 1983; Aguinis et al. 2010), for example, when the dataset is small and a larger dataset is unrealistic to obtain.
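The power rationale behind this note can be made concrete with a small sketch (not part of the original editorial; the effect size and sample size are made-up numbers): for a simple one-sided z-test, relaxing the Type 1 error probability buys statistical power, which matters most when the sample is small.

```python
# Illustrative sketch: power of a one-sided z-test at different alpha levels.
# Effect size (0.3 SD) and sample size (n = 40) are hypothetical choices.
from statistics import NormalDist

def power_one_sided_z(effect_size: float, n: int, alpha: float) -> float:
    """Power of a one-sided z-test for a mean shift of effect_size SDs."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)           # critical value
    # Under the alternative, the test statistic is ~ N(effect*sqrt(n), 1).
    return 1 - NormalDist().cdf(z_alpha - effect_size * n ** 0.5)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> power = {power_one_sided_z(0.3, 40, alpha):.2f}")
```

With these numbers, moving from alpha = 0.01 to alpha = 0.10 roughly doubles the chance of detecting a true effect, which is the trade-off the note alludes to.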
2. Note that according to Dalton et al. (2012), the selection bias (or file-drawer problem) does not appear to affect correlation tables in published versus unpublished papers.
3. A “true” p-value would be the p-value observed in a regression analysis that was designed based on all available theoretical knowledge (e.g., regarding the measurement of variables and the inclusion of controls), and not changed after seeing the first regression results.
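Why changing the analysis after seeing the first results distorts p-values can be illustrated with a small simulation (an illustrative sketch, not from the original text; it assumes independent specifications, which is conservative): if a researcher tries several specifications under a true null and reports only the smallest p-value, the effective Type 1 error rate far exceeds the nominal 5%.

```python
# Illustrative sketch: "best of k specifications" inflates false positives.
# Under the null, each specification's p-value is uniform on [0, 1].
import random

def false_positive_rate(k_specs: int, trials: int = 20000, seed: int = 1) -> float:
    """Share of trials where the best of k null p-values falls below 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if min(rng.random() for _ in range(k_specs)) < 0.05:
            hits += 1
    return hits / trials

print(false_positive_rate(1))    # ~0.05: the advertised Type 1 error rate
print(false_positive_rate(10))   # ~0.40: after trying ten specifications
```

The analytic benchmark is 1 − 0.95^k, so with ten tries roughly 40% of true-null studies will yield at least one "significant" specification.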
4. Brodeur et al. (2016) extensively test whether this assumption holds, as well as the sensitivity of the overall distribution to issues like rounding, the number of tests performed in each article, the number of tables included, and more. Similar to Brodeur et al. (2016), we explored the sensitivity of the shape of the distribution to such issues, and we have no reason to assume that the final result in Figure 4.1 is sensitive to these issues.
5. The spikes at z-scores of 3, 4, and 5 are the result of rounding and are an artefact of the data. As coefficients and standard errors reported in tables are rounded – often at 2 or 3 digits – very small coefficients and standard errors automatically imply ratios of rounded numbers, and as a consequence, result in a relatively large number of z-scores with the integer value of 3, 4, or 5. This observation is in line with the findings reported for Economics journals by Brodeur et al. (2016).
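The mechanics of this rounding artefact are easy to reproduce. In the sketch below (hypothetical coefficient/standard-error pairs, not taken from the sampled articles), rounding both numbers to two decimals forces the z-score implied by a published table onto an integer value:

```python
# Illustrative sketch: z-scores computed from rounded table entries collapse
# to integers when coefficient and SE are small. Pairs below are invented.
examples = [(0.028, 0.009), (0.041, 0.011), (0.052, 0.012)]
for coef, se in examples:
    true_z = coef / se
    rounded_z = round(coef, 2) / round(se, 2)   # as read off a published table
    print(f"true z = {true_z:.2f}, from rounded table: z = {rounded_z:.0f}")
```

Each pair lands exactly on z = 3, 4, or 5 after rounding, even though the underlying ratios are 3.11, 3.73, and 4.33 – precisely the spikes visible in the figure.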
6. The data on which the graph is based are taken from Beugelsdijk et al. (2014).
7. If authors believe that certain suggested additional tests are not reasonable or not feasible (for example, because certain data do not exist), then they should communicate that in their reply. The editor then has to evaluate the merits of the arguments of authors and reviewers, if necessary bringing in an expert on the particular methodology at hand. If the latter is required, this can be indicated in the Manuscript Central submission process.
8. A laudable exception is the recent special issue of Strategic Management Journal on replication (Bettis et al. 2016b).
9. The grand total is heavily influenced by SMJ with 362 tested hypotheses, vis-à-vis 164 in JIBS and 185 in Organization Science.
10. An interesting alternative may be abduction. For example, see Dikova, Parker, and van Witteloostuijn (2017), who define abduction as “a form of logical inference that begins with an observation and concludes with a hypothesis that accounts for the observation, ideally seeking to find the simplest and most likely explanation.” See also, e.g., Misangyi and Acharya (2014).
References
Aguinis, H., S. Werner, J.L. Abbott, C. Angert, J.H. Park, and D. Kohlhausen. 2010. Customer-centric research: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods 13 (3): 515–539.
Andersson, U., A. Cuervo-Cazurra, and B.B. Nielsen. 2014. Explaining interaction effects within and across levels of analysis. Journal of International Business Studies 45 (9): 1063–1071.
Angrist, J.D., and A. Krueger. 2001. Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives 15 (4): 69–85.
Angrist, J.D., and J.S. Pischke. 2010. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives 24 (2): 3–30.
Antonakis, J., S. Bendahan, P. Jacquart, and R. Lalive. 2010. On making causal claims: A review and recommendations. Leadership Quarterly 21 (6): 1086–1120.
Barley, S.R. 2016. 60th anniversary essay: Ruminations on how we became a mystery house and how we might get out. Administrative Science Quarterly 61 (1): 1–8.
Bedeian, A.G., S.G. Taylor, and A. Miller. 2010. Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education 9 (4): 715–725.
Bettis, R.A. 2012. The search for asterisks: Compromised statistical tests and flawed theory. Strategic Management Journal 33 (1): 108–113.
Bettis, R.A., S. Ethiraj, A. Gambardella, C.E. Helfat, and W. Mitchell. 2016a. Creating repeatable cumulative knowledge in strategic management. Strategic Management Journal 37 (2): 257–261.
Bettis, R.A., C.E. Helfat, and M.J. Shaver. 2016b. Special issue: Replication in strategic management. Strategic Management Journal 37 (11): 2191–2388.
Beugelsdijk, S., H.L.F. de Groot, and A.B.T.M. van Schaik. 2004. Trust and economic growth: A robustness analysis. Oxford Economic Papers 56 (1): 118–134.
Beugelsdijk, S., A. Slangen, M. Onrust, A. van Hoorn, and R. Maseland. 2014. The impact of home-host cultural distance on foreign affiliate sales: The moderating role of cultural variation within host countries. Journal of Business Research 67 (8): 1638–1646.
Bhattacharjee, Y. 2013. The mind of a con man. New York Times Magazine, April 26.
Bobko, P. 2001. Correlation and regression: Applications for industrial organizational psychology and management. 2nd ed. Thousand Oaks: Sage.
Bosco, F.A., H. Aguinis, K. Singh, J.G. Field, and C.A. Pierce. 2015. Correlational effect size benchmarks. Journal of Applied Psychology 100 (2): 431–449.
Bosco, F.A., H. Aguinis, J.G. Field, C.A. Pierce, and D.R. Dalton. 2016. HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology 69 (3): 709–750.
Brambor, T., W.R. Clark, and M. Golder. 2006. Understanding interaction models: Improving empirical analyses. Political Analysis 14 (1): 63–82.
Branch, M. 2014. Malignant side-effects of null-hypothesis testing. Theory and Psychology 24 (2): 256–277.
Brodeur, A., M. Le, M. Sangnier, and Y. Zylberberg. 2016. Star wars: The empirics strike back. American Economic Journal: Applied Economics 8 (1): 1–32.
Buckley, P., T. Devinney, and J.J. Louviere. 2007. Do managers behave the way theory suggests? A choice-theoretic examination of foreign direct investment location decision-making. Journal of International Business Studies 38 (7): 1069–1094.
Cascio, W.F., and S. Zedeck. 1983. Open a new window in rational research planning: Adjust alpha to maximize statistical power. Personnel Psychology 36 (3): 517–526.
Choi, J., and F. Contractor. 2016. Choosing an appropriate alliance governance mode: The role of institutional, cultural and geographic distance in international research & development (R&D) collaborations. Journal of International Business Studies 47 (2): 210–232.
Cohen, J. 1969. Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cortina, J.M., T. Kohler, and B.B. Nielsen. 2015. Restriction of variance interaction effects and their importance for international business. Journal of International Business Studies 46 (8): 879–885.
Croswell, J.M., et al. 2009. Cumulative incidence of false-positive results in repeated, multimodal cancer screening. Annals of Family Medicine 7 (3): 212–222.
Dalton, D.R., H. Aguinis, C.A. Dalton, F.A. Bosco, and C.A. Pierce. 2012. Revisiting the file drawer problem in meta-analysis: An empirical assessment of published and non-published correlation matrices. Personnel Psychology 65 (2): 221–249.
Dikova, D., S.C. Parker, and A. van Witteloostuijn. 2017. Capability, environment and internationalization fit, and financial and marketing performance of MNEs’ foreign subsidiaries: An abductive contingency approach. Cross-Cultural and Strategic Management 24 (3): 405–435.
Doh, J. 2015. Why we need phenomenon-based research in international business. Journal of World Business 50 (4): 609–611.
Doucouliagos, C., and T.D. Stanley. 2013. Are all economic facts greatly exaggerated? Theory competition and selectivity. Journal of Economic Surveys 27 (2): 316–339.
Economist. 2014. When science gets it wrong: Let the light shine in. June 14. http://www.economist.com/news/science-and-technology/21604089-two-big-recent-scientific-results-are-looking-shakyand-it-open-peer-review. Accessed 23 Mar 2017.
Ferguson, C.J., and M. Heene. 2012. A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science 7 (6): 555–561.
Fisher, R.A. 1925. Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Fisher, R., and S. Schwartz. 2011. Whence differences in value priorities? Individual, cultural, and artefactual sources. Journal of Cross-Cultural Psychology 42 (7): 1127–1144.
Fox, P.J., and C.A.W. Glas. 2002. Modeling measurement error in a structural multilevel model. In Latent variable and latent structure models, ed. G.A. Marcoulides and I. Moustaki. London: Lawrence Erlbaum Associates.
Gerber, A.S., D.P. Green, and D. Nickerson. 2001. Testing for publication bias in political science. Political Analysis 9 (4): 385–392.
Gigerenzer, G. 2004. Mindless statistics. Journal of Socio-Economics 33 (5): 587–606.
Goldfarb, B., and A. King. 2016. Scientific Apophenia in strategic management research: Significance tests & mistaken inference. Strategic Management Journal 37 (1): 167–176.
Görg, H., and E. Strobl. 2001. Multinational companies and productivity spillovers: A meta-analysis with a test for publication bias. Economic Journal 111: F723–F739.
Greene, W. 2010. Testing hypotheses about interaction terms in nonlinear models. Economics Letters 107: 291–296.
Grieneisen, M.L., and M. Zhang. 2012. A comprehensive survey of retracted articles from the scholarly literature. PLoS One 7 (10): e44118. https://doi.org/10.1371/journal.pone.0044118.
Haans, R.F.P., C. Pieters, and Z.L. He. 2016. Thinking about U: Theorizing and testing U-and inverted U-shaped relationships in strategy research. Strategic Management Journal 37 (7): 1177–1196.
Head, M.L., L. Holman, R. Lanfear, A.T. Kahn, and M.D. Jennions. 2015. The extent and consequences of p–hacking in science. PLoS Biology 13 (3): e1002106. https://doi.org/10.1371/journal.pbio.1002106.
Henrich, J., S.J. Heine, and A. Norenzayan. 2010a. The weirdest people in the world? Behavioral and Brain Sciences 33 (2–3): 61–83.
———. 2010b. Most people are not WEIRD. Nature 466: 29.
Hoetker, G. 2007. The use of logit and probit models in strategic management research: Critical issues. Strategic Management Journal 28 (4): 331–343.
Hubbard, R., D.E. Vetter, and E.L. Little. 1998. Replication in strategic management: Scientific testing for validity, generalizability, and usefulness. Strategic Management Journal 19 (3): 243–254.
Hunter, J.E., and F.L. Schmidt. 2015. Methods of meta-analysis: Correcting error and bias in research findings. 2nd ed. Thousand Oaks: Sage.
Husted, B.W., I. Montiel, and P. Christmann. 2016. Effects of local legitimacy on certification decision to global and national CSR standards by multinational subsidiaries and domestic firms. Journal of International Business Studies 47 (3): 382–397.
Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2 (8): e124.
———. 2012. Why science is not necessarily self-correcting. Perspectives on Psychological Science 7 (6): 645–654.
John, L.K., G. Loewenstein, and D. Prelec. 2012. Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science 23 (5): 524–532.
Kerr, N.L. 1998. HARKing: Hypothesizing after results are known. Personality and Social Psychology Review 2 (3): 196–217.
Kingsley, A.F., T.G. Noordewier, and R.G. Vanden Bergh. 2017. Overstating and understating interaction results in international business research. Journal of World Business 52 (2): 286–295.
Kirk, R.E. 1996. Practical significance: A concept whose time has come. Educational and Psychological Measurement 56 (5): 746–759.
Leamer, E.E. 1985. Sensitivity analyses would help. American Economic Review 75 (3): 308–313.
Lewin, A.Y., C.Y. Chiu, C.F. Fey, S.S. Levine, G. McDermott, J.P. Murmann, and E. Tsang. 2016. The critique of empirical social science: New policies at Management and Organization Review. Management and Organization Review 12 (4): 649–658.
Lexchin, J., L.A. Bero, B. Djulbegovic, and O. Clark. 2003. Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. British Medical Journal 326 (7400): 1167–1170.
Masicampo, E.J., and D.R. Lalande. 2012. A peculiar prevalence of p-values just below 0.05. Quarterly Journal of Experimental Psychology 65 (11): 2271–2279.
McCloskey, D.N. 1985. The loss function has been mislaid: The rhetoric of significance tests. American Economic Review 75 (2): 201–205.
McCloskey, D.N., and S.T. Ziliak. 1996. The standard error of regressions. Journal of Economic Literature 34: 97–114.
Meyer, K.E. 2006. Asian management research needs more self-confidence. Asia Pacific Journal of Management 23 (2): 119–137.
———. 2009. Motivating, testing, and publishing curvilinear effects in management research. Asia Pacific Journal of Management 26 (2): 187–193.
Misangyi, V.F., and A.G. Acharya. 2014. Substitutes or complements? A configurational examination of corporate governance mechanisms. Academy of Management Journal 57 (6): 1681–1705.
Mullane, K., and M. Williams. 2013. Bias in research: The rule rather than the exception? Elsevier Editors’ Update. http://editorsupdate.elsevier.com/issue-40-september-2013/bias-in-research-the-rule-rather-than-the-exception. Accessed 23 Mar 2017.
New York Times. 2011. Fraud case seen as a red flag for psychology research. November 2. http://www.nytimes.com/2011/11/03/health/research/noted-dutch-psychologist-stapel-accused-of-research-fraud.html?-r=1&ref=research. Accessed 15 Jan 2017.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science. https://doi.org/10.1126/science.aac4716.
Orlitzky, M. 2012. How can significance tests be deinstitutionalized? Organizational Research Methods 15 (2): 199–228.
Pashler, H., and E.-J. Wagenmakers. 2012. Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science 7 (6): 528–530.
Peterson, M., J.L. Arregle, and X. Martin. 2012. Multi-level models in international business research. Journal of International Business Studies 43 (5): 451–457.
Pfeffer, J. 2007. A modest proposal: How we might change the process and product of managerial research. Academy of Management Journal 50 (6): 1334–1345.
Popper, K. 1959. The logic of scientific discovery. London: Hutchinson.
Reeb, D., M. Sakakibara, and I.P. Mahmood. 2012. From the editors: Endogeneity in international business research. Journal of International Business Studies 43 (3): 211–218.
Rosenthal, R. 1979. The “file drawer problem” and tolerance for null results. Psychological Bulletin 86 (3): 638–641.
Rosnow, R.L., and R. Rosenthal. 1984. Understanding behavioral science: Research methods for consumers. New York: McGraw-Hill.
Rothstein, H.R., A.J. Sutton, and M. Borenstein. 2005. Publication bias in meta-analysis, prevention, assessment and adjustment. New York: Wiley.
Sala-i-Martin, X. 1997. I just ran two million regressions. American Economic Review 87 (2): 178–183.
Shadish, W.R., T.D. Cook, and D. Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin.
Simmons, J.P., L.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22 (11): 1359–1366.
Sterling, T.D. 1959. Publication decisions and their possible effects on inferences drawn from tests of significance – or vice versa. Journal of the American Statistical Association 54 (285): 30–34.
van Witteloostuijn, A. 2015. Toward experimental international business: Unraveling fundamental causal linkages. Cross Cultural & Strategic Management 22 (4): 530–544.
———. 2016. What happened to Popperian falsification? Publishing neutral and negative findings. Cross Cultural & Strategic Management 23 (3): 481–508.
Wasserstein, R.L., and N.A. Lazar. 2016. The ASA’s statement on p-values: Context, process, and purpose. American Statistician 70 (2): 129–133. http://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108. (ASA = American Statistical Association).
Wiersema, M.F., and H.P. Bowen. 2009. The use of limited dependent variable techniques in strategy research: Issues and methods. Strategic Management Journal 30 (6): 679–692.
Williams, R. 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal 12 (2): 308–331.
Wonnacott, T.H., and R.J. Wonnacott. 1990. Introductory statistics for business and economics. New York: Wiley.
Zedeck, S. 2003. Editorial. Journal of Applied Psychology 88 (1): 3–5.
Zellmer-Bruhn, M., P. Caligiuri, and D. Thomas. 2016. From the editors: Experimental designs in international business research. Journal of International Business Studies 47 (4): 399–407.
Zelner, B. 2009. Using simulation to interpret results from logit, probit, and other nonlinear models. Strategic Management Journal 30 (12): 1335–1348.
Acknowledgements
We gratefully acknowledge the constructive comments from Editor-in-Chief Alain Verbeke, eleven editors of JIBS, as well as from Bas Bosma, Lin Cui, Rian Drogendijk, Saul Estrin, Anne-Wil Harzing, Jing Li, Richard Walker, and Tom Wansbeek. We also thank Divina Alexiou, Richard Haans, Johannes Kleinhempel, Sjanne van Oeveren, Britt van Veen, and Takumin Wang for their excellent research assistance. Sjoerd Beugelsdijk thanks the Netherlands Organization for Scientific Research (NWO grant VIDI 452-011-10). All three authors contributed equally to this editorial.
Appendix 1: Stata Do File to Create Fig. 4.2
Model:
- Dependent variable = Y
- Independent variable = X
- Moderator variable = M
- Interaction variable = X*M
To generate Fig. 4.2:

predictnl me = _b[X] + _b[X*M]*M if e(sample), se(seme)
gen pw1 = me - 1.96*seme
gen pw2 = me + 1.96*seme
scatter me M if e(sample) || line me pw1 pw2 M if e(sample), pstyle(p2 p3 p3) sort legend(off) ytitle("Marginal effect of X on Y")
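For readers working outside Stata, the marginal-effect calculation in the do-file above can be sketched in Python. This is an illustrative translation, not part of the original appendix: marginal_effect is a hypothetical helper, and the coefficients, variances, and covariance plugged in below are invented; in practice they come from a fitted model's coefficient vector and covariance matrix. The delta-method standard error mirrors what predictnl computes for a linear combination of two coefficients.

```python
# Illustrative sketch of the do-file's computation: marginal effect of X on Y
# at a given moderator value m, with a delta-method 95% confidence interval.
# All numeric inputs below are hypothetical.
import math

def marginal_effect(b_x, b_xm, var_x, var_xm, cov_x_xm, m):
    """Marginal effect b_X + b_XM*m and its 95% CI via the delta method."""
    me = b_x + b_xm * m
    # Var(b_X + m*b_XM) = Var(b_X) + m^2*Var(b_XM) + 2m*Cov(b_X, b_XM)
    se = math.sqrt(var_x + (m ** 2) * var_xm + 2 * m * cov_x_xm)
    return me, me - 1.96 * se, me + 1.96 * se

# Hypothetical estimates: b_X = 0.50, b_XM = -0.20, with made-up (co)variances.
for m in (0.0, 1.0, 2.0):
    me, lo, hi = marginal_effect(0.50, -0.20, 0.04, 0.01, -0.005, m)
    print(f"M = {m}: me = {me:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Plotting me, lo, and hi against M reproduces the kind of marginal-effect figure the Stata scatter/line command draws.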
Copyright information
© 2020 The Author(s)
Cite this chapter
Meyer, K.E., van Witteloostuijn, A., Beugelsdijk, S. (2020). What’s in a p? Reassessing Best Practices for Conducting and Reporting Hypothesis-Testing Research. In: Eden, L., Nielsen, B.B., Verbeke, A. (eds) Research Methods in International Business. JIBS Special Collections. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-22113-3_4
Print ISBN: 978-3-030-22112-6
Online ISBN: 978-3-030-22113-3