Abstract
Social science research has recently been subject to considerable criticism regarding the validity and power of empirical tests published in leading journals, and business scholarship is no exception. Transparency and replicability of empirical findings are essential to build a cumulative body of scholarly knowledge. Yet current practices are under increasing scrutiny as to whether they achieve these objectives. JIBS is therefore discussing and revising its editorial practices to enhance the validity of empirical research. In this editorial, we reflect on best practices with respect to conducting, reporting, and discussing the results of quantitative hypothesis-testing research, and we develop guidelines for authors to enhance the rigor of their empirical work. This will not only help readers to assess empirical evidence comprehensively, but also enable subsequent research to build a cumulative body of empirical knowledge.
Notes
1. In many disciplines contributing to international business research, conventional Type 1 error probabilities are p < 0.05 or 0.01. There are situations where a higher Type 1 error probability, such as p < 0.10, might be justified (Cascio and Zedeck 1983; Aguinis et al. 2010), for example, when the dataset is small and a larger dataset is unrealistic to obtain.
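The power rationale behind this note can be made concrete with a small sketch (not part of the original editorial; the effect size and sample size are made-up numbers): for a simple one-sided z-test, relaxing the Type 1 error probability buys statistical power, which matters most when the sample is small.

```python
# Illustrative sketch: power of a one-sided z-test at different alpha levels.
# Effect size (0.3 SD) and sample size (n = 40) are hypothetical choices.
from statistics import NormalDist

def power_one_sided_z(effect_size: float, n: int, alpha: float) -> float:
    """Power of a one-sided z-test for a mean shift of effect_size SDs."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)           # critical value
    # Under the alternative, the test statistic is ~ N(effect*sqrt(n), 1).
    return 1 - NormalDist().cdf(z_alpha - effect_size * n ** 0.5)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> power = {power_one_sided_z(0.3, 40, alpha):.2f}")
```

With these numbers, moving from alpha = 0.01 to alpha = 0.10 roughly doubles the chance of detecting a true effect, which is the trade-off the note alludes to.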
2. Note that according to Dalton et al. (2012), the selection bias (or file-drawer problem) does not appear to affect correlation tables in published versus unpublished papers.
3. A “true” p-value would be the p-value observed in a regression analysis that was designed based on all available theoretical knowledge (e.g., regarding the measurement of variables and the inclusion of controls), and not changed after seeing the first regression results.
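Why changing the analysis after seeing the first results distorts p-values can be illustrated with a small simulation (an illustrative sketch, not from the original text; it assumes independent specifications, which is conservative): if a researcher tries several specifications under a true null and reports only the smallest p-value, the effective Type 1 error rate far exceeds the nominal 5%.

```python
# Illustrative sketch: "best of k specifications" inflates false positives.
# Under the null, each specification's p-value is uniform on [0, 1].
import random

def false_positive_rate(k_specs: int, trials: int = 20000, seed: int = 1) -> float:
    """Share of trials where the best of k null p-values falls below 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if min(rng.random() for _ in range(k_specs)) < 0.05:
            hits += 1
    return hits / trials

print(false_positive_rate(1))    # ~0.05: the advertised Type 1 error rate
print(false_positive_rate(10))   # ~0.40: after trying ten specifications
```

The analytic benchmark is 1 − 0.95^k, so with ten tries roughly 40% of true-null studies will yield at least one "significant" specification.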
4. Brodeur et al. (2016) extensively test whether this assumption holds, as well as the sensitivity of the overall distribution to issues like rounding, the number of tests performed in each article, the number of tables included, and more. Similar to Brodeur et al. (2016), we explored the sensitivity of the shape of the distribution to such issues, and we have no reason to assume that the final result in Figure 4.1 is sensitive to these issues.
5. The spikes at z-scores of 3, 4, and 5 are the result of rounding and are an artefact of the data. As coefficients and standard errors reported in tables are rounded – often at 2 or 3 digits – very small coefficients and standard errors automatically imply ratios of rounded numbers, and as a consequence, result in a relatively large number of z-scores with the integer value of 3, 4, or 5. This observation is in line with the findings reported for Economics journals by Brodeur et al. (2016).
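The mechanics of this rounding artefact are easy to reproduce. In the sketch below (hypothetical coefficient/standard-error pairs, not taken from the sampled articles), rounding both numbers to two decimals forces the z-score implied by a published table onto an integer value:

```python
# Illustrative sketch: z-scores computed from rounded table entries collapse
# to integers when coefficient and SE are small. Pairs below are invented.
examples = [(0.028, 0.009), (0.041, 0.011), (0.052, 0.012)]
for coef, se in examples:
    true_z = coef / se
    rounded_z = round(coef, 2) / round(se, 2)   # as read off a published table
    print(f"true z = {true_z:.2f}, from rounded table: z = {rounded_z:.0f}")
```

Each pair lands exactly on z = 3, 4, or 5 after rounding, even though the underlying ratios are 3.11, 3.73, and 4.33 – precisely the spikes visible in the figure.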
6. The data on which the graph is based are taken from Beugelsdijk et al. (2014).
7. If authors believe that certain suggested additional tests are not reasonable or not feasible (for example, because certain data do not exist), then they should communicate that in their reply. The editor then has to evaluate the merits of the arguments of authors and reviewers, if necessary bringing in an expert on the particular methodology at hand. If the latter is required, this can be indicated in the Manuscript Central submission process.
8. A laudable exception is the recent special issue of Strategic Management Journal on replication (Bettis et al. 2016b).
9. The grand total is heavily influenced by SMJ with 362 tested hypotheses, vis-à-vis 164 in JIBS and 185 in Organization Science.
10. An interesting alternative may be abduction. For example, see Dikova, Parker, and van Witteloostuijn (2017), who define abduction as “a form of logical inference that begins with an observation and concludes with a hypothesis that accounts for the observation, ideally seeking to find the simplest and most likely explanation.” See also, e.g., Misangyi and Acharya (2014).
References
Aguinis, H., S. Werner, J.L. Abbott, C. Angert, J.H. Park, and D. Kohlhausen. 2010. Customer-centric research: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods 13 (3): 515–539.
Andersson, U., A. Cuervo-Cazurra, and B.B. Nielsen. 2014. Explaining interaction effects within and across levels of analysis. Journal of International Business Studies 45 (9): 1063–1071.
Angrist, J.D., and A. Krueger. 2001. Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives 15 (4): 69–85.
Angrist, J.D., and J.S. Pischke. 2010. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives 24 (2): 3–30.
Antonakis, J., S. Bendahan, P. Jacquart, and R. Lalive. 2010. On making causal claims: A review and recommendations. Leadership Quarterly 21 (6): 1086–1120.
Barley, S.R. 2016. 60th anniversary essay: Ruminations on how we became a mystery house and how we might get out. Administrative Science Quarterly 61 (1): 1–8.
Bedeian, A.G., S.G. Taylor, and A. Miller. 2010. Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education 9 (4): 715–725.
Bettis, R.A. 2012. The search for asterisks: Compromised statistical tests and flawed theory. Strategic Management Journal 33 (1): 108–113.
Bettis, R.A., S. Ethiraj, A. Gambardella, C.E. Helfat, and W. Mitchell. 2016a. Creating repeatable cumulative knowledge in strategic management. Strategic Management Journal 37 (2): 257–261.
Bettis, R.A., C.E. Helfat, and M.J. Shaver. 2016b. Special issue: Replication in strategic management. Strategic Management Journal 37 (11): 2191–2388.
Beugelsdijk, S., H.L.F. de Groot, and A.B.T.M. van Schaik. 2004. Trust and economic growth: A robustness analysis. Oxford Economic Papers 56 (1): 118–134.
Beugelsdijk, S., A. Slangen, M. Onrust, A. van Hoorn, and R. Maseland. 2014. The impact of home-host cultural distance on foreign affiliate sales: The moderating role of cultural variation within host countries. Journal of Business Research 67 (8): 1638–1646.
Bhattacharjee, Y. 2013. The mind of a con man. New York Times Magazine, April 26.
Bobko, P. 2001. Correlation and regression: Applications for industrial organizational psychology and management. 2nd ed. Thousand Oaks: Sage.
Bosco, F.A., H. Aguinis, K. Singh, J.G. Field, and C.A. Pierce. 2015. Correlational effect size benchmarks. Journal of Applied Psychology 100 (2): 431–449.
Bosco, F.A., H. Aguinis, J.G. Field, C.A. Pierce, and D.R. Dalton. 2016. HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology 69 (3): 709–750.
Brambor, T., W.R. Clark, and M. Golder. 2006. Understanding interaction models: Improving empirical analyses. Political Analysis 14 (1): 63–82.
Branch, M. 2014. Malignant side-effects of null-hypothesis testing. Theory and Psychology 24 (2): 256–277.
Brodeur, A., M. Le, M. Sangnier, and Y. Zylberberg. 2016. Star wars: The empirics strike back. American Economic Journal: Applied Economics 8 (1): 1–32.
Buckley, P., T. Devinney, and J.J. Louviere. 2007. Do managers behave the way theory suggests? A choice-theoretic examination of foreign direct investment location decision-making. Journal of International Business Studies 38 (7): 1069–1094.
Cascio, W.F., and S. Zedeck. 1983. Open a new window in rational research planning: Adjust alpha to maximize statistical power. Personnel Psychology 36 (3): 517–526.
Choi, J., and F. Contractor. 2016. Choosing an appropriate alliance governance mode: The role of institutional, cultural and geographic distance in international research & development (R&D) collaborations. Journal of International Business Studies 47 (2): 210–232.
Cohen, J. 1969. Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cortina, J.M., T. Kohler, and B.B. Nielsen. 2015. Restriction of variance interaction effects and their importance for international business. Journal of International Business Studies 46 (8): 879–885.
Croswell, J.M., et al. 2009. Cumulative incidence of false-positive results in repeated, multimodal cancer screening. Annals of Family Medicine 7 (3): 212–222.
Dalton, D.R., H. Aguinis, C.A. Dalton, F.A. Bosco, and C.A. Pierce. 2012. Revisiting the file drawer problem in meta-analysis: An empirical assessment of published and non-published correlation matrices. Personnel Psychology 65 (2): 221–249.
Dikova, D., S.C. Parker, and A. van Witteloostuijn. 2017. Capability, environment and internationalization fit, and financial and marketing performance of MNEs’ foreign subsidiaries: An abductive contingency approach. Cross-Cultural and Strategic Management 24 (3): 405–435.
Doh, J. 2015. Why we need phenomenon-based research in international business. Journal of World Business 50 (4): 609–611.
Doucouliagos, C., and T.D. Stanley. 2013. Are all economic facts greatly exaggerated? Theory competition and selectivity. Journal of Economic Surveys 27 (2): 316–339.
Economist. 2014. When science gets it wrong: Let the light shine in. June 14. http://www.economist.com/news/science-and-technology/21604089-two-big-recent-scientific-results-are-looking-shakyand-it-open-peer-review. Accessed 23 Mar 2017.
Ferguson, C.J., and M. Heene. 2012. A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science 7 (6): 555–561.
Fisher, R.A. 1925. Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Fisher, R., and S. Schwartz. 2011. Whence differences in value priorities? Individual, cultural, and artefactual sources. Journal of Cross-Cultural Psychology 42 (7): 1127–1144.
Fox, P.J., and C.A.W. Glas. 2002. Modeling measurement error in a structural multilevel model. In Latent variable and latent structure models, ed. G.A. Marcoulides and I. Moustaki. London: Lawrence Erlbaum Associates.
Gerber, A.S., D.P. Green, and D. Nickerson. 2001. Testing for publication bias in political science. Political Analysis 9 (4): 385–392.
Gigerenzer, G. 2004. Mindless statistics. Journal of Socio-Economics 33 (5): 587–606.
Goldfarb, B., and A. King. 2016. Scientific Apophenia in strategic management research: Significance tests & mistaken inference. Strategic Management Journal 37 (1): 167–176.
Görg, H., and E. Strobl. 2001. Multinational companies and productivity spillovers: A meta-analysis with a test for publication bias. Economic Journal 111: F723–F739.
Greene, W. 2010. Testing hypotheses about interaction terms in nonlinear models. Economics Letters 107: 291–296.
Grieneisen, M.L., and M. Zhang. 2012. A comprehensive survey of retracted articles from the scholarly literature. PLoS One 7 (10): e44118. https://doi.org/10.1371/journal.pone.0044118.
Haans, R.F.P., C. Pieters, and Z.L. He. 2016. Thinking about U: Theorizing and testing U-and inverted U-shaped relationships in strategy research. Strategic Management Journal 37 (7): 1177–1196.
Head, M.L., L. Holman, R. Lanfear, A.T. Kahn, and M.D. Jennions. 2015. The extent and consequences of p–hacking in science. PLoS Biology 13 (3): e1002106. https://doi.org/10.1371/journal.pbio.1002106.
Henrich, J., S.J. Heine, and A. Norenzayan. 2010a. The weirdest people in the world? Behavioral and Brain Sciences 33 (2–3): 61–83.
———. 2010b. Most people are not WEIRD. Nature 466: 29.
Hoetker, G. 2007. The use of logit and probit models in strategic management research: Critical issues. Strategic Management Journal 28 (4): 331–343.
Hubbard, R., D.E. Vetter, and E.L. Little. 1998. Replication in strategic management: Scientific testing for validity, generalizability, and usefulness. Strategic Management Journal 19 (3): 243–254.
Hunter, J.E., and F.L. Schmidt. 2015. Methods of meta-analysis: Correcting error and bias in research findings. 2nd ed. Thousand Oaks: Sage.
Husted, B.W., I. Montiel, and P. Christmann. 2016. Effects of local legitimacy on certification decision to global and national CSR standards by multinational subsidiaries and domestic firms. Journal of International Business Studies 47 (3): 382–397.
Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2 (8): e124.
———. 2012. Why science is not necessarily self-correcting. Perspectives on Psychological Science 7 (6): 645–654.
John, L.K., G. Loewenstein, and D. Prelec. 2012. Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science 23 (5): 524–532.
Kerr, N.L. 1998. HARKing: Hypothesizing after results are known. Personality and Social Psychology Review 2 (3): 196–217.
Kingsley, A.F., T.G. Noordewier, and R.G. Vanden Bergh. 2017. Overstating and understating interaction results in international business research. Journal of World Business 52 (2): 286–295.
Kirk, R.E. 1996. Practical significance: A concept whose time has come. Educational and Psychological Measurement 56 (5): 746–759.
Leamer, E.E. 1985. Sensitivity analyses would help. American Economic Review 75 (3): 308–313.
Lewin, A.Y., C.Y. Chiu, C.F. Fey, S.S. Levine, G. McDermott, J.P. Murmann, and E. Tsang. 2016. The critique of empirical social science: New policies at Management and Organization Review. Management and Organization Review 12 (4): 649–658.
Lexchin, J., L.A. Bero, B. Djulbegovic, and O. Clark. 2003. Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. British Medical Journal 326 (7400): 1167–1170.
Masicampo, E.J., and D.R. Lalande. 2012. A peculiar prevalence of p-values just below 0.05. Quarterly Journal of Experimental Psychology 65 (11): 2271–2279.
McCloskey, D.N. 1985. The loss function has been mislaid: The rhetoric of significance tests. American Economic Review 75 (2): 201–205.
McCloskey, D.N., and S.T. Ziliak. 1996. The standard error of regressions. Journal of Economic Literature 34: 97–114.
Meyer, K.E. 2006. Asian management research needs more self-confidence. Asia Pacific Journal of Management 23 (2): 119–137.
———. 2009. Motivating, testing, and publishing curvilinear effects in management research. Asia Pacific Journal of Management 26 (2): 187–193.
Misangyi, V.F., and A.G. Acharya. 2014. Substitutes or complements? A configurational examination of corporate governance mechanisms. Academy of Management Journal 57 (6): 1681–1705.
Mullane, K., and M. Williams. 2013. Bias in research: The rule rather than the exception? Elsevier Editors’ Update. http://editorsupdate.elsevier.com/issue-40-september-2013/bias-in-research-the-rule-rather-than-the-exception. Accessed 23 Mar 2017.
New York Times. 2011. Fraud case seen as a red flag for psychology research. November 2. http://www.nytimes.com/2011/11/03/health/research/noted-dutch-psychologist-stapel-accused-of-research-fraud.html?-r=1&ref=research. Accessed 15 Jan 2017.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science. https://doi.org/10.1126/science.aac4716.
Orlitzky, M. 2012. How can significance tests be deinstitutionalized? Organizational Research Methods 15 (2): 199–228.
Pashler, H., and E.-J. Wagenmakers. 2012. Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science 7 (6): 528–530.
Peterson, M., J.L. Arregle, and X. Martin. 2012. Multi-level models in international business research. Journal of International Business Studies 43 (5): 451–457.
Pfeffer, J. 2007. A modest proposal: How we might change the process and product of managerial research. Academy of Management Journal 50 (6): 1334–1345.
Popper, K. 1959. The logic of scientific discovery. London: Hutchinson.
Reeb, D., M. Sakakibara, and I.P. Mahmood. 2012. From the editors: Endogeneity in international business research. Journal of International Business Studies 43 (3): 211–218.
Rosenthal, R. 1979. The “file drawer problem” and tolerance for null results. Psychological Bulletin 86 (3): 638–641.
Rosnow, R.L., and R. Rosenthal. 1984. Understanding behavioral science: Research methods for consumers. New York: McGraw-Hill.
Rothstein, H.R., A.J. Sutton, and M. Borenstein. 2005. Publication bias in meta-analysis, prevention, assessment and adjustment. New York: Wiley.
Sala-i-Martin, X. 1997. I just ran two million regressions. American Economic Review 87 (2): 178–183.
Shadish, W.R., T.D. Cook, and D. Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin.
Simmons, J.P., L.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22 (11): 1359–1366.
Sterling, T.D. 1959. Publication decisions and their possible effects on inferences drawn from tests of significance – or vice versa. Journal of the American Statistical Association 54 (285): 30–34.
van Witteloostuijn, A. 2015. Toward experimental international business: Unraveling fundamental causal linkages. Cross Cultural & Strategic Management 22 (4): 530–544.
———. 2016. What happened to Popperian falsification? Publishing neutral and negative findings. Cross Cultural & Strategic Management 23 (3): 481–508.
Wasserstein, R.L., and N.A. Lazar. 2016. The ASA’s statement on p-values: Context, process, and purpose. American Statistician 70 (2): 129–133. http://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108. (ASA = American Statistical Association).
Wiersema, M.F., and H.P. Bowen. 2009. The use of limited dependent variable techniques in strategy research: Issues and methods. Strategic Management Journal 30 (6): 679–692.
Williams, R. 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal 12 (2): 308–331.
Wonnacott, T.H., and R.J. Wonnacott. 1990. Introductory statistics for business and economics. New York: Wiley.
Zedeck, S. 2003. Editorial. Journal of Applied Psychology 88 (1): 3–5.
Zellmer-Bruhn, M., P. Caligiuri, and D. Thomas. 2016. From the editors: Experimental designs in international business research. Journal of International Business Studies 47 (4): 399–407.
Zelner, B. 2009. Using simulation to interpret results from logit, probit, and other nonlinear models. Strategic Management Journal 30 (12): 1335–1348.
Acknowledgements
We gratefully acknowledge the constructive comments from Editor-in-Chief Alain Verbeke, eleven editors of JIBS, as well as from Bas Bosma, Lin Cui, Rian Drogendijk, Saul Estrin, Anne-Wil Harzing, Jing Li, Richard Walker, and Tom Wansbeek. We also thank Divina Alexiou, Richard Haans, Johannes Kleinhempel, Sjanne van Oeveren, Britt van Veen, and Takumin Wang for their excellent research assistance. Sjoerd Beugelsdijk thanks the Netherlands Organization for Scientific Research (NWO grant VIDI 452-011-10). All three authors contributed equally to this editorial.
Appendix 1: Stata Do File to Create Fig. 4.2
Model:
- Dependent variable = Y
- Independent variable = X
- Moderator variable = M
- Interaction variable = X*M
To generate Fig. 4.2:

predictnl me = _b[X] + _b[X*M]*M if e(sample), se(seme)
gen pw1 = me - 1.96*seme
gen pw2 = me + 1.96*seme
scatter me M if e(sample) || line me pw1 pw2 M if e(sample), pstyle(p2 p3 p3) sort legend(off) ytitle("Marginal effect of X on Y")
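For readers working outside Stata, the marginal-effect calculation in the do-file above can be sketched in Python. This is an illustrative translation, not part of the original appendix: marginal_effect is a hypothetical helper, and the coefficients, variances, and covariance plugged in below are invented; in practice they come from a fitted model's coefficient vector and covariance matrix. The delta-method standard error mirrors what predictnl computes for a linear combination of two coefficients.

```python
# Illustrative sketch of the do-file's computation: marginal effect of X on Y
# at a given moderator value m, with a delta-method 95% confidence interval.
# All numeric inputs below are hypothetical.
import math

def marginal_effect(b_x, b_xm, var_x, var_xm, cov_x_xm, m):
    """Marginal effect b_X + b_XM*m and its 95% CI via the delta method."""
    me = b_x + b_xm * m
    # Var(b_X + m*b_XM) = Var(b_X) + m^2*Var(b_XM) + 2m*Cov(b_X, b_XM)
    se = math.sqrt(var_x + (m ** 2) * var_xm + 2 * m * cov_x_xm)
    return me, me - 1.96 * se, me + 1.96 * se

# Hypothetical estimates: b_X = 0.50, b_XM = -0.20, with made-up (co)variances.
for m in (0.0, 1.0, 2.0):
    me, lo, hi = marginal_effect(0.50, -0.20, 0.04, 0.01, -0.005, m)
    print(f"M = {m}: me = {me:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Plotting me, lo, and hi against M reproduces the kind of marginal-effect figure the Stata scatter/line command draws.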
Copyright information
© 2020 The Author(s)
Cite this chapter
Meyer, K.E., van Witteloostuijn, A., Beugelsdijk, S. (2020). What’s in a p? Reassessing Best Practices for Conducting and Reporting Hypothesis-Testing Research. In: Eden, L., Nielsen, B.B., Verbeke, A. (eds) Research Methods in International Business. JIBS Special Collections. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-22113-3_4
Print ISBN: 978-3-030-22112-6
Online ISBN: 978-3-030-22113-3