Calculating and reporting effect sizes in scientific papers (1): The limitations of p < 0.05 in the analysis of differences between the means of two groups

Helena Espirito Santo, Fernanda Bento Daniel

Abstract


The Revista Portuguesa de Investigação Comportamental e Social (Portuguese Journal of Behavioral and Social Research) requires authors to follow the recommendations of the Publication Manual of the American Psychological Association (APA, 2010) when presenting statistical information. One of the APA's recommendations is that effect sizes be reported alongside levels of statistical significance.

Because the p values produced by statistical tests say nothing about the magnitude or importance of a difference, effect sizes (ES) should be reported as well. ES give meaning to statistical tests, emphasize the power of statistical tests, reduce the risk that mere sampling variation is interpreted as a real relationship, can increase the reporting of "non-significant" results, and allow knowledge from several studies to be accumulated through meta-analysis.

The aims of this article are therefore to present the limits of the significance level; to describe the rationale for reporting the ES of statistical tests used to analyze differences between two groups; to present the formulas for calculating ES, with examples from our own studies; to present procedures for calculating confidence intervals; to provide conversion formulas for use in literature reviews; to indicate how to interpret ES; and to show that, although an ES is often interpretable, its meaning (a small, medium, or large effect on an arbitrary metric) can be imprecise and needs to be interpreted in the context of the research area and of real-world variables.
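As an illustration of the kind of calculation the abstract refers to, the following minimal Python sketch computes the three effect sizes named in the keywords for two independent groups. It uses common textbook definitions (pooled standard deviation with n1 + n2 - 2 degrees of freedom for Cohen's d, the usual approximate small-sample correction for Hedges' g, and the control group's standard deviation for Glass's delta). The function names and the sample data are hypothetical, and the formulas in the article itself and its spreadsheet may differ in detail.

```python
import math
from statistics import mean, stdev


def cohen_d(group1, group2):
    """Standardized mean difference using the pooled sample SD (assumed definition)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def hedges_g(group1, group2):
    """Cohen's d multiplied by the approximate small-sample correction 1 - 3 / (4 * df - 1)."""
    df = len(group1) + len(group2) - 2
    return cohen_d(group1, group2) * (1 - 3 / (4 * df - 1))


def glass_delta(treatment, control):
    """Mean difference standardized by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    treatment = [2, 4, 6]
    control = [1, 2, 3]
    print("Cohen's d:    ", round(cohen_d(treatment, control), 3))
    print("Hedges' g:    ", round(hedges_g(treatment, control), 3))
    print("Glass's delta:", round(glass_delta(treatment, control), 3))
```

With samples this small, the gap between d and g is large, which is why the corrected estimate is usually preferred for small groups.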

 




DOI: http://dx.doi.org/10.7342/ismt.rpics.2015.1.1.14

Keywords


Effect size; Statistical significance; p value; Cohen's d; Hedges' g; Glass's delta


References


Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602. doi:10.1002/sim.2256

Aguinis, H., Werner, S., Abbott, J. L., Angert, C., Park, J. H., & Kohlhausen, D. (2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13(3), 515–539.

Aickin, M. (2004). Bayes without priors. Journal of Clinical Epidemiology, 57(1), 4–13. doi:10.1016/S0895-4356(03)00251-8

American Psychological Association (APA) (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: APA.

Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29(5), 664–672.

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617.

Berben, L., Sereika, S. M., & Engberg, S. (2012). Effect size estimation: methods and examples. International Journal of Nursing Studies, 49(8), 1039–1047. doi:10.1016/j.ijnurstu.2012.01.015

Bezeau, S., & Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. Journal of Clinical and Experimental Neuropsychology (Neuropsychology, Development and Cognition: Section A), 23(3), 399–406.

Blanton, H., & Jaccard, J. (2006a). Arbitrary metrics in psychology. The American Psychologist, 61(1), 27–41. doi:10.1037/0003-066X.61.1.27

Blanton, H., & Jaccard, J. (2006b). Arbitrary metrics redux. The American Psychologist, 61(1), 62.

Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 221–235). New York: Russell Sage Foundation.

Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29(1), 79–97.

Caperos, J. M., & Pardo, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25(3), 408–414. doi:10.7334/psicothema2012.207

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399.

Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103(1), 105–110.

Coe, R. (2002). It's the effect size, stupid: what effect size is and why it is important. Presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, Education-line.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Cohen, J. (1992a). A power primer. Psychological Bulletin, 112(1), 155.

Cohen, J. (1992b). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98–101.

Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997–1003.

Conn, V. S., Chan, K. C., & Cooper, P. S. (2014). The problem with p. Western Journal of Nursing Research, 36(3), 291–293.

Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: a clinically useful measure of treatment effect. BMJ, 310(6977), 452–454.

Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York: Russell Sage Foundation.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.

Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116(3), 509–511.

Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34(9), 917–928.

Ellis, P. D. (2010). The essential guide to effect sizes. Statistical power, meta-analysis, and the interpretation of research results (pp. 1–193). Cambridge: Cambridge University Press.

Embretson, S. E. (2006). The continued search for nonarbitrary metrics in psychology. The American Psychologist, 61(1), 50–55. doi:10.1037/0003-066X.61.1.50

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538.

Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23(2), 89–105.

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver and Boyd.

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLoS ONE, 6(4), e19070, 1–5.

Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of Science, 23(2), 170–181.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational and Behavioral Statistics, 6(2), 107–128.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis (Vol. 11, pp. 104–106). Orlando: Academic Press.

Hentschke, H., & Stüttgen, M. C. (2011). Computation of measures of effect size for neuroscience data sets. The European Journal of Neuroscience, 34(12), 1887–1894. doi:10.1111/j.1460-9568.2011.07902.x

Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. The Journal of Experimental Education, 61(4), 317–333.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19.

Kazdin, A. E. (2006). Arbitrary metrics: implications for identifying evidence-based treatments. The American Psychologist, 61(1), 42–49. doi:10.1037/0003-066X.61.1.42

Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345–353.

Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759. doi:10.1177/0013164496056005002

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). Washington, DC: American Psychological Association.

Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996. doi:10.1016/j.biopsych.2005.09.014

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825, 1–8. doi:10.1371/journal.pone.0105825

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1–12. doi:10.3389/fpsyg.2013.00863

Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: comment on Trafimow (2003). Psychological Review, 112(3), 662–668. doi:10.1037/0033-295X.112.3.662

Lemos, L., Espirito-Santo, H., Silva, G. F., Costa, M., Cardoso, D., Vicente, F. ... Motinho, S. (2014). The impact of a Neuropsychological Rehabilitation Group Program (NRGP) on cognitive and emotional functioning in institutionalized elderly (p. 1). Presented at the 22nd European Congress of Psychiatry, Munich.

Lenth, R. V. (2006–2014). Java applets for power and sample size. Retrieved from http://homepage.stat.uiowa.edu/~rlenth/Power/

Liesbeth, W. A., Prins, J. B., Vernooij-Dassen, M. J. F. J., Wijnen, H. H., Olde Rikkert, M. G. M., & Kessels, R. P. C. (2011). Group therapy for patients with mild cognitive impairment and their significant others: results of a waiting-list controlled trial. Gerontology, 57(5), 444–454. doi:10.1159/000315933

Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W. … Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research, Institute of Education Sciences.

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102–105.

Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161–171.

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71(1), 173–180.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.

McMillan, J. H., & Foley, J. (2011). Reporting and discussing effect size: Still the road less traveled. Practical Assessment, Research & Evaluation, 16(14), 1–12.

Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological Reviews, 82(4), 591–605. doi:10.1111/j.1469-185X.2007.00027.x

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. doi:10.1037//1082-989X.5.2.241

Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.

Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25(3), 241–286. doi:10.1006/ceps.2000.1040

Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159.

Paiva, A. C., Cunha, M., Xavier, A. M., Marques, M., Simões, S., & Espirito-Santo, H. (2013). Exploratory study of risk-taking and self-harm behaviours in adolescents: prevalence, characteristics and its relationship to attachment styles. European Psychiatry, 28(Suppl. 1). doi:10.1016/S0924-9338(13)76530-1

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–47. doi:10.1098/rsta.1900.0022

Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society: Series D (The Statistician), 48(3), 413–418.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4–13.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage.

Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37–59.

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. The American Psychologist, 44(10), 1276–1284.

Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11(6), 446–453.

Salsburg, D. (2002). The lady tasting tea. New York: Macmillan.

Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: rejecting null hypotheses statistical tests in favor of replication statistics. Psychology in the Schools, 44(5), 471–481. doi:10.1002/pits.20239

Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20(8), 1053–1059. doi:10.1016/j.acn.2005.06.006

Schmidt, F. L., & Hunter, J. E. (2004). Methods of Meta-Analysis. Thousand Oaks: SAGE Publications.

Schneider, A. L., & Darcy, R. E. (1984). Policy implications of using significance tests in evaluation research. Evaluation Review, 8(4), 573–582. doi:10.1177/0193841X8400800407

Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., Glasziou, P., & Guyatt, G. H. (2008). Interpreting results and drawing conclusions. In J. P. T. Higgins & S. Green (Eds.), Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series (pp. 1–29). The Cochrane Collaboration.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures in psychotherapy outcome studies. American Psychologist, 51, 1065–1071.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.

Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. The Journal of Experimental Education, 61(4), 334–349.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989–1004.

Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson.




Copyright (c) 2015 Helena Espirito Santo, Fernanda Bento Daniel

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.