Calculating and reporting effect sizes in scientific papers (1): The limitations of p < 0.05 in the analysis of mean differences between two groups

Helena Espirito Santo, Fernanda Bento Daniel

Abstract


The Revista Portuguesa de Investigação Comportamental e Social requires authors to follow the recommendations of the Publication Manual of the American Psychological Association (APA, 2010) when presenting statistical information. One of the APA's recommendations is that effect sizes be reported together with statistical significance levels.

Because the p values produced by statistical tests do not convey the magnitude or importance of a difference, effect sizes (ES) should also be reported. Indeed, ES give meaning to statistical tests, highlight the power of statistical tests, reduce the risk that mere sampling variation is interpreted as a real relationship, may increase the reporting of "non-significant" results, and make it possible to accumulate knowledge across studies through meta-analysis.
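As an editorial illustration of this point (not taken from the article itself), the following Python sketch uses simulated data, with hypothetical group sizes, means, and standard deviations, to show how a negligible difference can still reach p < 0.05 when samples are large, while Cohen's d stays close to zero:

# Minimal sketch, not from the article: with large samples, a trivial mean
# difference can reach p < .05 even though Cohen's d is close to zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2015)
n = 10_000                                            # hypothetical group size
group_a = rng.normal(loc=100.00, scale=15.0, size=n)
group_b = rng.normal(loc=100.75, scale=15.0, size=n)  # true difference = 0.05 SD

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d using the pooled standard deviation of the two groups
s_pooled = np.sqrt(((n - 1) * group_a.var(ddof=1) + (n - 1) * group_b.var(ddof=1))
                   / (2 * n - 2))
d = (group_a.mean() - group_b.mean()) / s_pooled

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, Cohen's d = {d:.3f}")
# Typically prints p < .05 with |d| near 0.05: "significant" yet negligible.

Reversing the scenario (a large d with a small n that fails to reach p < .05) makes the complementary point about under-powered studies.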

Accordingly, the aims of this article are to present the limits of the significance level; to describe the rationale for reporting the ES of statistical tests used to analyze differences between two groups; to present the formulas for computing ES, with examples from our own studies; to present procedures for computing confidence intervals; to provide conversion formulas for use in literature reviews; to indicate how ES should be interpreted; and to show that, although often interpretable, their meaning (a small, medium, or large effect on an arbitrary metric) can be imprecise and needs to be interpreted in the context of the research field and of real-world variables.
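For orientation before the full text, the block below summarizes the usual textbook formulas for the three indices listed in the keywords. It is an illustrative summary rather than an excerpt from the article, and the confidence-interval expression is one common large-sample approximation found in meta-analysis handbooks:

% Illustrative summary (not reproduced from the article).
% Cohen's d: standardized difference using the pooled standard deviation.
\[
  d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\mathrm{pooled}}},
  \qquad
  s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
\]
% Hedges' g: d with the usual small-sample correction; Glass's Delta: the
% control group's standard deviation is used as the standardizer.
\[
  g \approx d\left(1 - \frac{3}{4(n_1 + n_2) - 9}\right),
  \qquad
  \Delta = \frac{\bar{X}_1 - \bar{X}_2}{s_{\mathrm{control}}}
\]
% One common large-sample approximation for a 95% confidence interval on d.
\[
  d \pm 1.96\,\sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}
\]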

 




DOI: http://dx.doi.org/10.7342/ismt.rpics.2015.1.1.14

Keywords


Effect size; Statistical significance; p value; Cohen's d; Hedges' g; Glass's delta


References


Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602. doi:10.1002/sim.2256

Aguinis, H., Werner, S., Abbott, J. L., Angert, C., Park, J. H., & Kohlhausen, D. (2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13(3), 515–539.

Aickin, M. (2004). Bayes without priors. Journal of Clinical Epidemiology, 57(1), 4–13. doi:10.1016/S0895-4356(03)00251-8

American Psychological Association [APA]. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29(5), 664–672. doi:10.1123/jsep.29.5.664

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617. doi:10.1348/000712608x377117

Berben, L., Sereika, S. M., & Engberg, S. (2012). Effect size estimation: methods and examples. International Journal of Nursing Studies, 49(8), 1039–1047. doi:10.1016/j.ijnurstu.2012.01.015

Bezeau, S., & Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. Journal of Clinical and Experimental Neuropsychology (Neuropsychology, Development and Cognition: Section A), 23(3), 399–406. doi:10.1076/jcen.23.3.399.1181

Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. The American Psychologist, 61(1), 27–41. doi:10.1037/0003-066X.61.1.27

Blanton, H., & Jaccard, J. (2006). Arbitrary metrics redux. The American Psychologist, 61(1), 62-71. doi:10.1037/0003-066X.61.1.62

Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hodges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 221–235). New York, NY: Russell Sage Foundation.

Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29(1), 79–97. doi:10.1016/s0149-2063(02)00221-0

Caperos, J. M., & Pardo, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25(3), 408–414. doi:10.7334/psicothema2012.207

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399. doi:10.17763/haer.48.3.t490261645281841

Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103(1), 105-110. Retrieved from http://psych.colorado.edu/~willcutt/pdfs/Chow_1988.pdf

Coe, R. (2002). It's the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, Education-line. Retrieved from http://www.cem.org/attachments/ebe/ESguide.pdf

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153. doi:10.1037/h0045186

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. doi:10.1037/0033-2909.112.1.155

Cohen, J. (1992). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98–101. doi:10.2307/20182143

Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997-1003. doi:10.1037/0003-066X.49.12.997

Conn, V. S., Chan, K. C., & Cooper, P. S. (2014). The problem with p. Western Journal of Nursing Research, 36(3), 291–293. doi:10.1177/0193945913492495

Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: A clinically useful measure of treatment effect. BMJ, 310(6977), 452–454. doi:10.1136/bmj.310.6977.452

Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York, NY: Russell Sage Foundation.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.

Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116(3), 509–511. doi:10.1037/0033-2909.116.3.509

Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34(9), 917–928. doi:10.1093/jpepsy/jsp004

Ellis, P. D. (2010). The essential guide to effect sizes. Statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press.

Embretson, S. E. (2006). The continued search for nonarbitrary metrics in psychology. The American Psychologist, 61(1), 50–55. doi:10.1037/0003-066X.61.1.50

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532-538. doi:10.1037/a0015808

Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23(2), 89–105. doi:10.2307/2489707

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver and Boyd.

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLoS ONE, 6(4), e19070, 1-5. doi:10.1371/journal.pone.0019070

Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of Science, 23(2), 170–181. doi:10.2307/686441

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. doi:10.3102/0013189X005010003

Glass, G.V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational and Behavioral Statistics, 6(2), 107–128. doi:10.3102/10769986006002107

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis (Vol. 11, pp. 104–106). Orlando: Academic Press.

Hentschke, H., & Stüttgen, M. C. (2011). Computation of measures of effect size for neuroscience data sets. The European Journal of Neuroscience, 34(12), 1887–1894. doi:10.1111/j.1460-9568.2011.07902.x

Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. The Journal of Experimental Education, 61(4), 317–333. doi:10.2307/20152384

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12-19. doi:10.1037/0022-006X.59.1.12

Kazdin, A. E. (2006). Arbitrary metrics: Implications for identifying evidence-based treatments. The American Psychologist, 61(1), 42-49. doi:10.1037/0003-066X.61.1.42

Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345–353. doi:10.2307/40064228

Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759. doi:10.1177/0013164496056005002

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). Washington, DC: American Psychological Association.

Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996. doi:10.1016/j.biopsych.2005.09.014

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825, 1-8. doi:10.1371/journal.pone.0105825

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1-12. doi:10.3389/fpsyg.2013.00863

Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112(3), 662–668. doi:10.1037/0033-295X.112.3.662

Lemos, L., Espirito-Santo, H., Silva, G. F., Costa, M., Cardoso, D., Vicente, F., ... Moitinho, S. (2014). The impact of a Neuropsychological Rehabilitation Group Program (NRGP) on cognitive and emotional functioning in institutionalized elderly. Poster presented at the 22nd European Congress of Psychiatry, Munich. Retrieved from https://www.researchgate.net/publication/264979017_EPA-1657_-_The_impact_of_a_neuropsychological_rehabilitation_group_program_NRGP_on_cognitive_and_emotional_functioning_in_institutionalized_elderly

Lenth, R. V. (2006–2014). Java applets for power and sample size. Retrieved from http://homepage.stat.uiowa.edu/~rlenth/Power/

Liesbeth, W. A., Prins, J. B., Vernooij-Dassen, M. J. F. J., Wijnen, H. H., Olde Rikkert, M. G. M., & Kessels, R. P. C. (2011). Group therapy for patients with mild cognitive impairment and their significant others: Results of a waiting-list controlled trial. Gerontology, 57(5), 444–454. doi:10.1159/000315933

Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., ... Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. Washington, DC: National Center for Special Education Research, Institute of Education Sciences. Retrieved from https://ies.ed.gov/ncser/pubs/20133000/pdf/20133000.pdf

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102–105. doi:10.1037/029395

Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5(6), 161–171. doi:10.1111/1467-8721.ep11512376

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71(1), 173–180. doi:10.1111/1467-8624.00131

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361-365. doi:10.1037/0033-2909.111.2.361

McMillan, J. H., & Foley, J. (2011). Reporting and discussing effect size: Still the road less traveled. Practical Assessment, Research & Evaluation, 16(14), 1–12. Retrieved from http://pareonline.net/pdf/v16n14.pdf

Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605. doi:10.1111/j.1469-185X.2007.00027.x

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. doi:10.1037/1082-989X.5.2.241

Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.

Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25(3), 241–286. doi:10.1006/ceps.2000.1040

Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159. doi:10.2307/1164923

Paiva, A. C., Cunha, M., Xavier, A. M., Marques, M., Simões, S., & Espirito-Santo, H. (2013). Exploratory study of risk-taking and self-harm behaviours in adolescents: Prevalence, characteristics and its relationship to attachment styles. European Psychiatry, 28(Suppl. 1), 1. doi:10.1016/S0924-9338(13)76530-1

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–47. doi:10.1098/rsta.1900.0022

Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: The normal equal variance case. Journal of the Royal Statistical Society: Series D (the Statistician), 48(3), 413–418. doi:10.1111/1467-9884.00199

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4-13. doi:10.1037/0022-006X.51.1.4

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.). The handbook of research synthesis (pp. 231–244). New York, NY: Russell Sage.

Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37-59. doi:10.1300/J079v21n04_02

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. The American Psychologist, 44(10), 1276-1284. doi:10.1037/0003-066X.44.10.1276

Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11(6), 446–453. doi:10.1111/1467-9280.00287

Salsburg, D. (2002). The lady tasting tea. New York, NY: Macmillan.

Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypotheses statistical tests in favor of replication statistics. Psychology in the Schools, 44(5), 471–481. doi:10.1002/pits.20239

Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20(8), 1053–1059. doi:10.1016/j.acn.2005.06.006

Schmidt, F. L., & Hunter, J. E. (2004). Methods of meta-analysis. Thousand Oaks: SAGE Publications.

Schneider, A. L., & Darcy, R. E. (1984). Policy implications of using significance tests in evaluation research. Evaluation Review, 8(4), 573–582. doi:10.1177/0193841X8400800407

Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., Glasziou, P., & Guyatt, G. H. (2008). Interpreting results and drawing conclusions. In J. P. T. Higgins & S. Green (Eds.), Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series (pp. 1–29). The Cochrane Collaboration.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures in psychotherapy outcome studies. American Psychologist, 51, 1065-1071. doi:10.1037/0003-066X.51.10.1065

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316. doi:10.1037/0033-2909.105.2.309

Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. The Journal of Experimental Education, 61(4), 334–349.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989-1004. doi:10.1037/a0019507

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson.




Copyright (c) 2015 Helena Espirito Santo & Fernanda Bento Daniel

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.