
How to Set Delta in the Two-One-Sided T-tests Procedure (TOST)

Issue: Vol 5 No. 1-2 (2018)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

Abstract:

The Two-One-Sided T-test procedure (TOST) is used to show that two samples are equivalent or similar, in contrast to classical statistical tests which check for dissimilarity. The TOST relies on a parameter called delta, which has to be set by the researcher using their intuition. Doing so can be difficult, because of complex interactions of relevant parameters. In this article we present a method to set delta, which is established and validated through extensive simulations based on real data sets from linguistics and other sciences. The presented method is shown to be sound and reliable, but we cannot exclude deviant early model behaviour (N≤10) and deviant late model behaviour (N>100,000).
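To make the role of delta concrete, the following is a minimal sketch of a TOST for two independent samples. It is not the authors' method for setting delta: the delta values in the usage note are arbitrary placeholders, the choice of Welch's unequal-variance form is an assumption, and the names `t_cdf` and `tost` are hypothetical helpers written for this illustration. The t distribution is integrated numerically so that the sketch needs only the standard library.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """CDF of Student's t, via Simpson integration of the density from 0 to |t|."""
    # Log of the normalising constant, using lgamma to avoid overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3  # probability mass between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def tost(a, b, delta, alpha=0.05):
    """Two one-sided Welch t-tests against the bounds -delta and +delta.

    Returns (p, equivalent), where p is the larger of the two one-sided
    p-values. Assumes both samples have non-zero variance.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    diff = mean(a) - mean(b)
    p_lower = 1 - t_cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_upper = t_cdf((diff - delta) / se, df)      # H0: diff >= +delta
    p = max(p_lower, p_upper)
    return p, p < alpha


# Made-up illustrative data, not taken from the article.
sample_a = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]
sample_b = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0]
wide = tost(sample_a, sample_b, delta=0.5)
narrow = tost(sample_a, sample_b, delta=0.01)
```

With these made-up samples, the wide bound (delta = 0.5) leads to a verdict of equivalence, while the narrow bound (delta = 0.01) does not. The same data can therefore support opposite conclusions depending on delta, which is why a principled way of choosing it matters.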

Authors: Tom S. Juzek, Johannes Kizach


References:

Aguilar-Sánchez, J. (2014). Replicability of (Socio)Linguistic Studies. Journal of Research Design and Statistics in Linguistics and Communication Science 1 (1): 1-21.
https://doi.org/10.1558/jrds/6783228282

Aguilar-Sánchez, J. (2018). Copula+ Adjective: An a-posteriori power analysis for the generalizability of results. Journal of Research Design and Statistics in Linguistics and Communication Science 4 (2): 91-123.
https://doi.org/10.1558/jrds.33845

Altman, D. G. and Bland, J. M. (1995). Statistics notes: Absence of evidence is not evidence of absence. BMJ 311 (7003): 485.
https://doi.org/10.1136/bmj.311.7003.485

Lin, M., Lucas Jr, H. C., and Shmueli, G. (2013). Research commentary - too big to fail: Large samples and the p-value problem. Information Systems Research 24 (4): 906-917.
https://doi.org/10.1287/isre.2013.0480

Liu, X. S. (2013). Statistical Power Analysis for the Social and Behavioral Sciences: Basic and Advanced Techniques. New York: Routledge.
https://doi.org/10.4324/9780203127698

Richter, S. J. and Richter, C. (2002). A method for determining equivalence in industrial applications. Quality Engineering 14 (3): 375-380.
https://doi.org/10.1081/QEN-120001876

Royall, R. M. (1986). The effect of sample size on the meaning of significance tests. The American Statistician 40 (4): 313-315.
https://doi.org/10.1080/00031305.1986.10475424

Schuirmann, D. L. (1981). On hypothesis-testing to determine if the mean of a normal-distribution is contained in a known interval. Biometrics 37: 617.

Stegner, B. L., Bostrom, A. G., and Greenfield, T. K. (1996). Equivalence testing for use in psychosocial and services research: An introduction with examples. Evaluation and Program Planning 19 (3): 193-198.
https://doi.org/10.1016/0149-7189(96)00011-0

Vasishth, S. and Broe, M. (2011). The Foundations of Statistics: A Simulation-based Approach. Berlin: Springer. http://www.springer.com/mathematics/applications/book/978-3-642-16312-8 (20 May, 2014).

Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics 32 (4): 741-744.
https://doi.org/10.2307/2529259
