Reporting practices of rater reliability in interpreting research: A mixed-methods review of 14 journals (2004–2014)
Issue: Vol 3 No. 1 (2016) Mixed Methods
Journal: Journal of Research Design and Statistics in Linguistics and Communication Science
Subject Areas: Linguistics
DOI: 10.1558/jrds.29622
Abstract:
The issue addressed in this study is the reporting practices of rater reliability in interpreting research (IR). The use of raters as a method of measurement is commonplace in IR, yet little is known about the extent to which, and how, rater reliability estimates (RREs) have been reported. Drawing on 447 articles from 14 translation and interpreting journals (2004–2014), this mixed-methods study seeks quantitative and qualitative insights into these reporting practices. Data analysis reveals that: 1) almost 90% of the articles that needed to report RREs failed to do so; and 2) several potential problems emerged in the articles that did report RREs: a lack of distinction between rater consensus and consistency, underreporting, misinterpretation and misuse of RREs, and a lack of justification for using rater-generated measurements in subsequent data analysis. These findings highlight an urgent need for increased author awareness of reporting appropriate RREs in IR.
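The distinction between rater consensus (do raters assign the same scores?) and rater consistency (do raters rank performances the same way?) can be illustrated with a minimal sketch. The scores below are invented for illustration, not drawn from the study: rater B is systematically one point more lenient than rater A, so exact agreement is zero while the Pearson correlation is perfect.

```python
rater_a = [3, 4, 2, 5, 3]
rater_b = [4, 5, 3, 6, 4]  # same ordering, but one point more lenient throughout

# Consensus: proportion of performances given identical scores.
consensus = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Consistency: Pearson correlation of the two score vectors.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

consistency = pearson(rater_a, rater_b)
print(consensus)    # 0.0 — the raters never agree on an exact score
print(consistency)  # 1.0 — yet their rank ordering is identical
```

A consensus index alone would suggest these raters are unusable, while a consistency index alone would hide the systematic leniency; reporting only one of the two, as the study finds is common, can therefore be misleading.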
Author: Chao Han
References:
Agrifoglio, M. (2004). Sight translation and interpreting: A comparative analysis of constraints and failures. Interpreting 6 (1), 43−67. https://doi.org/10.1075/intp.6.1.05agr
Angelelli, C. (2009). Using a rubric to assess translation ability: Defining the construct. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 13–47. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.03ang
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bakti, M. and Bóna, J. (2014). Source language-related erroneous stress placement in the target language output of simultaneous interpreters. Interpreting 16 (1), 34–48. https://doi.org/10.1075/intp.16.1.03bak
Bale, R. (2013). Undergraduate consecutive interpreting and lexical knowledge: The role of spoken corpora. The Interpreter and Translator Trainer 7 (1), 27–50. https://doi.org/10.1080/13556509.2013.10798842
Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3–11. https://doi.org/10.2466/pr0.1966.19.1.3
Bartłomiejczyk, M. (2006). Strategies of simultaneous interpreting and directionality. Interpreting 8 (2), 149–174. https://doi.org/10.1075/intp.8.2.03bar
Baumgartner, T. A. (1989). Norm-referenced measurement: Reliability. In M. J. Safrit and T. M. Wood (Eds) Measurement Concepts in Physical Education and Exercise Science, 45–72. Champaign, IL: Human Kinetics.
Braun, S. (2013). Keeping your distance? Remote interpreting in legal proceedings: A critical assessment of a growing practice. Interpreting 15 (2), 200–228. https://doi.org/10.1075/intp.15.2.03bra
Campbell, S. and Hale, S. (2003). Translation and interpreting assessment in the context of educational measurement. In G. Anderman and M. Rogers (Eds) Translation Today: Trends and Perspectives, 205–224. Clevedon: Multilingual Matters.
Chang, C-C. and Wu, M. M-C. (2014). Non-native English at international conferences: Perspectives from Chinese-English conference interpreters in Taiwan. Interpreting 16 (2), 169–190. https://doi.org/10.1075/intp.16.2.02cha
Cheung, A. K. F. (2007). The effectiveness of summary training in consecutive interpreting (CI) delivery. Forum 5 (2), 1–23. https://doi.org/10.1075/forum.5.2.01che
Cheung, A. K. F. (2014). Anglicized numerical denominations as a coping tactic for simultaneous interpreting from English into Mandarin Chinese: An experimental study. Forum 12 (1), 1–22. https://doi.org/10.1075/forum.12.1.01che
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46. https://doi.org/10.1177/001316446002000104
Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. Orlando, FL: Harcourt Brace Jovanovich.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334. https://doi.org/10.1007/BF02310555
Davitti, E. (2013). Dialogue interpreting as intercultural mediation: Interpreters’ use of upgrading moves in parent-teacher meetings. Interpreting 15 (2), 168–199. https://doi.org/10.1075/intp.15.2.02dav
Feldt, L. S. and Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.) Educational Measurement (3rd ed.), 127–144. New York: Macmillan.
Fleenor, J. W., Fleenor, J. B. and Grossnickle, W. F. (1996). Interrater reliability and agreement of performance ratings: A methodological comparison. Journal of Business and Psychology 10 (2), 367–380. https://doi.org/10.1007/BF02249609
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–382. https://doi.org/10.1037/h0031619
Frick, T. and Semmel, M. I. (1978). Observer agreement and reliabilities of classroom observational measures. Review of Educational Research 48, 157–184. https://doi.org/10.3102/00346543048001157
Geertz, C. (1973). The Interpretation of Cultures. New York: Basic Books.
Gile, D. (1994). Opening up in interpretation studies. In M. Snell-Hornby, F. Pöchhacker and K. Kaindl (Eds) Translation Studies: An Interdiscipline, 149–158. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.2.20gil
Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science 5 (1), 13–34. https://doi.org/10.1207/S15327841MPEE0501_2
Gwet, K. L. (2013). Handbook of Inter-rater Reliability (3rd ed.). Gaithersburg, MD: Advanced Analytics, LLC.
Hale, S., Garcia, I., Hlavac, J., Kim, M., Lai, M., Turner, B., and Slatyer, H. (2012). Development of a conceptual overview for a new model for NAATI standards, testing and assessment. Retrieved on 22 May 2015 from http://www.naati.com.au/PDF/INT/INTFinalReport.pdf
Hale, S. and Napier, J. (2013). Research Methods in Interpreting: A Practical Resource. London and New York: Bloomsbury.
Han, C. (2015). Investigating rater severity/leniency in interpreter performance testing: A multifaceted Rasch measurement approach. Interpreting 17 (2), 255–283. https://doi.org/10.1075/intp.17.2.05han
Han, C. (2016). Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly 13 (3), 186–201. https://doi.org/10.1080/15434303.2016.1211132
Hayes, A. F. (2005). Statistical Methods for Communication Science. Mahwah, NJ: Lawrence Erlbaum.
James, J. R. and Gabriel, K. I. (2012). Student interpreters show encoding and recall differences from information in English and American Sign Language. Translation and Interpreting Research 4 (1), 21–37.
Johnson, R. B. and Turner, L. S. (2003). Data collection strategies in mixed methods research. In A. Tashakkori and C. Teddlie (Eds) Handbook of Mixed Methods in Social and Behavioral Research, 297–319. Thousand Oaks, CA: SAGE.
Keselman, O., Cederborg, A-C., and Linell, P. (2010). ‘That is not necessary for you to know!’ Negotiation of participation status of unaccompanied children in interpreter-mediated asylum hearings. Interpreting 12 (1), 83–104. https://doi.org/10.1075/intp.12.1.04kes
Kozlowski, S. and Hattrup, K. (1992). A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77, 161–167. https://doi.org/10.1037/0021-9010.77.2.161
Lee, J. (2008). Rating scales for interpreting performance assessment. The Interpreter and Translator Trainer 2 (2), 165–184. https://doi.org/10.1080/1750399X.2008.10798772
Lee, S-B. (2015). Developing an analytic scale for assessing undergraduate students’ consecutive interpreting performances. Interpreting 17 (2), 226–254. https://doi.org/10.1075/intp.17.2.04lee
Lin, I. I., Chang, F. A., and Kuo, F. (2013). The impact of non-native accented English on rendition accuracy in simultaneous interpreting. Translation & Interpreting Research 5 (2), 30–44. https://doi.org/10.12807/ti.105202.2013.a03
Liu, M-H. (2013). Design and analysis of Taiwan’s interpretation certification examination. In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 163–178. Frankfurt: Peter Lang.
Liu, M-H. and Chiu, Y-H. (2009). Assessing source material difficulty for consecutive interpreting: Quantifiable measures and holistic judgment. Interpreting 11 (2), 244–266. https://doi.org/10.1075/intp.11.2.07liu
Liu, M-H., Chang, C-C. and Wu, S-C. (2008). Interpretation evaluation practices: Comparison of eleven schools in Taiwan, China, Britain, and the USA. Compilation and Translation Review 1 (1), 1–42.
Liu, M-H., Schallert, D. L., and Carroll, P. J. (2004). Working memory and expertise in simultaneous interpreting. Interpreting 6 (1), 19–42. https://doi.org/10.1075/intp.6.1.04liu
McDermid, C. (2014). Cohesion in English to ASL simultaneous interpreting. Translation and Interpreting Research 6 (1), 76–101. https://doi.org/10.12807/ti.106201.2014.a05
Multon, K. D. (2010). Interrater reliability. In N. J. Salkind (Ed.) Encyclopedia of Research Design, 627–629. Thousand Oaks, CA: SAGE.
Napier, J. (2004). Interpreting omissions: A new perspective. Interpreting 6 (2), 117–142. https://doi.org/10.1075/intp.6.2.02nap
Peng, G. (2009). Using Rhetorical Structure Theory (RST) to describe the development of coherence in interpreting trainees. Interpreting 11 (2), 216–243. https://doi.org/10.1075/intp.11.2.06pen
Pöchhacker, F. (2011). Researching interpreting: Approaches to inquiry. In B. Nicodemus and L. Swabey (Eds) Advances in Interpreting Research, 5–25. Amsterdam: John Benjamins.
Pradas Macías, M. (2006). Probing quality criteria in simultaneous interpreting: The role of silent pauses in fluency. Interpreting 8 (1), 25–43. https://doi.org/10.1075/intp.8.1.03pra
Reithofer, K. (2013). Comparing modes of communication: The effect of English as a lingua franca vs. interpreting. Interpreting 15 (1), 48–73. https://doi.org/10.1075/intp.15.1.03rei
Rosiers, A., Eyckmans, J., and Bauwens, D. (2011). A story of attitudes and aptitudes? Investigating individual difference variables within the context of interpreting. Interpreting 13 (1), 53–69. https://doi.org/10.1075/intp.13.1.04ros
Rovira-Esteva, S. and Orero, P. (2011). A contrastive analysis of the main benchmarking tools for research assessment in translation and interpreting: The Spanish approach. Perspectives 19 (3), 233–251. https://doi.org/10.1080/0907676X.2011.590214
Roziner, I. and Shlesinger, M. (2010). Much ado about something remote: Stress and performance in remote interpreting. Interpreting 12 (2), 214–247. https://doi.org/10.1075/intp.12.2.05roz
Sawyer, D. B. (2004). Fundamental Aspects of Interpreter Education: Curriculum and Assessment. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.47
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19, 321–325. https://doi.org/10.1086/266577
Setton, R. and Motta, M. (2007). Syntacrobatics: Quality and reformulation in simultaneous-with-text. Interpreting 9 (2), 199–230. https://doi.org/10.1075/intp.9.2.04set
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86 (2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420
Shlesinger, M. (2009). Crossing the divide: What researchers and practitioners can learn from one another. Translation and Interpreting Research 1 (1), 1–16.
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research and Evaluation 9 (4). Retrieved 30 May 2014 from http://pareonline.net/getvn.asp?v=9&n=4
Stemler, S. E. and Tsai, J. (2008). Best practices in estimating interrater reliability: Three common approaches. In J. Osborne (Ed.) Best Practices in Quantitative Methods, 29–49. Thousand Oaks, CA: SAGE. https://doi.org/10.4135/9781412995627.d5
Teddlie, C. and Tashakkori, A. (2009). The Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Techniques in the Social and Behavioral Sciences (2nd ed.). Thousand Oaks, CA: SAGE.
Thompson, B. and Snyder, P. A. (1998). Statistical significance and reliability analysis in recent JCD research articles. Journal of Counseling and Development 76, 436–441. https://doi.org/10.1002/j.1556-6676.1998.tb02702.x
Tinsley, H. E. A. and Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgements. Journal of Counseling Psychology 22 (4), 358–376. https://doi.org/10.1037/h0076640
Tiselius, E. (2009). Revisiting Carroll’s scales. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 95–121. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.07tis
von Eye, A. and Mun, E. Y. (2004). Analyzing Rater Agreement: Manifest Variable Methods. Mahwah, NJ: Lawrence Erlbaum.
Wu, S. C. (2013). How do we assess students in the interpreting examinations. In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 15–33. Frankfurt: Peter Lang.
Yan, J-X., Pan, J., Wu, H., and Wang, Y. (2013). Mapping Interpreting Studies: The state of the field based on a database of nine major Translation and Interpreting journals (2000–2010). Perspectives 21 (3), 446–473. https://doi.org/10.1080/0907676X.2012.746379
Zheng, B-H. and Xiang, X. (2014). The impact of cultural background knowledge in the processing of metaphorical expressions: An empirical study of English-Chinese sight translation. Translation and Interpreting Studies 9 (1), 5–24. https://doi.org/10.1075/tis.9.1.01zhe
Zuo, J. (2014). Image schemata and visualization in simultaneous interpreting training. The Interpreter and Translator Trainer 8 (2), 204–216. https://doi.org/10.1080/1750399X.2014.908553