Item Details

Reporting practices of rater reliability in interpreting research: A mixed-methods review of 14 journals (2004–2014)

Issue: Vol 3 No. 1 (2016) Mixed Methods

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.29622

Abstract:

The issue addressed in this study is the reporting practices of rater reliability in interpreting research (IR), given that the use of raters as a method of measurement is a commonplace in IR, and that little is known about to what extent and how rater reliability estimates (RREs) have been reported. Drawing upon 447 articles from 14 translation and interpreting journals (2004–2014), this mixed-methods study attempts to gain quantitative and qualitative insights into the reporting practices. Data analysis reveals that: 1) almost 90% of the articles that needed to report RREs failed to do so; 2) potential problems emerged from those articles that reported RREs: lack of distinction between rater consensus and consistency, underreporting, misinterpretation and misuse of RREs, and lack of justification for the use of rater-generated measurements for subsequent data analysis. These findings highlight an urgent need for increased author awareness of reporting appropriate RREs in IR.

Author: Chao Han

References:

Agrifoglio, M. (2004). Sight translation and interpreting: A comparative analysis of constraints and failures. Interpreting 6 (1), 43–67. https://doi.org/10.1075/intp.6.1.05agr


Angelelli, C. (2009). Using a rubric to assess translation ability: Defining the construct. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 13–47. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.03ang


Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.


Bakti, M. and Bóna, J. (2014). Source language-related erroneous stress placement in the target language output of simultaneous interpreters. Interpreting 16 (1), 34–48. https://doi.org/10.1075/intp.16.1.03bak


Bale, R. (2013). Undergraduate consecutive interpreting and lexical knowledge: The role of spoken corpora. The Interpreter and Translator Trainer 7 (1), 27–50. https://doi.org/10.1080/13556509.2013.10798842


Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3–11. https://doi.org/10.2466/pr0.1966.19.1.3


Bartłomiejczyk, M. (2006). Strategies of simultaneous interpreting and directionality. Interpreting 8 (2), 149–174. https://doi.org/10.1075/intp.8.2.03bar


Baumgartner, T. A. (1989). Norm-referenced measurement: Reliability. In M. J. Safrit and T. M. Wood (Eds) Measurement Concepts in Physical Education and Exercise Science, 45–72. Champaign, IL: Human Kinetics.


Braun, S. (2013). Keeping your distance? Remote interpreting in legal proceedings: A critical assessment of a growing practice. Interpreting 15 (2), 200–228. https://doi.org/10.1075/intp.15.2.03bra


Campbell, S. and Hale, S. (2003). Translation and interpreting assessment in the context of educational measurement. In G. Anderman and M. Rogers (Eds) Translation Today: Trends and Perspectives, 205–224. Clevedon: Multilingual Matters.


Chang, C-C. and Wu, M. M-C. (2014). Non-native English at international conferences: Perspectives from Chinese-English conference interpreters in Taiwan. Interpreting 16 (2), 169–190. https://doi.org/10.1075/intp.16.2.02cha


Cheung, A. K. F. (2007). The effectiveness of summary training in consecutive interpreting (CI) delivery. Forum 5 (2), 1–23. https://doi.org/10.1075/forum.5.2.01che


Cheung, A. K. F. (2014). Anglicized numerical denominations as a coping tactic for simultaneous interpreting from English into Mandarin Chinese: An experimental study. Forum 12 (1), 1–22. https://doi.org/10.1075/forum.12.1.01che


Cohen, J. (1960). A coefficient for agreement for nominal scales. Educational and Psychological Measurement 20, 37–46. https://doi.org/10.1177/001316446002000104


Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. Orlando, FL: Harcourt Brace Jovanovich.


Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334. https://doi.org/10.1007/BF02310555


Davitti, E. (2013). Dialogue interpreting as intercultural mediation: Interpreters’ use of upgrading moves in parent-teacher meetings. Interpreting 15 (2), 168–199. https://doi.org/10.1075/intp.15.2.02dav


Feldt, L. S. and Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.) Educational Measurement (3rd ed.), 127–44. New York: Macmillan.


Fleenor, J. W., Fleenor, J. B. and Grossnickle, W. F. (1996). Interrater reliability and agreement of performance ratings: A methodological comparison. Journal of Business and Psychology 10 (2), 367–380. https://doi.org/10.1007/BF02249609


Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–382. https://doi.org/10.1037/h0031619


Frick, T. and Semmel, M. I. (1978). Observer agreement and reliabilities of classroom observational measures. Review of Educational Research 48, 157–184. https://doi.org/10.3102/00346543048001157


Geertz, C. (1973). The Interpretation of Cultures. New York: Basic Books.


Gile, D. (1994). Opening up in interpretation studies. In M. Snell-Hornby, F. Pöchhacker and K. Kaindl (Eds) Translation Studies: An interdiscipline, 149–158. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.2.20gil


Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science 5 (1), 13–34. https://doi.org/10.1207/S15327841MPEE0501_2


Gwet, K. L. (2013). Handbook of Inter-rater Reliability (3rd ed.). Gaithersburg, MD: Advanced Analytics, LLC.


Hale, S., Garcia, I., Hlavac, J., Kim, M., Lai, M., Turner, B., and Slatyer, H. (2012). Development of a conceptual overview for a new model for NAATI standards, testing and assessment. Retrieved on 22 May 2015 from http://www.naati.com.au/PDF/INT/INTFinalReport.pdf


Hale, S. and Napier, J. (2013). Research Methods in Interpreting: A Practical Resource. London and New York: Bloomsbury.


Han, C. (2015). Investigating rater severity/leniency in interpreter performance testing: A multifaceted Rasch measurement approach. Interpreting 17 (2), 255–283. https://doi.org/10.1075/intp.17.2.05han


Han, C. (2016). Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly 13 (3), 186–201. https://doi.org/10.1080/15434303.2016.1211132


Hayes, A. F. (2005). Statistical Methods for Communication Science. Mahwah, NJ: Lawrence Erlbaum.


James, J. R. and Gabriel, K. I. (2012). Student interpreters show encoding and recall differences from information in English and American Sign Language. Translation and Interpreting Research 4 (1), 21–37.


Johnson, R. B. and Turner, L. A. (2003). Data collection strategies in mixed methods research. In A. Tashakkori and C. Teddlie (Eds) Handbook of Mixed Methods in Social and Behavioral Research, 297–319. Thousand Oaks, CA: SAGE.


Keselman, O., Cederborg, A-C., and Linell, P. (2010). ‘That is not necessary for you to know!’ Negotiation of participation status of unaccompanied children in interpreter-mediated asylum hearings. Interpreting 12 (1), 83–104. https://doi.org/10.1075/intp.12.1.04kes


Kozlowski, S. and Hattrup, K. (1992). A disagreement about within group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77, 161–167. https://doi.org/10.1037/0021-9010.77.2.161


Lee, J. (2008). Rating scales for interpreting performance assessment. The Interpreter and Translator Trainer 2 (2), 165–184. https://doi.org/10.1080/1750399X.2008.10798772


Lee, S-B. (2015). Developing an analytic scale for assessing undergraduate students’ consecutive interpreting performances. Interpreting 17 (2), 226–254. https://doi.org/10.1075/intp.17.2.04lee


Lin, I. I., Chang, F. A., and Kuo, F. (2013). The impact of non-native accented English on rendition accuracy in simultaneous interpreting. Translation & Interpreting Research 5 (2), 30–44. https://doi.org/10.12807/ti.105202.2013.a03


Liu, M-H. (2013). Design and analysis of Taiwan’s interpretation certification examination. In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 163–178. Frankfurt: Peter Lang.


Liu, M-H. and Chiu, Y-H. (2009). Assessing source material difficulty for consecutive interpreting: Quantifiable measures and holistic judgment. Interpreting 11 (2), 244–266. https://doi.org/10.1075/intp.11.2.07liu


Liu, M-H., Chang, C-C. and Wu, S-C. (2008). Interpretation evaluation practices: Comparison of eleven schools in Taiwan, China, Britain, and the USA. Compilation and Translation Review 1 (1), 1–42.


Liu, M-H., Schallert, D. L., and Carroll, P. J. (2004). Working memory and expertise in simultaneous interpreting. Interpreting 6 (1), 19–42. https://doi.org/10.1075/intp.6.1.04liu


McDermid, C. (2014). Cohesion in English to ASL simultaneous interpreting. Translation and Interpreting Research 6 (1), 76–101. https://doi.org/10.12807/ti.106201.2014.a05


Multon, K. D. (2010). Interrater reliability. In N. J. Salkind (Ed.) Encyclopedia of Research Design, 627–629. Thousand Oaks, CA: SAGE.


Napier, J. (2004). Interpreting omissions: A new perspective. Interpreting 6 (2), 117–142. https://doi.org/10.1075/intp.6.2.02nap


Peng, G. (2009). Using Rhetorical Structure Theory (RST) to describe the development of coherence in interpreting trainees. Interpreting 11 (2), 216–243. https://doi.org/10.1075/intp.11.2.06pen


Pöchhacker, F. (2011). Researching interpreting: Approaches to inquiry. In B. Nicodemus and L. Swabey (Eds) Advances in Interpreting Research, 5–25. Amsterdam: John Benjamins.


Pradas Macías, M. (2006). Probing quality criteria in simultaneous interpreting: The role of silent pauses in fluency. Interpreting 8 (1), 25–43. https://doi.org/10.1075/intp.8.1.03pra


Reithofer, K. (2013). Comparing modes of communication: The effect of English as a lingua franca vs. interpreting. Interpreting 15 (1), 48–73. https://doi.org/10.1075/intp.15.1.03rei


Rosiers, A., Eyckmans, J., and Bauwens, D. (2011). A story of attitudes and aptitudes? Investigating individual difference variables within the context of interpreting. Interpreting 13 (1), 53–69. https://doi.org/10.1075/intp.13.1.04ros


Rovira-Esteva, S. and Orero, P. (2011). A contrastive analysis of the main benchmarking tools for research assessment in translation and interpreting: The Spanish approach. Perspectives 19 (3), 233–251. https://doi.org/10.1080/0907676X.2011.590214


Roziner, I. and Shlesinger, M. (2010). Much ado about something remote: Stress and performance in remote interpreting. Interpreting 12 (2), 214–247. https://doi.org/10.1075/intp.12.2.05roz


Sawyer, D. B. (2004). Fundamental Aspects of Interpreter Education: Curriculum and Assessment. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.47


Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19, 321–325. https://doi.org/10.1086/266577


Setton, R. and Motta, M. (2007). Syntacrobatics: Quality and reformulation in simultaneous-with-text. Interpreting 9 (2), 199–230. https://doi.org/10.1075/intp.9.2.04set


Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86 (2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420


Shlesinger, M. (2009). Crossing the divide: What researchers and practitioners can learn from one another. Translation and Interpreting Research 1 (1), 1–16.


Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research and Evaluation 9 (4). Retrieved 30 May 2014 from http://pareonline.net/getvn.asp?v=9&n=4


Stemler, S. E. and Tsai, J. (2008). Best practices in estimating interrater reliability: Three common approaches. In J. Osborne (Ed.) Best Practices in Quantitative Methods, 29–49. Thousand Oaks, CA: SAGE. https://doi.org/10.4135/9781412995627.d5


Teddlie, C. and Tashakkori, A. (2009). The Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Techniques in the Social and Behavioral Sciences (2nd ed.). Thousand Oaks, CA: SAGE.


Thompson, B. and Snyder, P. A. (1998). Statistical significance and reliability analysis in recent JCD research article. Journal of Counseling and Development 76, 436–441. https://doi.org/10.1002/j.1556-6676.1998.tb02702.x


Tinsley, H. E. A. and Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgements. Journal of Counseling Psychology 22 (4), 358–376. https://doi.org/10.1037/h0076640


Tiselius, E. (2009). Revisiting Carroll’s scales. In C. Angelelli and H. E. Jacobson (Eds) Testing and Assessment in Translation and Interpreting Studies, 95–121. Amsterdam: John Benjamins. https://doi.org/10.1075/ata.xiv.07tis


von Eye, A. and Mun, E. Y. (2004). Analyzing Rater Agreement: Manifest Variable Methods. Mahwah, NJ: Lawrence Erlbaum.


Wu, S. C. (2013). How do we assess students in the interpreting examinations. In D. Tsagari and R. van Deemter (Eds) Assessment Issues in Language Translation and Interpreting, 15–33. Frankfurt: Peter Lang.


Yan, J-X., Pan, J., Wu, H., and Wang, Y. (2013). Mapping Interpreting Studies: The state of the field based on a database of nine major Translation and Interpreting journals (2000–2010). Perspectives 21 (3), 446–73. https://doi.org/10.1080/0907676X.2012.746379


Zheng, B-H. and Xiang, X. (2014). The impact of cultural background knowledge in the processing of metaphorical expressions: An empirical study of English-Chinese sight translation. Translation and Interpreting Studies 9 (1), 5–24. https://doi.org/10.1075/tis.9.1.01zhe


Zuo, J. (2014). Image schemata and visualization in simultaneous interpreting training. The Interpreter and Translator Trainer 8 (2), 204–216. https://doi.org/10.1080/1750399X.2014.908553