A study of the priming effect of machine translation on a corpus of institutional Spanish texts

Celia Rico Pérez

doi:10.5944/rhd.vol.10.2025.41906

Authors

Celia Rico Pérez Universidad Complutense de Madrid https://orcid.org/0000-0002-5056-8513

DOI:

https://doi.org/10.5944/rhd.vol.10.2025.41906

Keywords:

machine translation priming, post-editese, lexical diversity and density, corpus length ratio, UCM-EUROPA corpus

Abstract

This article analyses the priming effect of machine translation on European Union institutional texts translated into Spanish. It addresses two key questions: (a) can any linguistic variation be identified in machine-translated texts that coincides in time with successive developments in machine translation technology?; and (b) if such variation exists, to what extent can it be attributed to the priming effect of machine translation? The study is quantitative and covers four aspects: lexical diversity, lexical density, the corpus length ratio, and lexical patterns. The results show some signs of machine translation priming although, as noted in the conclusion, the data are not conclusive. They would need to be complemented with a qualitative analysis that examines individual cases in context and explores linguistic variation that the quantitative data do not capture.
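As a rough illustration of three of the quantitative measures named in the abstract, the following minimal Python sketch computes a type/token ratio (one simple proxy for lexical diversity), a lexical density, and a length ratio. It is not the article's method. The function names, the tiny FUNCTION_WORDS stop list (used here in place of a part-of-speech tagger), and the definition of length ratio as target tokens divided by source tokens are all assumptions made for illustration; the article may define and compute these measures differently.

```python
import re

# Hypothetical, deliberately tiny stop list standing in for a POS tagger:
# any token not listed here is counted as a content word (an assumption).
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "del", "en", "y", "a",
    "que", "un", "una", "se", "por", "con", "para",
}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (a simple diversity proxy)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens that are not in the function-word stop list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def length_ratio(source_tokens, target_tokens):
    """Target length over source length, in tokens (assumed definition)."""
    if not source_tokens:
        return 0.0
    return len(target_tokens) / len(source_tokens)


if __name__ == "__main__":
    source = tokenize("The Commission adopts the decision")
    target = tokenize("La Comisión adopta la decisión de la Comisión")
    print(type_token_ratio(target))
    print(lexical_density(target))
    print(length_ratio(source, target))
```

A plain type/token ratio is sensitive to text length, so comparisons across sub-corpora of different sizes would normally call for a length-robust measure rather than this simple proxy.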


