Assessment recommender for short-answer questions using language models in intellectual property
DOI: https://doi.org/10.5944/ried.45541

Keywords: assessment, feedback, Moodle, quizzes, short-answer questions, artificial intelligence

Abstract
The use of Artificial Intelligence (AI) in education is growing rapidly, transforming the teaching-learning process and the assessment process along with it. This paper presents SLASys, a tool that recommends grades for short-answer questions to teachers using semantic AI techniques. It differs from other work based on generative AI in its use of the BERT language model, which is lighter, understands concepts better within a specific context, improves computational efficiency, and reduces ethical and privacy concerns. SLASys implements semantic comparison and BERT-based predictive models for answer classification. A mixed research methodology was followed, combining action research with a design-and-creation approach, to develop and refine SLASys over four editions of a master's-level course on patent examination in the field of intellectual property. SLASys has been integrated into Moodle, enabling its use by teachers without technical expertise, and has been tested by 120 students. The results show its effectiveness, both within the reported experience and according to the existing literature, even with small datasets and a limited number of participants, and it has been rated positively by teachers and students. This work helps demonstrate the feasibility of using AI in higher education, in both hybrid and online settings, offering a solution to improve assessment and feedback on short-answer questions in real learning contexts.
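The semantic-comparison step described above can be illustrated with a minimal sketch: a student answer and a teacher's reference answer are mapped to vectors, and their cosine similarity drives a grading recommendation. SLASys itself uses BERT embeddings for this; in the sketch below, bag-of-words counters stand in for those embeddings purely for illustration, and the function names, the 0.6 threshold, and the "accept"/"review" labels are hypothetical, not taken from SLASys.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Dot product over terms shared by both vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def grade_short_answer(student: str, reference: str,
                       threshold: float = 0.6) -> str:
    """Recommend a grading decision from answer similarity.

    Tokenizes both texts into lowercase bag-of-words vectors; a
    BERT-based system would replace these counters with contextual
    sentence embeddings, keeping the same cosine comparison.
    """
    v_student = Counter(student.lower().split())
    v_reference = Counter(reference.lower().split())
    score = cosine_similarity(v_student, v_reference)
    return "accept" if score >= threshold else "review"
```

For example, `grade_short_answer("a patent protects an invention", "a patent protects an invention for 20 years")` scores roughly 0.79 and yields `"accept"`, while an unrelated answer falls below the threshold and is flagged for teacher review — mirroring the recommender role described above, where the tool suggests an assessment rather than imposing one.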
License
Copyright 2026 David Bañeres Besora, Ana-Elena Guerrerro Roldán, M. Elena Rodríguez González

This work is licensed under a Creative Commons Attribution 4.0 International License.
Works published in this journal are subject to the following terms:
1. Authors grant, on a non-exclusive basis, the exploitation rights of works accepted for publication in "RIED. Revista Iberoamericana de Educación a Distancia," and guarantee the journal the right to be the first to publish the work; they likewise permit the journal to distribute published works under the license indicated in point 2.
2. Works are published in the electronic edition of the journal under a Creative Commons Attribution 4.0 International license (CC BY 4.0). The material may be copied and redistributed in any medium or format, and adapted, remixed, transformed, and built upon for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the OnlineFirst version (the version reviewed and accepted for publication) of their work before its final publication, always with a reference to its publication in RIED, since this favors its circulation and earlier dissemination and may thereby increase its citations and reach within the academic community. RoMEO color: green.

