Assessment recommender for short-answer questions using language models in intellectual property

Authors

DOI:

https://doi.org/10.5944/ried.45541

Keywords:

assessment, feedback, Moodle, test, short-answer questions, artificial intelligence

Abstract

The use of Artificial Intelligence (AI) in education is growing rapidly, transforming the teaching-learning process and, with it, assessment. This paper presents SLASys, a tool that recommends grades for short-answer questions to teaching staff using semantic AI techniques. It differs from work based on generative AI in its use of the BERT language model, which is lighter, captures concepts within a specific context more accurately, improves computational efficiency, and reduces ethical and privacy concerns. SLASys implements semantic comparison and BERT-based predictive models for answer classification. A mixed research methodology was followed, combining action research with a design-and-creation approach, to develop and refine SLASys over four editions of a master's-level course on patent examination in the field of intellectual property. SLASys has been integrated into Moodle, allowing teachers without technical expertise to use it, and has been tested by 120 students. The results show its effectiveness, both within the experience described and with respect to the existing literature, even with small datasets and a limited number of participants, and it has been rated positively by teachers and students. This work helps demonstrate the viability of AI in higher education, in both hybrid and online settings, offering a solution for improving assessment and feedback on short-answer questions in real learning contexts.
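The semantic-comparison step the abstract describes can be illustrated with a minimal sketch. Note the assumptions: `embed` below is a hypothetical stand-in (a tiny bag-of-words count over an invented vocabulary) for a real BERT sentence encoder, and `semantic_score` is not the SLASys implementation — it only shows the general idea of scoring a student answer by cosine similarity to a reference answer.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a BERT sentence encoder.

    Produces a toy bag-of-words vector over a fixed vocabulary,
    purely to make the comparison step below runnable.
    """
    vocab = ["patent", "claim", "novelty", "prior", "art"]
    tokens = text.lower().split()
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def semantic_score(reference: str, answer: str) -> float:
    """Cosine similarity between the two embeddings (1.0 = same direction)."""
    a, b = embed(reference), embed(answer)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

ref = "novelty requires no prior art disclosing the claim"
good = "the claim is novel when prior art does not disclose it"
print(round(semantic_score(ref, good), 2))  # → 0.87
```

With a real encoder (e.g. BERT sentence embeddings), the same cosine comparison rewards paraphrases that share meaning rather than exact wording, which is what makes the approach suitable for grading free-text short answers.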


Author Biographies

David Bañeres Besora, Universitat Oberta de Catalunya, UOC (Spain)

Associate professor at the UOC. He received his bachelor's degree in Computer Engineering and his PhD from the Universitat Politècnica de Catalunya. His research interests include innovative online learning systems, such as intelligent tutoring systems, automated assessment, learning analytics, and the application of artificial intelligence in educational contexts.

Ana-Elena Guerrero Roldán, Universitat Autònoma de Barcelona, UAB (Spain)

Associate professor in the Faculty of Education at the UAB. She holds a degree in Pedagogy from the Universidad Ramon Llull and a PhD from the Universitat Oberta de Catalunya. Her research focuses on improving the teaching and learning process through Information and Communication Technologies, particularly in assessment, feedback, and student support.

M. Elena Rodríguez González, Universitat Oberta de Catalunya, UOC (Spain)

Associate professor at the UOC. She holds a degree in Computer Science from the Universitat Politècnica de Catalunya and a PhD from the Universidad de Alcalá. Her research focuses on technology-supported learning for improving the teaching-learning process, assessment, decision making, and the related training aspects.


Published

2026-01-02

How to Cite

Bañeres Besora, D., Guerrero Roldán, A.-E., & Rodríguez González, M. E. (2026). Recomendador de evaluación para preguntas cortas utilizando modelos de lenguaje en propiedad intelectual. RIED-Revista Iberoamericana de Educación a Distancia, 29(1), 321-352. https://doi.org/10.5944/ried.45541
