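Feedback on learning with generative artificial intelligence in university students (original title: Retroalimentación de aprendizajes con inteligencia artificial generativa en estudiantes universitarios)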

María Verónica Leiva-Guerrero; Ignacio Araya Zamorano; Rafael Escobar Collins; Francisca Silva Castro

doi:10.5944/ried.45547

Authors

María Verónica Leiva-Guerrero Pontificia Universidad Católica de Valparaíso, PUCV (Chile) https://orcid.org/0000-0002-7641-0087
Ignacio Araya Zamorano Pontificia Universidad Católica de Valparaíso, PUCV (Chile) https://orcid.org/0000-0001-5882-6217
Rafael Escobar Collins Pontificia Universidad Católica de Valparaíso, PUCV (Chile) https://orcid.org/0000-0002-3429-5017
Francisca Silva Castro Pontificia Universidad Católica de Valparaíso, PUCV (Chile) https://orcid.org/0009-0000-7510-4806

DOI:

https://doi.org/10.5944/ried.45547

Keywords:

feedback, formative evaluation, generative artificial intelligence, ChatGPT, Ladder of Feedback, university students

Abstract

Assessment for learning has become increasingly important in university teaching, particularly with regard to feedback. However, students are still perceived to be dissatisfied with the quality of feedback provided by faculty, which highlights the need to innovate in feedback strategies. This study explored the pedagogical and technological relevance of integrating Wilson's Ladder of Feedback with generative artificial intelligence, specifically GPT-4o, to strengthen formative feedback for university students. The study followed a qualitative, exploratory approach in two phases. In the first phase, a prompt was designed and validated using the Delphi method with the participation of eight experts in assessment and artificial intelligence, and applied to seven state-of-the-art language models. In the second phase, the validated prompt was implemented in two university courses of different kinds, Assessment for Learning and Data Structures, with automatic feedback integrated into the Moodle platform. The experts agreed on the suitability of the AI-mediated Ladder of Feedback and highlighted the superior performance of GPT-4o. At the classroom level, students valued the clarity, usefulness, and immediacy of the feedback, although they identified the tool's lack of contextualization and impersonal tone as limitations. The study concludes that integrating Wilson's Ladder of Feedback with generative artificial intelligence is a promising innovation, but one that requires disciplinary adjustments, teacher supervision, and careful attention to the human dimension of feedback in e-learning contexts.

Author Biographies

María Verónica Leiva-Guerrero, Pontificia Universidad Católica de Valparaíso, PUCV (Chile)

Holds a doctorate in Didactics in Educational Sciences and is Full Professor at the Pontificia Universidad Católica de Valparaíso (PUCV). Her research addresses educational policy, with an emphasis on management, school leadership, and evaluation, areas in which she has developed projects and publications.

Ignacio Araya Zamorano, Pontificia Universidad Católica de Valparaíso, PUCV (Chile)

Holds a doctorate in Computer Science and is Assistant Professor at the Pontificia Universidad Católica de Valparaíso (PUCV). His research focuses on artificial intelligence, particularly on solving optimization problems, the use of heuristics, and machine learning techniques.

Rafael Escobar Collins, Pontificia Universidad Católica de Valparaíso, PUCV (Chile)

History teacher with master's degrees in Evaluation and in Educational Innovation. Specialist in digital technologies for learning, instructional design, teacher training, and the development of digital competencies in higher education.

Francisca Silva Castro, Pontificia Universidad Católica de Valparaíso, PUCV (Chile)

Computer Civil Engineering student at the Pontificia Universidad Católica de Valparaíso (PUCV). She has served as a teaching assistant in Data Structures and in Algorithm Analysis and Design. Her interests include user experience (UX) design, statistics, artificial intelligence, and optimization.
