AI as an assessment instrument: Examining handwritten solutions in the context of mathematical representations
DOI:
https://doi.org/10.5944/ried.47084
Keywords:
AI-based assessment, handwriting recognition, automated scoring, mathematical representations, generative AI, mathematics education
Abstract
The rapid integration of generative artificial intelligence (GenAI) into education calls for a re-examination of assessment practices, especially for complex tasks such as handwritten mathematical solutions involving multiple representations. This study evaluates the performance of an AI-based tool (ExEvAI) in assessing handwritten responses that include symbolic, tabular, and graphical representations. Using an explanatory sequential mixed-methods design, data were collected from 89 university mathematics students and scored by both the AI and human experts. Quantitative analyses revealed a strong positive correlation between AI and human scores, indicating high consistency in the ranking of students. However, the AI showed a significant "positive scoring bias", grading more leniently than the experts on symbolic and tabular questions, whereas no significant differences were found for graphical representations. Qualitatively, the AI adhered strictly to the instructions, provided transparent and diagnostic feedback, and applied partial credit effectively. These results suggest that AI can serve as a valuable "co-assessor" that reduces teachers' workload and provides detailed feedback. Nevertheless, given the observed scoring biases, human involvement in the assessment process remains important to ensure reliability. The study recommends hybrid models in which AI supports, but does not replace, teachers' pedagogical judgment.
License
Copyright 2026 Hilmi KARACA

This work is licensed under a Creative Commons Attribution 4.0 International license.
Works published in this journal are subject to the following terms:
1. Authors grant, on a non-exclusive basis, the exploitation rights to works accepted for publication in "RIED. Revista Iberoamericana de Educación a Distancia" and guarantee the journal the right of first publication; they likewise allow the journal to distribute published works under the license indicated in point 2.
2. Works are published in the electronic edition of the journal under a Creative Commons Attribution 4.0 International license (CC BY 4.0). The material may be copied and redistributed in any medium or format, and adapted, remixed, transformed, and built upon for any purpose, including commercially. Appropriate credit must be given, a link to the license provided, and any changes indicated.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the OnlineFirst version (the version reviewed and accepted for publication) of their work before its final publication, always with reference to its publication in RIED, since early circulation and dissemination may increase its citation and reach within the academic community. RoMEO color: green.


