AI as an assessment instrument: Examining handwritten solutions in the context of mathematical representations

Authors

DOI:

https://doi.org/10.5944/ried.47084

Keywords:

AI-based assessment, handwriting recognition, automated scoring, mathematical representations, generative AI, mathematics education

Abstract

The rapid integration of generative artificial intelligence (GenAI) into education calls for a re-examination of assessment practices, especially for complex tasks such as handwritten mathematical solutions that involve multiple representations. This study evaluates the performance of an artificial intelligence (AI)-based tool (ExEvAI) in assessing handwritten responses that include symbolic, tabular, and graphical representations. Using an explanatory sequential mixed-methods design, data were collected from 89 university mathematics students, and their responses were scored by both the AI and human experts. Quantitative analyses revealed a strong positive correlation between AI and human scores, indicating high consistency in how students were ranked. However, the AI showed a significant "positive scoring bias," grading more leniently than the experts on symbolic and tabular questions, whereas no significant differences were found for graphical representations. Qualitatively, the AI adhered strictly to the instructions, provided transparent and diagnostic feedback, and applied partial credit effectively. These results suggest that AI can serve as a valuable "co-assessor" that reduces teacher workload and provides detailed feedback. Because of the observed scoring biases, however, human involvement in the assessment process remains important for ensuring reliability. The study recommends hybrid models in which AI supports, rather than replaces, teachers' pedagogical judgment.
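The difference between rank consistency and leniency can be illustrated with a small sketch. The scores below are hypothetical and are not data from the study. The function names `pearson_r` and `mean_bias` are illustrative assumptions, and the statistical tests the study actually used are not named on this page. The sketch only shows how two raters can correlate strongly while one scores systematically higher.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores for six students; NOT data from the study.
human_scores = [4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
ai_scores = [5.0, 7.0, 6.5, 9.0, 8.0, 10.0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_bias(ai, human):
    """Mean paired difference (AI minus human); positive means the AI is more lenient."""
    return mean(a - h for a, h in zip(ai, human))


if __name__ == "__main__":
    print(f"correlation: {pearson_r(human_scores, ai_scores):.3f}")
    print(f"mean bias:   {mean_bias(ai_scores, human_scores):+.3f}")
```

Adding a constant to every score leaves the correlation unchanged, so a high correlation alone cannot reveal leniency. This is why an agreement analysis would typically report a paired difference alongside the correlation.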

Author biography

Hilmi Karaca, Aksaray University, ASÜ (Turkey)

Holds a doctorate and is an academic specializing in mathematics education at the Faculty of Education, Aksaray University, Turkey. His research focuses on mathematical representations, spatial skills, the integration of artificial intelligence (AI), virtual and augmented reality (VR/AR), and eye-tracking technologies in mathematics education, as well as innovative teaching tools.

Published

18-03-2026

How to cite

Karaca, H. (2026). La IA como instrumento de evaluación: Examen de soluciones manuscritas en el contexto de representaciones matemáticas. RIED-Revista Iberoamericana de Educación a Distancia, 29(2). https://doi.org/10.5944/ried.47084
