A language model-based recommender assessment system for short-answer questions in the intellectual property domain
DOI: https://doi.org/10.5944/ried.45541

Keywords: assessment, feedback, Moodle, test, short-answer, artificial intelligence

Abstract
The use of Artificial Intelligence (AI) in education is growing rapidly, transforming the teaching-learning process as well as the assessment process. This work introduces SLASys, a tool that recommends assessments for short-answer questions using semantic AI techniques. Unlike other works based on generative AI, SLASys uses the lightweight BERT language model, which better captures domain-specific language concepts, improves computational efficiency, and reduces ethical and privacy concerns. SLASys implements semantic comparison and predictive classification models based on BERT. A mixed research methodology was followed, combining action research with a design and creation approach, to develop and refine SLASys over four editions of a master's-level course on patent examination within the intellectual property domain. SLASys has been integrated into Moodle, enabling its use by teachers without technical expertise, and has been tested by 120 students. The results, within the described experience and consistent with existing literature, demonstrate its effectiveness even with small datasets and few participants. Additionally, it has been positively evaluated by both teachers and students. This work shows the feasibility of using AI in higher education, in both hybrid and online environments, offering a practical solution to improve assessment and feedback for short-answer questions in real learning contexts.
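The abstract describes two BERT-based components: semantic comparison and predictive classification. A minimal sketch of the semantic-comparison idea is shown below, assuming answer embeddings (for example, BERT sentence vectors) have already been computed; the function names, thresholds, and labels are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_grade(student_emb: np.ndarray,
                    reference_emb: np.ndarray,
                    thresholds=(0.90, 0.75)):
    """Map the similarity between a student answer embedding and a
    reference answer embedding to a recommended label.
    The cut-off values are hypothetical, for illustration only."""
    sim = cosine_similarity(student_emb, reference_emb)
    if sim >= thresholds[0]:
        return "correct", sim
    if sim >= thresholds[1]:
        return "partially correct", sim
    return "incorrect", sim

# Toy 2-D vectors stand in for real BERT embeddings (typically 768-D).
label, sim = recommend_grade(np.array([1.0, 0.0]), np.array([1.0, 0.1]))
```

In a system like the one described, such a recommendation would be shown to the teacher for review rather than applied automatically, keeping the assessment decision with the human grader.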
License
Copyright (c) 2026 David Bañeres Besora, Ana-Elena Guerrerro Roldán, M. Elena Rodríguez González

This work is licensed under a Creative Commons Attribution 4.0 International License.
The articles that are published in this journal are subject to the following terms:
1. The authors grant the exploitation rights of the work accepted for publication to RIED, guarantee the journal the right of first publication of the research undertaken, and permit the journal to distribute the published work under the license indicated in point 2.
2. The articles are published in the electronic edition of the journal under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can copy and redistribute the material in any medium or format, adapt, remix, transform, and build upon the material for any purpose, even commercially. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
3. Conditions for self-archiving. Authors are encouraged to disseminate electronically the OnlineFirst version (the assessed version accepted for publication) of their articles before publication, always with reference to its publication by RIED, favoring earlier circulation and dissemination and, with it, a possible increase in citation and reach within the academic community.

