Evaluation of students’ academic results through the analysis of their use of Version Control Systems
DOI:
https://doi.org/10.5944/ried.23.2.26539
Keywords:
computer application, information processing, machine learning, learning process.
Abstract
Version Control Systems are commonly used by Information and Communication Technology professionals. These systems make it possible to monitor the activity of programmers working on a project. The use of such systems should therefore be encouraged by educational institutions. The aim of this work is to evaluate whether students’ academic success can be predicted by monitoring their interaction with a Version Control System. To do so, we have built a model that predicts students’ results in a specific practical assignment of the Operating Systems Extension subject, a second-year subject in the degree in Computer Science at the University of León. To obtain a prediction, the model analyzes students’ interaction with a Git repository. To build the model, several classifiers and predictors have been evaluated using the MoEv tool, which evaluates several classification and prediction models in order to find the most suitable one for a specific problem. Prior to model development, MoEv performs feature selection on the input data to keep the most significant features. The resulting model has been trained using results from the 2016-2017 academic year. Later, to check that it generalizes well, the model has been validated using results from the 2017-2018 academic year. The results show that the model predicts students’ outcomes with a high success rate.
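As a rough illustration of what "analyzing students’ interaction with a Git repository" could involve, the sketch below derives a few per-author activity features from commit history. It is only an assumption-laden example: the abstract does not state which features the model uses, so the feature names (`commits`, `active_days`, `span_days`), the `commit_features` helper, and the sample log are all hypothetical. The input format assumed here is what `git log --pretty=format:"%an|%ad" --date=short` prints (author name, a `|` separator, and the author date as YYYY-MM-DD).

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample of the text produced by:
#   git log --pretty=format:"%an|%ad" --date=short
SAMPLE_LOG = """\
alice|2017-03-01
alice|2017-03-01
alice|2017-03-08
bob|2017-03-14
alice|2017-03-15
"""


def commit_features(log_text):
    """Summarize commit activity per author.

    Returns a dict mapping each author to illustrative features:
    number of commits, number of distinct days with a commit, and
    days elapsed between the first and last commit.
    """
    dates_by_author = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last "|" so author names containing "|" still parse.
        author, day = line.rsplit("|", 1)
        dates_by_author[author].append(date.fromisoformat(day.strip()))

    features = {}
    for author, dates in dates_by_author.items():
        features[author] = {
            "commits": len(dates),
            "active_days": len(set(dates)),
            "span_days": (max(dates) - min(dates)).days,
        }
    return features


features = commit_features(SAMPLE_LOG)
for author, values in sorted(features.items()):
    print(author, values)
```

A table of features like this, with one row per student and a pass/fail label from the assignment, is the kind of input that a feature-selection and classifier-comparison step could then work on. Features from one course year would be used for training and those from the following year for validation.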