Data-driven educational algorithms pedagogical framing

Data from students and learning practices are essential for feeding the artificial intelligence systems used in education. Recurrent data trains the algorithms so that they can be adapted to new situations, either to optimize coursework or to manage repetitive tasks. As the algorithms spread in different learning contexts and the actions which they perform expand, pedagogical interpretative frameworks are required to use them properly. Based on case analyses and a literature review, the paper analyses the limits of learning practices based on the massive use of data from a pedagogical approach. The focus is on data capture, biases associated with datasets, and the human factor present in the design of artificial intelligence algorithms and Machine Learning systems. In order to facilitate the appropriate management of data-driven educational algorithms, the paper argues for introducing a pedagogical framework that makes it possible to analyse the suitability of artificial intelligence systems and to support their evaluation, considering their impact on the learning process. To that end, a set of heuristic rules is finally proposed in order to address the pedagogical gaps identified and to support the educational use of data-driven algorithms.
The ability to access directly the large amounts of data from online learning platforms is affecting the establishment of the purposes, procedures and the very consideration of educational practices based on digital data. At the same time, the growth of digital learning spaces is boosting basic research on learning processes based on the huge volume of digital data available.
In order to address the challenges of massive data analysis in the study of digital learning experiences, new disciplines combining computer science, mathematics and applied statistics have been introduced (Gitelman, 2013; Kitchin, 2014), such as learning analytics (Siemens et al., 2011; Buckingham & Ferguson, 2012; Greller & Drachsler, 2012). Educational research is also increasingly using automatic processes that rely on available information to intervene directly in the learning cycle: predictive learning analytics, student modelling, recommendation systems and educational process trace analysis (Breslow et al., 2013; Thille et al., 2014) are all methods that use Artificial Intelligence (AI) algorithms to adapt course design to student needs. In addition, digital data are also used to design and train Machine Learning (ML) based applications to guide students, and to monitor and evaluate learning (Hew, Qiao, & Tang, 2018; Hussain, Zhu, Zhang, Abidi, & Ali, 2019).
Along with the emergence of new methods and disciplines, there is a debate about the change involved in accessing information on student behaviour directly, without previous filters or, at least, without the type of conceptual and methodological filters used previously (statistical inference, sampling, theoretical framing, etc.). In the same way, radical changes are being discussed in the epistemic conditions that support the ethical regulation of research and intervention in students' daily lives (Crawford, 2016; Farrow, 2016; Metcalf, Keller, & boyd, 2016; Amo et al., 2019).
In that context, this paper attempts to frame the main current debates on the use of AI in education by providing a pedagogical view from the educational sciences (Goksel & Bozkurt, 2019; Luckin & Cukurova, 2019; Sharma, Kawachi, & Bozkurt, 2019; Sloane & Moss, 2019; UNESCO, 2019; Zawacki-Richter, Marín, Bond, & Gouverneur, 2019). AI is the combination of a certain type of technology -an algorithm- and a large set of data; it also includes non-human data, product design and the software used (Sinders, 2019a). AI-based systems and products can affect learning in many ways and, above all, are currently changing the face of educational research and of technological interventions aimed at improving the learning cycle. Thus, applying AI in learning contexts involves addressing many of the conceptual and epistemic concerns of data-based educational research (Domínguez, Álvarez, & Gil-Jaurena, 2016).
The paper discusses the pedagogical principles associated with data-driven educational algorithms in order to provide useful rules to guide their design and application in educational spaces. Following the analysis by Houlden & Veletsianos (2019), a critical, example-based approach together with a literature review is applied here to conduct the analysis. Firstly, the importance of the human component in the design of AI and ML systems is described. Secondly, the paper analyses the need to introduce a pedagogical dimension that frames the specifically educational aspects arising from data privacy, algorithmic biases and enhanced surveillance systems. Finally, based on the identified pedagogical elements, a heuristic approach is used to propose a set of rules to guide the design and evaluation of data-driven AI applications in education. It is intended to serve as a theoretical precedent for the empirical validation of a set of criteria for the implementation of AI-based learning systems in education.

THE HUMAN FACTOR IN EDUCATIONAL ALGORITHMS
The data determine much of what educational algorithms do. The data that feed the educational algorithms are a variety of inputs that people make, such as what they choose to like online, what they comment on, how often they check something, and when they use something. They are constantly feeding into the algorithm within the myriad of existing AI-based products, such as recommendation systems, text editors, conversation robots, or activity supervisors. In this way, the data are activated: they have a particular purpose and can become as important as the code of the algorithm (Sinders, 2019b).
However, the data are not the main element that determines how the algorithms behave. System design and, especially, human decisions about how to combine data sets are fundamental to understanding how an algorithm uses data.

Core decisions in predictive analytics
This is the case, for example, with learning recommendation systems, which are one of the outstanding features of e-learning products and also support institutional strategies for student recruitment and retention (Bodily & Verbert, 2017; Prabhakar, Spanakis, & Zaiane, 2017; Romero & Ventura, 2017). In general, recommendation systems are algorithms that aim to suggest relevant items to users, such as movies to watch, products to buy, texts to read, learning activities to do, or courses to enrol in. In education, recommendation systems are the main product of predictive analytics, which many colleges and universities use to achieve their student recruitment objectives, focusing on enrolment strategies and adjusting scholarship policies. Demographic and performance data can help educational institutions predict whether students will enrol in a course, whether once enrolled they will stay on track during their learning cycle, and whether they will require support so as not to fall behind before completion. Predictive analytics are also used to better tailor counselling services and to personalize learning with the goal of improving student performance (Domínguez, 2018).
To explain how these systems work, as well as the human component in algorithm modelling, the case of Spotify's recommendation app called Discover Weekly is described (see Figure 1). Discover Weekly is a playlist of songs created from a combination of user data and algorithmic inference. In order to display a suitable playlist to a target person, the system initially relies on other people's playlists. Spotify commences by looking at all the playlists created by users, which contain a reflection of their interests and sensitivities. These human-made song selections and groupings are at the heart of Discover Weekly's recommendations. From there, the algorithm gives extra weight to the company's own playlists and the lists that have the most followers. It then attempts to fill in the gaps between the target person's listening habits and those with similar interests. Consequently, if Spotify detects that two of the target user's favourite tracks tend to appear in other playlists along with a third track that the target has not listened to before, it will suggest the new track. In addition, Spotify also creates a profile of each user with their particular music interests, grouped into singer sets and music genres. Finally, the algorithms are responsible for connecting the data from the millions of playlists and the personal interest profile (Pasick, 2015;Sinders, 2019c).
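The co-occurrence rule described above (two favourite tracks that tend to appear in other playlists alongside an unheard third track) can be illustrated with a minimal sketch. This is not Spotify's implementation: the function name `recommend`, the scoring rule and the toy playlists are assumptions made only for illustration, and the extra weighting of company or popular playlists is left out.

```python
from collections import Counter

def recommend(playlists, favourites, top_n=3):
    """Suggest unheard tracks that co-occur with the user's favourites.

    Hypothetical scoring rule: each playlist votes for its unheard tracks
    with a weight equal to the number of favourites it contains.
    """
    known = set(favourites)
    scores = Counter()
    for playlist in playlists:
        tracks = set(playlist)
        overlap = len(tracks & known)
        if overlap == 0:
            continue  # playlist shares nothing with the target user
        for track in tracks - known:
            scores[track] += overlap
    return [track for track, _ in scores.most_common(top_n)]

# Toy data: human-made playlists (assumed, not real listening data).
playlists = [
    ["a", "b", "c"],
    ["a", "b", "c", "d"],
    ["a", "e"],
    ["f", "g"],
]
suggestions = recommend(playlists, favourites=["a", "b"])
print(suggestions)
```

Even in this small sketch, the outcome depends on human choices: which playlists are included, how overlap is weighted, and how many suggestions are returned.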
The approaches behind this process of configuring Spotify's algorithms include collaborative filtering and natural language processing, which are automatic selection systems, along with deep learning, which is a technique for recognising patterns in huge amounts of data using powerful computers that are trained by humans to improve their selections (Johnson & Newett, 2015). In educational contexts, automatic recommendation systems have the same requirements as Discover Weekly. To make the results fit the interests of the students, they require prior access to the trace data generated in the interaction with the educational software, mainly with Learning Management Systems (LMS). The decisions about which data to obtain or how to combine them are not made by the algorithm, but by the people in charge of modelling the information and designing the automatic processes that will later be executed by the algorithm. When this happens in learning contexts, many questions arise that have a clear pedagogical component.
On the one hand, students may wonder how a certain sequence of recommendations came to exist: which concrete data trained the algorithm; whether the algorithm infers only from the learning habits of a single student, or whether it takes into account the most popular patterns among the actions performed by all students in the LMS; whether it favours one gender over another, or takes into account the time when the actions happen; and whether the actions of friends -i.e. people the student has contact with within the LMS, or possibly outside it on social networks- have an effect on the suggestions made.
On the other hand, from the perspective of the teacher who uses AI-based software in the classroom (Smith, 2019), it is necessary to know the rudiments behind the technologies employed. To improve teaching, it is equally necessary to be able to adapt the system to the specific learning practices that arise spontaneously. The aim is to prevent the biases and issues associated with current AI-based learning systems which, as mentioned above, decisively require human intervention -and, in this case, also the application of a pedagogical vision- to operate properly in a given learning context.

Machine learning pipelines in educational contexts
In addition to AI systems for recommendations, there are educational applications of ML -a subset of AI- especially oriented to the grading process (Alsuwaiket, Blasi, & Al-Msie'deen, 2019), predictive analytics (Uskov, Bakken, Byerly, & Shah, 2019), and the identification of learning paths adapted to each student (Kurilovas, 2019). As with data-based AI applications, the design of educational systems based on ML also has an outstanding human component that requires a pedagogical approach.
An ML pipeline consists of the steps needed to train a data model. It helps to automate the workflows leading to the design of an ML algorithm. It is a cyclical and iterative process, as each step is repeated to continuously improve the accuracy of the model and to obtain an efficient algorithm. Many current ML models are trained neural networks, capable of executing a specific task or of inferring what is likely to happen from what has already happened. They are complex models that are never completed. Rather, mathematical or computational procedures are applied repeatedly to the previous result, improving the model each time to obtain closer approximations to the solution of the problem. Thus, a huge amount of data, processed iteratively, is required to train ML models (see Figure 2).
One last element to consider in order to obtain good results when processing large volumes of data is the value of the metadata. Metadata reside with the captured data and provide descriptive information about the digital objects -which aggregate data- and about the autonomous data. Metadata extraction and the correlations between metadata are the basis of ML models. This is due to the need to work with tags in order to associate data that, considered independently, would be difficult to relate to each other (Zhou, 2018). Moving that process into the field of education, the main consideration relates to the types of essentially educational tasks required to work with ML systems. Designing the model, training the model, and testing and tagging the data are all human tasks. Currently, models cannot be trained without the participation of people, and it is those people who make decisions about what happens to the ML systems, where they are going to be used and for what purposes.
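The iterative character of model training can be sketched with a deliberately tiny example. A one-parameter linear model fitted by gradient descent stands in for the neural networks mentioned in the text; the function `train`, the learning rate, the number of epochs and the data are all assumptions chosen for illustration.

```python
def train(xs, ys, epochs=50, lr=0.01):
    """Fit y ≈ w * x by repeating the same update step on the previous result."""
    w = 0.0          # initial guess for the single model parameter
    history = []     # mean squared error after each iteration
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # apply the procedure to the previous result
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append(loss)
    return w, history

# Toy training data in which the true relation is y = 2x (assumed).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, history = train(xs, ys)
print(round(w, 3), history[0] > history[-1])
```

Each pass applies the same procedure to the previous result, so the error shrinks step by step without the model ever being "finished"; deciding when to stop is again a human decision.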
The main pedagogical concerns here are related to the evaluation of the whole system, so that the training of the model is properly oriented to the requirements in terms of learning improvements, without deviations, once several iteration cycles have passed. Additionally, we must also consider the adequate pedagogical approach of the whole system, in terms of fostering the adequate development of skills and competencies of students (Reich, 2014).

MISALIGNMENTS IN DESIGNING DATA-DRIVEN ALGORITHMS
Over the past few years educational sciences have developed a set of conceptual, policy and institutional resources based on how to work with data from learning practices. But AI educational systems are questioning the strict application of that framework to the case of digital data. When researching in a digital context, many open questions arise on substantive issues: whether research methods and programmes based on digital data should be excluded from current ethical frameworks, or are required to comply with existing standards; whether these current standards should be adapted to the special circumstances of digital systems, or whether completely new standards and institutional commitments are needed.
Working on AI therefore requires expanding the framework for educational research. Data from students' digital practices become -at least in theory- indefinitely connectable and reusable, continuously updateable and easily removable from the context in which they were collected (boyd & Crawford, 2012; Zwitter, 2014). These features of digital systems challenge the limits that apply to analogue practices, which depend on data that are bounded in time and context, and which are highly constrained by technical infrastructure and financial cost.
A set of methodological challenges associated with the educational use of automatic data processing technologies is analysed below. The concerns involved in the socio-educational use of data-based technologies are raised (Tufekci, 2013;Pitcan, 2016;Bulger, 2016;Caplan, Donovan, Hanson, & Matthews, 2018;Perrotta & Selwyn, 2019), and from there a renewed approach is provided to improve learning based on the management of students' digital data.

Data set and platform bias
As mentioned, to suggest recommendations predictive AI systems study people's behaviour and relate it to some pattern that can explain their actions and, especially, predict their behaviour in the future. In the case of e-learning, the data analysed come from highly complex situations, with multiple meanings and whose interpretation depends largely on the context in which they have been collected. The main element that determines the context is the specific digital platform where the learning activity takes place. This is so important that the same behaviour could have different meanings depending on the platform on which it occurred.
For example, in the case of research on social behaviour on the Internet, the most analysed platform has been Twitter. However, Twitter is far from being a platform that represents the set of digital applications that allow social interaction. Each platform incorporates certain specific functionalities that may not be representative of other social platforms or of human social behaviour in general.
As with Twitter in research on social networks, in education the platform most researched with learning analytics methodologies has been Open edX. This is mainly because it is a free, open source tool that was originally developed for the courses of the edX project, which is the main MOOC site on the Internet.
The multiple studies and experiments on student activity in Open edX have led academics to suggest a general framework for student behaviour in online courses. The framework addresses such important issues as communication in the forums, course completion rates and teacher assignments. However, the Open edX platform does not have some of the features that are common and widely used in other tools, such as Moodle, Canvas or Blackboard platforms, which are leaders in the LMS market. For example, Open edX differs from Moodle in aspects such as the integration of visual elements into text, the monitoring of forum discussions or the management of assessment tests. Open edX's simple interface is well suited for use on mobile devices, making it the preferred platform for studying in mobile situations or from low-bandwidth environments. The mechanism for consulting video classes also causes a particular behaviour, since it is based on a series of viewing rules that are not necessarily equivalent or correspond to the way audio-visual content is consumed on other digital platforms.
To compensate for the shortcomings of single-platform research models, the data sets involved should be extended to cover the emerging ecology of contexts related to the phenomenon under analysis (Ruipérez-Valiente, Halawa, Slama, & Reich, 2019). This does not mean that nothing valuable can be learned from a single-platform analysis. Rather, it means accepting that such analyses examine a closed system and that, ultimately, this limitation of research based on specific data sets may not be overcome by learning analytics methodologies alone.

Searching for tags and keywords in single case studies
Many educational studies with big data -later taken as a reference for modelling AI software- extract relevant text from a platform using tags or keywords. For example, in a course's virtual forum, messages are analysed for words such as exam, query, or thanks. While studies based on tags and keywords can be a powerful method to examine the flow and subject matter of conversations in a course, they are analyses built by selecting on the dependent variable, which is the one that corresponds to the case under study, with all the weaknesses that such a methodological route entails.
In a social investigation, a sample comprising one or several cases has limited analytical power and could offer misleading results, since the variation in the dependent variable is limited (Geddes, 1990). For example, if research is conducted on the essential conditions for students to better understand a topic within a course by looking only at cases of successful courses that have occurred, the explanatory power will necessarily be limited. To improve explanatory power, it would be necessary to also include cases that might have similar characteristics, but where failures have occurred and students have not adequately understood the topics. In the same vein, in keyword-based datasets, a message is included in the dataset precisely because it has a particular outcome already associated with it. In addition, most keywords used to create large datasets are examples of successful terms, which are well known, widely distributed and generate great interest. This calls into question the capacity of this type of study and points to the need to open up the design of research by incorporating a wider variety of techniques and instruments for analysis.
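The effect of selecting on the dependent variable can be shown with a small, entirely synthetic sketch. The course records, the feature names `forum_active` and `weekly_quizzes`, and the helper `share` are hypothetical and serve only to illustrate the argument.

```python
# Synthetic course records (assumed for illustration only).
courses = [
    {"forum_active": True, "weekly_quizzes": True, "understood": True},
    {"forum_active": True, "weekly_quizzes": True, "understood": True},
    {"forum_active": True, "weekly_quizzes": False, "understood": False},
    {"forum_active": True, "weekly_quizzes": False, "understood": False},
]

def share(cases, feature):
    """Proportion of cases in which a feature is present."""
    return sum(case[feature] for case in cases) / len(cases)

successes = [c for c in courses if c["understood"]]
failures = [c for c in courses if not c["understood"]]

for feature in ("forum_active", "weekly_quizzes"):
    print(feature, share(successes, feature), share(failures, feature))
```

Looking only at the successful courses, both features appear in every case and seem equally "essential". Only when the failed courses are included does it become visible that an active forum does not distinguish success from failure in this toy data, whereas weekly quizzes do.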

Correlation does not imply causation, even for algorithms
Related to the above, there is a close relationship between selecting on features of the dependent variable and the way specific factors are attributed to the features observed in the sample. That is, a self-selected population will not only have general characteristics different from those of the general population, but may also exhibit significantly different correlation trends. This creates at least two types of problems.
On the one hand, there is confusion in the variables analysed. Following the example of the tags associated with a message in a forum, these are often related to assumptions, meanings and the cultural or political structure of the context where the conversation takes place. Therefore, the use of tags, in addition to being a method of self-selection, often involves participation and commitment to the framework that the tag integrates. The biases inherent in this situation prevent the conclusions from being generalized to other contexts, which limits the research.
On the other hand, the main mistake that research designs which confuse the dependent and independent variables can make is to assume that the correlation between factors or traits observed simultaneously in the variables implies some kind of causality between them. This is a common fallacy in the field of statistics, which consists of inferring that there is a causal relationship between two or more events because a statistical correlation between them has been observed. Big data studies have helped to spread it, in part for the reasons given above (Muller, 2018).
Big data studies often emphasize the variations and shifts that occur in large volumes of data and assign simple explanations to the complex phenomena behind those variations (Michael & Miller, 2013; Poel, Meyer, & Schroeder, 2018; Brady, 2019). One example is studies at the level of the education system, such as those that analyse the segregation of students in neighbourhoods according to socioeconomic level (Ball, Bowe, & Gewirtz, 1995; Orfield & Lee, 2005), or those that compare academic performance with geographic variables such as the students' country or region of residence (Coleman, 1966; Sirin, 2005). In the history of education there has been much research that has sought correlations between simple variables in order to respond to complex problems, and these have often been questioned over the years. Currently, access to large data sources has opened the door to new and increasingly creative interpretations that are closer to the theoretical approach supporting the studies than to the observed evidence (Hansen & Reich, 2015; Monarrez, 2018). Limited funding and time constraints also lead researchers to find causality between factors where there is only an apparent correlation, one that does not always explain the variance in the variables analysed in the studies.
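The claim that a self-selected population may exhibit different correlation trends from the general population can be checked on synthetic data. In the sketch below, two hypothetical traits are completely uncorrelated in the full population, yet become negatively correlated once only the cases above a selection threshold are kept; the data, the threshold and the helper `pearson` are assumptions for illustration.

```python
from math import sqrt

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

# Full population: every combination of two traits scored 0..4,
# so the traits are independent by construction.
population = [(x, y) for x in range(5) for y in range(5)]

# Self-selected sample: only cases whose combined score is high,
# e.g. students who chose to post under a given tag (hypothetical).
self_selected = [(x, y) for x, y in population if x + y >= 5]

print(pearson(population), pearson(self_selected))
```

The correlation found in the selected sample is produced by the selection rule itself, so reading it as a causal link between the two traits would be an instance of the fallacy discussed here.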

Sample limitations
When a study is based on big data, there is a risk of not sufficiently understanding the value of the underlying sample. In social research, the sample corresponds to the selection of people chosen to represent the population to which the conclusions are to be applied. Since the whole population is often not available, a sample must be chosen that represents it and is manageable. The study is applied to the sample with the expectation that the conclusions obtained can also be replicated in the whole population.
In the case of big data, the research is usually very extensive and the populations to which the studies are projected are often very large. For example, they may concern all Internet users (González-Bailón, Wang, Rivero, Borge-Holthoefer, & Moreno, 2012; Ruths & Pfeffer, 2014; Pfeffer, Mayer, & Morstatter, 2018), or in the case of education there may be studies whose findings are intended to apply to all students participating in digital courses, all university students or all schools located in a particular type of neighbourhood (Warnakulasooriya & Black, 2018). As massive databases contain a very large amount of information, the researcher tends to think that these data are sufficient to represent the population. However, this is not always the case, and moving forward without an adequate sample selection means assuming a certain risk. Thus, problems may arise in guaranteeing representativeness and equity when attempting to generalize results to populations that, because they are so broad, are characterized by great heterogeneity.
The lack of representativeness of the sample in the case of massive information sources can be tackled by using selection methods appropriate to the size of the population. This includes using big data in the earlier phases of the study as well, so that the large volumes of data available can be segmented. In addition, social research is called on to imitate the experimental sciences and incorporate scales close to 1:1 both in the process of information analysis and in the inference of results, thus expanding its commitment to the social reality that it intends to study.
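One possible way to segment a large data set before sampling is proportional stratified sampling. The text does not name this technique, so it is introduced here as an assumption; the helper names, the `track` attribute and the synthetic population are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def shares(records, key):
    """Proportion of records that fall into each stratum."""
    counts = Counter(record[key] for record in records)
    return {name: count / len(records) for name, count in counts.items()}

# Synthetic population: 80 university and 20 vocational students (assumed).
population = (
    [{"id": i, "track": "university"} for i in range(80)]
    + [{"id": 80 + i, "track": "vocational"} for i in range(20)]
)
sample = stratified_sample(population, key="track", fraction=0.1)
print(shares(population, "track"), shares(sample, "track"))
```

Comparing the shares of each stratum in the sample and in the population is a simple check of representativeness on the chosen attribute; it says nothing about attributes that were not used to define the strata.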

The network structure does not reveal everything
Most big data research uses social network analysis methods. In education, it is common for LMSs to display network structures created from relationships between students or from their interactions with learning resources in online courses. Social network analysis tries to trace the evolution of the information flows provided by the people who are interacting in a certain context. For this purpose, it uses graphic representations that show the connections between the nodes that make up the network -which can be people, messages sent to a forum, interactions with a resource, etc.-, filtered by the attributes of those nodes -for example, the subject of a message, the type of interaction, etc.- and according to the weight of the links between those nodes -more or less weight depending on the role of the person sending the message, whether the resource is autonomous or part of a learning sequence, etc. In many cases, researchers using social network analysis take into consideration the structural properties of the whole network to infer from them other properties of the links between the different nodes. For example, one of the most common practices is to connect the links between the alters to the properties of the network structure (an individual's network consists of an ego, representing that individual, and its alters, which are the others to whom that ego is connected). This is true only under certain strict conditions where bridging relationships between groups of networks would be more likely to be weak links (Onnela et al., 2007). These are technical issues, but they can lead to inaccuracies, as the information contained in the network structure alone is limited.
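The notions of ego, alters and links between alters can be made concrete with a minimal sketch. The student names, the edge list and the helper functions are hypothetical, and an undirected, unweighted network is assumed for simplicity.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets built from a list of (a, b) interactions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def ego_network(graph, ego):
    """Return the ego's alters and the ties that exist among those alters."""
    alters = set(graph.get(ego, set()))
    alter_ties = {
        tuple(sorted((a, b)))
        for a in alters
        for b in graph[a]
        if b in alters
    }
    return alters, alter_ties

# Hypothetical forum interactions between students in an LMS.
edges = [
    ("ana", "ben"), ("ana", "carla"), ("ana", "dani"),
    ("ben", "carla"), ("dani", "eva"),
]
graph = build_graph(edges)
alters, alter_ties = ego_network(graph, "ana")
print(sorted(alters), sorted(alter_ties))
```

The structure shows who is connected to whom, but not why: the same pattern of ties could come from collaboration, from a dispute in the forum, or from a compulsory group task, which is the kind of information the network structure alone does not reveal.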

A PEDAGOGICAL FRAMEWORK FOR EDUCATIONAL ALGORITHMS
A set of heuristic top-to-bottom dimensions aimed at filling the gap detected in the design of algorithms in the educational context is proposed below (see Table 1). In social sciences, heuristic-based analytical frameworks are associated with dynamic and open assessment methodologies. Their main utility lies in the formulation of simple evidence-based rules that provide a wide margin for the analysis of cases that depend on a large number of variables, helping to limit the high degree of complexity in those cases. Based on these rules, key performative indicators can be proposed that function under the logic of criteria satisfaction. The criteria are considered satisfied if a minimum percentage of achievement associated with the indicator is covered, which makes the analysis process more open and flexible than control methods based on dichotomous criteria such as A/B type (Gigerenzer & Selten, 2002;Sundar & Singh, 2013;Mousavi & Gigerenzer, 2014).
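The logic of criteria satisfaction can be sketched as follows. The 75% threshold, the function `indicator_satisfied` and the example answers are assumptions made for illustration; the text does not fix a minimum percentage, and the dimension names are taken from Table 1.

```python
def indicator_satisfied(answers, threshold=0.75):
    """Return (satisfied, achieved share) for one dimension.

    `answers` maps each guiding question to True/False; the dimension
    counts as satisfied when the share of positive answers reaches the
    (hypothetical) minimum threshold.
    """
    achieved = sum(answers.values()) / len(answers)
    return achieved >= threshold, achieved

# Hypothetical review of an AI-based learning system.
review = {
    "Accountability": {
        "audiences informed": True,
        "internal transparency rules": True,
        "users warned about responsibility": True,
        "public performance criteria": False,
    },
    "Biases": {
        "design focused on trust": True,
        "decision review mechanism": False,
        "bias monitoring": False,
        "modification in case of bias": False,
    },
}
results = {name: indicator_satisfied(qs) for name, qs in review.items()}
print(results)
```

Unlike a dichotomous pass/fail check on every single question, this approach lets a dimension count as satisfied when enough of its guiding questions are answered positively, which leaves room for case-by-case judgement.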
The proposed scheme is based on the principles already presented and also benchmarked the existing frameworks on the appropriate use of AI systems in other non-educational settings (Saurwein, Just, & Latzer, 2015;Caplan, Donovan, Hanson & Matthews, 2018;Bunker & Thabtah, 2019;;Floridi & Cowls, 2019;Jobin, Ienca, & Vayena, 2019;). The aim is to introduce a pedagogical layer in the general rules that guide the design of data-based algorithms (Reif, n.d.), for which dimensions and questions are posed to guide the action, here following the model of Diakopoulos et al. (2017) and US-ACM (2017). Guiding questions Accountability: In education, Algorithms are used to make decisions and allocate learning resources based on large datasets. And algorithmic accountability is the process of assigning responsibility for harm when algorithmic decision-making results in discriminatory and inequitable outcomes.
• Are interested audiences informed about the algorithmic decision-making? • Is there a system of internal rules on transparent behaviour? • Are users warned about taking responsibility when interacting with the system? • Are there public measurement criteria for the system's performance? Biases: When algorithms produce unfair results, we refer to them as biased. Algorithmic biases can occur in many ways: by the social context in which an algorithm is created, as a result of technical constraints, or by the way the algorithm is used in practice.
• Is the system design focused on trust?
• Is there a decision review mechanism?
• Is there a system for social/automatic monitoring of bias? • Is there a system for modification in case of bias?
Data provenance: The data within the algorithms are symbiotic with the algorithm itself. So in product design, the data entered into the algorithms determine the characteristics of a product. When data sets are opaque, there is no way to accurately evaluate the results of digital products.
• Is the data properly tagged?
• Is the algorithm trained to discriminate cultural variants in the data? • What data is used to feed the suggestions? • Does the data of others affect the suggestions in particular cases?
Explainability: Ensure that both the algorithmic decisions and the data that drive them can be explained to end users and learning management stakeholders in nontechnical terms.
• Who are the end-users and who are the stakeholders? • What part of the system can be explained to users and stakeholders? • How much of the data sources can be disclosed? • How many of the decisions assumed by the algorithm can be explained?
Fairness: Ensure that algorithmic decisions do not generate discrimination or unfair impacts when different social profile variables are considered.
• Is there control of users who may be favoured over the disadvantaged? • Is there control of potentially harmful effects generated by the mistakes of other users? • Is there control over the context in which the system operates? • Are cultural rules taken into account?

Dimensions
Guiding questions
Harmful content: The design of an AI-based product has to consider the type of content that users can add to a repository. Failing to check and verify whether that content is harmful is detrimental. Policies are required that define the possible damages caused to third parties, together with the corresponding containment measures and actions.
• Is there control of false identities?
• Is there verification of suspicious content?
• Is there a social/automatic damage control system?
• Is there a protocol against possible damages?
Pedagogical approach: The design and use of educational data-driven algorithms require, above all, a pedagogical approach. This means addressing, at least, essential issues such as the learning theories behind the AI model, attention to the context of the data, and the usefulness of the output to improve learning.
• Have the features of the people involved in the proposal that are of pedagogical interest been properly framed?
• What is the educational theory behind the algorithmic decision-making scheme?
• Have evidence-based alternatives in the field of learning been considered?
• Have the attributes that are of pedagogical interest been adequately contextualized in the data used?
Privacy: The data used in educational algorithms come from individuals. They are intimate data, because conversations and social interactions are various forms of intimacy. So the lack of privacy gradients in the design of the algorithms can facilitate harassment and violations of student privacy.
• Have privacy gradients been defined?
• Have intimate, personal, social and public spaces been delimited?
• Are there mechanisms for user consent?
• Is the user allowed to move between the variations of public and private?
The dimensions and guiding questions of the framework are intended to provide educational professionals with operational shortcuts for incorporating a pedagogical approach, as well as student sovereignty, into the practice of algorithm design. The framework also aims to orient algorithms towards the achievement of student competencies and skills, on the basis that decisions about recommendations and nudges should be guided by pedagogical evidence. All this seeks to foster safer and more inclusive learning spaces and interactions with AI.
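As an illustration only, the dimensions and guiding questions could be recorded as a reviewable checklist, so that a design team can see which questions remain open for a given system. The sketch below is a hypothetical encoding, not part of the proposed framework: the names `Dimension`, `open_items` and `review_report` are assumptions introduced here, and treating an unanswered question or a negative answer as an open item is a simplification, since several guiding questions call for a descriptive answer rather than yes or no.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One framework dimension with its guiding questions (illustrative)."""
    name: str
    questions: list[str]
    # Reviewer's answers: True means "yes", False means "no".
    answers: dict[str, bool] = field(default_factory=dict)

    def answer(self, question: str, value: bool) -> None:
        """Record an answer; reject questions not listed for this dimension."""
        if question not in self.questions:
            raise ValueError(f"Unknown question for {self.name}: {question}")
        self.answers[question] = value

    def open_items(self) -> list[str]:
        """Questions that are unanswered or answered 'no' (assumed rule)."""
        return [q for q in self.questions if not self.answers.get(q, False)]


def review_report(dimensions: list[Dimension]) -> dict[str, list[str]]:
    """Map each dimension name to the questions still open for it."""
    return {d.name: d.open_items() for d in dimensions}


# Two dimensions, with question wording copied from the framework table.
privacy = Dimension("Privacy", [
    "Have privacy gradients been defined?",
    "Are there mechanisms for user consent?",
])
biases = Dimension("Biases", [
    "Is there a decision review mechanism?",
    "Is there a system for modification in case of bias?",
])

privacy.answer("Have privacy gradients been defined?", True)
privacy.answer("Are there mechanisms for user consent?", True)
biases.answer("Is there a decision review mechanism?", False)

report = review_report([privacy, biases])
for name, items in report.items():
    print(name, "open questions:", len(items))
```

A checklist of this kind only records whether each question has been addressed; judging the quality of the answers remains a pedagogical task for the professionals involved.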

DISCUSSION AND FUTURE WORK
The arguments provided in this paper are intended to complement existing evidence in the scientific literature on providing educators with resources to face the introduction of AI in learning spaces. Grouping key methodologies and guidelines into heuristic rules is considered an appropriate approach, since it allows resources to be managed in particularly complex situations.
As this is a non-technical theoretical proposal, the ability to implement the presented framework in practice will depend on further empirical validation in future studies. Thus, the discussion on the construction of the heuristic scheme points to a set of research references on the design of theoretical frameworks and the subsequent empirical validation of rules and constructs.
Another issue with heuristics concerns the so-called consistency of the context. Heuristics are most valuable when the assumptions on which they are based hold consistently across the contexts where they are applied. Therefore, the proposed scheme should also be validated in each of the phases and settings where it is intended to be applied: the design of an algorithm, its implementation in practice, technical development contexts, and educational instructional design, among others.
Simple rule frameworks provide shortcuts that assist both the algorithm design process and the use of digital tools in teaching. However, they cannot be applied directly. It is first necessary to analyse the effective practices of the subjects in digital spaces, trying to understand their behaviour as a whole. It is assumed that large data sets, either inherently or as a result of their size, do not hold direct answers to the most interesting questions. That is why heuristic rule-based approaches advocate simplifying decision-making in complex learning situations, while optimizing the effect by placing the greatest emphasis on analysing the set of actions that produce a given learning outcome.
The next steps in the field of data-driven educational algorithms are threefold. The first is to examine more deeply, from a pedagogical perspective, the implementation of derived technologies in real educational practice, so that the implications of AI for decision making and for the enrichment of learning processes are fully understood. The second is to advance the analysis of the challenges that AI poses for educational research. The third is to remain open to the validation, both theoretical and empirical, of schemes such as the one proposed here, which serve as a guide for professionals and academics managing data-driven digital technologies in learning processes.