Analysis of the scientific literature on Massive Open Online Courses (MOOCs)

Massive Open Online Courses (MOOCs) have been gaining attention in academia because of the disruptive innovation they bring, through technology, to the educational system. Because the theme is still emerging, the area needs recent bibliometric studies that map previous work and indicate directions for further research. This study therefore aims to map the MOOC research published until December 2014 and indexed in two scientific databases: Scopus and Web of Science. The following aspects of the scientific literature on MOOCs were explored from the collected data: (i) the number of publications per year; (ii) a mapping of the institutions; (iii) the authors with the most publications; (iv) the classification of the identified works into macro-themes; and (v) the references most used by the authors. The analysis covered 294 papers written by 694 authors affiliated with 266 institutions. In the articles analysed, the authors used 5,060 different references and 634 different keywords. In addition to mapping the research, this work aims to spread the idea that the MOOC theme is emerging and promising and that it requires further research.

Since 2013, the MOOC theme has been widely discussed in academia (Aires, 2016) through publications with different theoretical and practical perspectives. Because the theme is still emerging, the area lacks recent bibliometric studies (Ng'Ambi & Bozalek, 2015; Sangrà & Wheeler, 2013) that show which surveys have been conducted and which direction new ones should take. Some work has been done to classify publications related to MOOCs, such as Liyanagunawardena, Adams, and Williams (2013), Kennedy (2014) and Yousef et al. (2014). Abad, Conde, and Peñalvo (2014), in turn, conducted a survey on the terms e-learning and MOOCs. However, these studies did not include works published in 2014. For this reason, this study aims to map the scientific literature on MOOCs until December 2014 in order to identify: a) the main authors and the institutions to which they are affiliated; b) the theoretical basis of such studies; c) the classification of the research into macro-themes; and d) the references most used by the authors. Section 2 details the methodological procedures used in this research, Section 3 describes the results and Section 4 presents the final considerations, followed by the references.

METHODOLOGICAL PROCEDURES
The main characteristic of bibliometric research is the review of scientific literature in order to identify indicators that can portray the development of a particular area (Bufrem & Prates, 2005; Horst, 2013). This work was carried out in three phases: 1) searching, selecting and listing the works; 2) standardizing and classifying them; and 3) analyzing the data and writing the final document. These phases, when expanded, comprise seven stages, which are shown in Figure 1 and described below.

Stage 1: Defining research keywords
At this stage, the purpose was to identify scientific papers published on the topic of Massive Open Online Courses and indexed in international scientific databases. The exact term "Massive Open Online Course" was chosen for the database searches, and the main macro-themes addressed in the studies were then identified by reading the retrieved works.

Stage 2: Searching scientific databases
The searches were conducted in two international databases, Scopus and Web of Science (WoS), without limiting the year or language of publication. These databases were chosen because they are multidisciplinary, internationally recognized by the scientific community, widely used for bibliometric studies (Brambilla & Stumpuf, 2012; Regolini & Jannès-Ober, 2013) and "an international benchmark for measuring scientific production of countries" (Packer, 2011, p. 29). In addition, they hold records on the researched topic and allow data to be exported to the EndNote bibliography management software in a standardized format.

Stage 3: Exporting results to the reference managing software
The information retrieved from the selected databases, such as title, author, place of publication and keywords, was exported to the reference management software, forming a single set of articles.

Stage 4: Adopting criteria for work selection
In the reference management software, the following selection criteria were applied, removing works that: a) had no authorship; b) were duplicates (articles indexed in more than one database); c) were not written in English, Spanish or Portuguese; d) required payment to access the full text; e) did not provide access to the full text; f) were published after December 2014; or g) were outside the context of the study.
Publications in English were selected because of the importance of the language as a tool for international knowledge communication. Spanish and Portuguese were also considered because of the authors' geographical origin and nationality. As for the time frame, all works published until 31 December 2014 were selected, which allows complete annual reviews.
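The selection criteria above were applied in the reference management software. Purely as an illustration, they can also be expressed as a filter over exported records. The sketch below is in Python and rests on assumptions: the `Record` fields, the `title_key` duplicate rule (case and whitespace insensitive title match) and the merging of criteria d) and e) into a single `full_text_available` flag are hypothetical and are not part of the original procedure.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_LANGUAGES = {"English", "Spanish", "Portuguese"}  # criterion c
CUTOFF = date(2014, 12, 31)                               # criterion f


@dataclass
class Record:
    """Hypothetical shape of one exported bibliographic record."""
    title: str
    authors: list
    language: str
    published: date
    full_text_available: bool = True  # criteria d and e, merged (assumption)
    in_scope: bool = True             # criterion g, judged by a human reader


def title_key(title):
    """Assumed duplicate rule: same title ignoring case and extra spaces."""
    return " ".join(title.lower().split())


def select_records(records):
    """Keep only the records that pass criteria a) to g)."""
    seen = set()
    selected = []
    for rec in records:
        key = title_key(rec.title)
        keep = (
            bool(rec.authors)                       # a) has authorship
            and key not in seen                     # b) not a duplicate
            and rec.language in ALLOWED_LANGUAGES   # c) accepted language
            and rec.full_text_available             # d), e) full text reachable
            and rec.published <= CUTOFF             # f) within the time frame
            and rec.in_scope                        # g) within the study context
        )
        if keep:
            seen.add(key)
            selected.append(rec)
    return selected


sample = [
    Record("MOOC retention", ["A. Author"], "English", date(2013, 5, 1)),
    Record("mooc  Retention", ["A. Author"], "English", date(2013, 5, 1)),
    Record("Late study", ["B. Author"], "English", date(2015, 1, 10)),
    Record("Anonymous note", [], "English", date(2014, 3, 2)),
]
print([rec.title for rec in select_records(sample)])
```

In the sample, the second record is dropped as a duplicate, the third for its publication date and the fourth for lacking authorship. Criterion g) still depends on reading the work, so the `in_scope` flag would have to be set manually.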

Stage 5: Classifying works in macro-themes
In order to classify the works into macro-themes, the title, abstract and keywords of the selected works were read and, in case of doubt, the full text was read. The works by Liyanagunawardena et al. (2013) and Yousef et al. (2014) were used as a basis for the classification.

Stage 6: Data standardization
The records identified in Scopus and WoS follow different criteria for spelling information such as authors' names, which affects results such as the ranking of researchers' and institutions' productivity. For that reason, it was necessary to standardize the data to ensure homogeneity. In addition, information on the authors' affiliations and on the references is not available in the metadata, so it had to be collected directly from the text of each article. Information on keywords was also complemented. To support this process, a new database was created in Microsoft Access. Each item was standardized and the data were complemented manually. References that did not have a date of publication or access were discarded. The standardization process is time-consuming, but it is essential for a bibliometric study.
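In this study the standardization was done manually in Microsoft Access. As an illustration only, the kind of name matching involved can be sketched in Python. The function names `strip_accents` and `author_key` and the matching rule (surname plus first initial, accents and case ignored) are assumptions made for this sketch and are not the procedure used by the authors.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text):
    """Remove diacritics so that 'Sangrà' and 'Sangra' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def author_key(name):
    """Build a matching key of the form 'surname initial' (assumed rule)."""
    name = strip_accents(name).strip()
    if "," in name:
        # 'Surname, Given' form, as commonly exported by databases
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        # 'Given Surname' form; the last token is taken as the surname
        parts = name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname} {given[:1]}".lower().strip()


def group_variants(names):
    """Group spelling variants that share the same matching key."""
    groups = defaultdict(list)
    for name in names:
        groups[author_key(name)].append(name)
    return dict(groups)


print(group_variants(["Sangrà, A.", "Albert Sangrà", "KOP, R.", "Kop, Rita"]))
```

A key this coarse can merge distinct people who share a surname and initial, and the "Given Surname" branch mishandles multi-word surnames such as "van de Ven", so the resulting groups would still need manual review.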

Stage 7: Data analysis and writing the final report
From the final set of selected and standardized works, queries and images were generated to display the data, and the results were described in a final report.

RESULTS
The search in the scientific databases Scopus and WoS was carried out on 13 July 2015. After the filters described in Stage 4 of the methodological procedures were applied, a set of 294 works was selected. The selection process of publications for the final analysis is presented in Table 1. Among the 294 selected papers, 140 are articles indexed in scientific journals and 154 are publications in conference proceedings. As for language, 283 were published in English and 11 in Spanish. Table 2 presents the general bibliometric data of the research.
Source: Research results. Note: in this study, "source" refers to where (journal or conference proceedings) the work was published.

Time Trends
The 294 selected works were analyzed by year of publication, as shown in the corresponding graph. In 2014, of the 180 identified studies, 88 originated in 44 conferences and 92 were published in 58 journals. Among the conferences, the 1st ACM Conference on Learning at Scale stands out with 16 articles and MITE with 10; among the journals, IRRODL stands out with 17 published works and Profesorado with 5.
In 2012 and 2013 researchers showed greater interest in publishing at conferences, while in 2014 there were more publications in journals, which may suggest that research on MOOCs is maturing.

Main sources of publications
Of the 294 selected works, 26 were published in IRRODL, 17 of them in 2014. Table 3 presents the main journals used by the authors to publish their work.

Main authors and institutions
Table 4 presents the main authors of the selected works and the institutions to which they are affiliated. It is worth noting that information about the authors' affiliations was collected from the articles and may not reflect their current positions. From the location data of the institutions to which the authors are affiliated, it was possible to build a map (Figure 3) showing where research on MOOCs has been carried out. Each point on the map represents an institution, and the larger the circle, the greater the number of studies from that place. Note that research on MOOCs in the United States is distributed across the country, which demonstrates the spread of the subject and the interest in it at various institutions.

Main keywords and Macro-themes of publications
In the 294 works selected in this study, the authors used 634 different keywords. The most frequently used term was "MOOC", with 130 occurrences, followed by "Massive Open Online Course", which appears in 70 articles. Even though both terms have the same meaning, in 26 papers the authors chose to use both as a form of identification.
The other most mentioned terms are: e-learning used in 29 works; higher education, used in 21; online learning in 20; Open Educational Resources, in 16; online education, in 15 and connectivism, used in 12 papers. The tag cloud shown in Figure 5 illustrates the terms used as keywords in the analyzed studies. It can be observed that out of the 634 different keywords used to identify the works, 515 (81.2%) are terms that are not repeated, suggesting the existence of different themes being addressed in research on MOOCs.
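The keyword counts discussed above can be reproduced from per-paper keyword lists with a simple frequency table. The sketch below, in Python, is a minimal illustration under assumptions: the function name `keyword_stats` is hypothetical, each keyword is counted once per paper, and keywords are compared case-insensitively, which may differ from the treatment used in the study. The last lines only recompute the share reported in the text from its own figures (515 of 634).

```python
from collections import Counter


def keyword_stats(keyword_lists):
    """Return (counts, distinct, singletons, singleton_share).

    keyword_lists holds one list of keywords per paper.
    """
    counts = Counter()
    for keywords in keyword_lists:
        # A set ensures a keyword is counted once per paper.
        counts.update({k.strip().lower() for k in keywords if k.strip()})
    distinct = len(counts)
    singletons = sum(1 for n in counts.values() if n == 1)
    share = singletons / distinct if distinct else 0.0
    return counts, distinct, singletons, share


# Toy input, not data from the study.
papers = [
    ["MOOC", "e-learning"],
    ["mooc", "Connectivism"],
    ["MOOC "],
]
counts, distinct, singletons, share = keyword_stats(papers)
print(counts.most_common(1), distinct, singletons, round(share, 3))

# Share of non-repeated keywords, from the figures given in the text.
reported_share = round(100 * 515 / 634, 1)
print(reported_share)
```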
Based on the classification adopted by Liyanagunawardena et al. (2013) and Yousef et al. (2014), the selected articles were distributed into macro-themes by reading the title, abstract, keywords and, when in doubt, the complete work (Figure 6). The same work may fall into more than one macro-theme and, because of the high number of articles, only some works are given as examples for each of the macro-themes shown in Table 6.
The first macro-theme includes works that deal with the conceptualization and history of MOOCs, as in Nechifor and Purcaru (2014); open education, as in Pisutova (2012); open educational resources, as in Deimann and Farrow (2013); and Open Courseware, as in Rhoads, Berdan, and Toven-Lindsey (2013). It also includes works that contrast education and technology, as in Comeau and Cheng (2013), and works that address issues such as communities of practice, as in Overmyer (2013), pedagogical innovation, as in Sangrà and Wheeler (2013), and the advantages, disadvantages and timeliness of MOOCs, as in Stuchlíková and Kósa (2013).

Design and technology
Works were identified that address accessibility, as in Sanchez-Gordon and Lujan-Mora (2013); machine learning, as in Singh and Lal (2013); evaluation platforms, as in Kay, Reimann, Diebold, and Kummerfeld (2013); instructional design, course format and material production, as in Grünewald et al. (2013); engagement, incentives and tools for production, as discussed by Anderson, Huttenlocher, Kleinberg, and Leskovec (2014); and flipped classroom, forums, interaction, gamification, metadata and issues related to course quality, as in Speck et al. (2014) and Sadykova (2014). Number of works: 154.

Learning theories
Works that directly address learning theories, connectivism among them, as approached by Clarà and Barberà (2013).

Types of study
The following types of work were identified: course evaluations; bibliometric research; frameworks; quantitative research; reports of experience and use; and theoretical and empirical work. Number of works: 278.

Business models
Works were identified that deal with the institutional challenges of project sustainability, as in Burd et al. (2014), and with the business model, as discussed by Kalman (2014). Number of works: 21.

Target public
Works covering learners' characteristics and courses for seniors, as in Sanchez-Gordon and Lujan-Mora (2013); student retention, as in Adamopoulos (2013); and the use of social media, as approached by Kravvaris, Ntanis, and Kermanidis (2013).

Evaluation
Studies that deal with self-assessment, electronic evaluation, peer review and the evaluation process in general, as in the work by Admiraal, Huisman, and van de Ven (2014). Number of works: 8.

Analysis and research
Works that directly address research related to access, view and add-on fees of the courses, as in Zhuhadar and Butterfield (2014); sentiment analysis and social interactions, as in Harris, Zheng, Kumar, and Kinshuk (2014); and student engagement, as in Hew (2014). Number of works: 23.

Others
Works that address issues such as Big Data, the future of MOOCs, institutional policies and publications in the form of videos, as in the work by Daries et al. (2014).

Main references cited
The authors of the 294 studies analyzed used 5,060 different references. Table 7 shows the main references used by the authors. Of the 10 most frequently cited studies, only two are in the set of analyzed studies: "Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses" and "The Challenges to Connectivist Learning on Open Online Networks: Learning Experiences during a Massive Open Online Course". In the article "Making Sense of MOOCs: Musings in a Maze of Myth, Paradox and Possibility", Daniel gives an overview of MOOCs that ranges from their definition to issues related to platforms, pedagogical aspects, quality and certification. The evolution of the North American platforms Coursera, Udacity and edX, together with reflections on the present and the future of MOOCs, is addressed in the article "The Year of the MOOC" by Pappano, published in The New York Times on November 2, 2012.
With respect to learning theories, Siemens, in his article "Connectivism: A Learning Theory for the Digital Age", discusses the limitations of behaviorist, cognitivist and constructivist theories in order to introduce an alternative theory, connectivism.
As for the analysis of courses, in the article "Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses" the authors investigated three computer science MOOCs with a focus on learners' engagement, in order to increase the completion of MOOC courses.
In the article "Studying Learning in the Worldwide Classroom: Research into edX's First MOOC", the authors present data collected on students' behaviour in the first course offered by edX, "Circuits and Electronics".
In the work "The Ideals and Reality of Participating in a MOOC", the authors analyze the CCK08 course (Connectivism and Connective Knowledge), conducted in 2008, and highlight that autonomy, diversity, openness and connectivity/interaction are characteristic factors of MOOCs, but that they are difficult to achieve in online courses because of the lack of infrastructure and of course monitoring by tutors.
In the work "The Challenges to Connectivist Learning on Open Online Networks: Learning Experiences during a Massive Open Online Course", Kop analyzed self-directed learning, presence (the student's participation in online activities) and critical literacy (the skills needed to use ICTs in MOOCs). Rodriguez, in the article "MOOCs and the AI-like Stanford Courses: Two Distinct and Successful Course Formats for Massive Open Online Courses", compares two different course formats that have been applied successfully: cMOOCs and AI-Stanford.
It can be observed, therefore, that the authors of these studies agree that, in addition to an evaluation, it is necessary to advance in MOOCs research in order to improve the methodology of online courses.
In the report "MOOCs and Open Education: Implications for Higher Education", Yuan and Powell discuss open education and the changes taking place in higher education in order to assist managers in understanding the disruptive innovation occurring through MOOCs in this universe and its political implications.

Discussions
An increasing consolidation of academic research on MOOCs can be seen after 2012, as confirmed by the studies of Yousef et al. (2014) and Liyanagunawardena et al. (2013).
The data also indicate that 257 researchers affiliated with 82 US institutions published 106 studies, of which only 13 were carried out in collaboration with foreign institutions. This behaviour is also observed in other countries that concentrate researchers on the MOOC theme: the 22 Indian institutions, with 45 affiliated researchers, published 18 works, of which only four were in partnership; the 21 Spanish institutions, with 78 affiliated researchers, published 31 works, of which eight were in partnership; and the 16 UK institutions, with 36 affiliated researchers, published 25 studies, of which seven were in partnership. However, Asian and European institutions, even with a small number of researchers, are more open to international research collaboration.
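The collaboration counts reported in this section are easier to compare when expressed as shares of each country's output. The following Python sketch only re-expresses the counts given in the text. The percentages it yields are derived here for illustration and are not reported in the original analysis, and the names `COLLABORATION` and `collaboration_share` are hypothetical.

```python
# (works in international partnership, total works), as stated in the text
COLLABORATION = {
    "United States": (13, 106),
    "India": (4, 18),
    "Spain": (8, 31),
    "United Kingdom": (7, 25),
}


def collaboration_share(partnered, total):
    """Percentage of a country's works done in international partnership."""
    return 100.0 * partnered / total


shares = {
    country: round(collaboration_share(partnered, total), 1)
    for country, (partnered, total) in COLLABORATION.items()
}

for country, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{country}: {share}%")
```

Under these counts the United States has the lowest share of internationally co-authored works among the four countries, which is consistent with the observation made in the text.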
Of the selected works, 20.4% are mentioned in the references of other articles, which represents 1.8% of the references used by the authors. Taken together, these figures suggest that there is still no central group of articles, because the theme is still emerging. However, the works identified in Table 7 may be regarded by the academic community as seminal and used to support further research on MOOCs.
As for the limitations of this work, in 73 (24.8%) of the 294 analyzed articles the authors did not report keywords in the text, which prevented a more accurate analysis. The lack of keywords in the metadata also makes it harder for search engines to locate the works. In addition, 13 studies do not clearly identify the authors' affiliations, which made it impossible to analyze the overall data on these aspects.

CONCLUSION
This study aimed to identify and analyze the scientific production on MOOCs published until December 31, 2014 in English, Spanish and Portuguese and indexed in the international databases Scopus and WoS.
The mapping and analysis of the scientific literature on MOOCs made it possible to see the current state of research in this area. To this end, this work (a) identified the increase in scientific studies published in the period considered; (b) showed the main sources in which the selected works were published; (c) presented the main authors and their institutions; (d) pointed out the most used keywords; (e) classified the articles into macro-themes; and (f) revealed the main theoretical frameworks used in the identified articles. It thus provides a theoretical reference intended to help those interested in expanding the study and development of MOOCs, and to enrich the discussion on the direction of research and trends in MOOCs.
Even though this work is limited to two scientific databases and to articles published until December 2014, it shows that MOOCs have aroused great interest in the academic community for bringing innovation into the education system and enabling new business models. However, research is still needed on the long-term sustainability of projects and on the technological infrastructure required to store, manage and deliver courses in MOOC format. For future work, we suggest expanding the databases and the period covered, as well as using other analytical techniques that allow comparison between different types of scientific databases, both free and paid.
The paradigm of online learning is relatively new in certain parts of the world; nevertheless, it influences the way in which knowledge is disseminated. MOOCs can help make knowledge a public good that is available to a larger number of people.