Introduction

By the end of the 1990s, developments in technology had increased the potential of distance education. The Internet had emerged as a service that was accessible to everyone. At that time, the client–server model was the common architecture used to design collaborative systems (Wilenksky 1996). Systems offered an interface to consult and add new information, which was always hosted on a server. Update and consistency mechanisms were needed to guarantee the quality of shared data and to show the user groups the most recently updated work. During this period, the Spanish National Distance Education University (UNED) did not even provide corporate e-mail and the only computer infrastructure we had available was a server provided for our research projects. Students had to supply their own equipment and pay for a connection. These scarce resources shaped how we designed our technological solutions for collaborative learning. Because web-based (non-stand-alone) applications had become feasible, users could access the system remotely via a web interface without having to install other applications on their computers. The student would only need an HTTP address and login name. Furthermore, this configuration ensured that all students would access the applications under the same conditions because, at that time, web applications did not include the concept of a session or profile. Thus, our distance-education students were provided with access to the system without having to follow a complicated installation process and were offered a cheap and easy way to work collaboratively. However, as UNED was an institution with a strong tradition of individual and isolated study, it was essential to engage volunteer students by demonstrating the clear added value of collaborative learning activities.

To avoid increasing the teachers’ workload, we had to undertake the technological work of setting up the collaborative learning activities and the monitoring process ourselves. Our technological solutions had to facilitate all the phases of a learning experience for all of the main actors: the learning designers, teachers, tutors, and students.

The main aims of the research project described in our 2000 article (Barros and Verdejo 2000) were as follows:

  • To design principled and meaningful collaborative activities to be conducted during an academic period involving distance-learning students who had no previous experience of collaboration.

  • To provide a fully technologically integrated environment that would address all the phases involved:

    • Configure and deploy computer-supported activities that facilitate the teachers’ tasks;

    • Provide the participants with an environment that fosters collaboration;

    • Monitor and analyse behaviour to enhance individual and group processes;

    • Provide a means for defining organizational memory (task models and outcomes) for future reuse by teachers and students.

  • To look for opportunities for intelligent support in all the phases to facilitate learning and teaching tasks.

  • To automatically analyse individual and group activities to develop a computational model that would advance the state of the art.

We worked on this research project for 5 years and described its development in several articles. We described the first web-based prototype (Barros et al. 1998), presented the approach to declarative mechanisms for the specification of learning tasks based on Activity Theory (Barros and Verdejo 1998), provided a detailed analysis of the iterative design process followed to build this system (Verdejo and Barros 1999), addressed the analysis framework used to characterize individual and group behaviour (Barros and Verdejo 1999), and presented the final version of the system in the AIED journal, where we provided an in-depth description of the computational model for automatically analysing the collaboration process (Barros and Verdejo 2000).

In this article, we revisit our research with DEGREE and analyse how it was referred to by other researchers in their research studies. Our research is framed within the context of computer-based interaction analysis and the development of CSCL tools. This article is organized as follows: in the next section, we analyse our approach and core contributions and describe how the CSCL community has perceived our results. Next, we describe how computer-based interaction analysis has recently evolved. We conclude by highlighting a recurrent research problem in the development and design of technology enhanced learning systems.

Approach, Core Contributions, and Practical Impact

We originally developed a research project to explore how to define collaborative learning scenarios and how to analyse group interactions during their performance of a collaborative task and the outcome. We followed a conversation-based approach to collaboration. The result of the analysis was a set of indicators on group activity, the way the individual participated in group activity, and the role of the group in relation to the whole task. The aim was to define tools to automatically gain information on how the collaboration process was proceeding and to give feedback to the learners to guide the collaborative process.

The core contributions of the project addressed four main topics:

  i. A methodology based on Activity Theory to define collaborative learning scenarios. Our approach provided declarative mechanisms for the specification of a task structure, an associated semi-structured dialogue model, and a schema of the outcome. This methodology was originally implemented in DEGREE and then improved and re-implemented in subsequent versions of the system (Barros et al. 2008; Verdejo et al. 2003). In these versions, we defined an intermediate language to create a generic template with three elements (dialogue, task, and outcome) that could be used to build a scenario-instance based on a taxonomy of values and a fully integrated configuration program. This instance was interpreted by a framework to launch an interactive online learning environment and provide collaboration support to implement this scenario with student groups. This framework provided a common platform, provided the scenario-instance with semantics, managed the users (register, log profile, preferences), and stored the processes and the results as learning objects (Verdejo et al. 2002). This approach evolved as a research field within CSCL, scripts (Dillenbourg 2002), and collaborative learning flow patterns (CLFPs) (Hernandez-Leo et al. 2005), and lay within the general scope of educational modelling languages, such as IMS (www.imsglobal.org) and PALO (Rodríguez-Artacho and Verdejo 2004). The latter language was developed by our research group.

  ii. DEGREE as a collaborative learning tool that uses semi-structured communication (with sentence openers, tagged dialogs), and offers tools to configure a collaborative experiment, provide an analysis of the process, and offer interventions to improve the interaction process. This model of sentence openers to facilitate the collaborative interaction has been re-implemented in a collaborative version of the SIETTE system (Conejo et al. 2013), which collaboratively assesses small-group activities and analyses interactions.

  iii. An approach to characterizing group and individual behaviour in CSCL in terms of a taxonomy of indicators based on three perspectives: the group in relation to similar groups, an individual in relation to his/her colleagues, and the group itself. This taxonomy was the first step in building a CSCL ontology (Barros et al. 2002) for analysing collaborative learning tasks. Subsequently, this experience was shared and discussed with other researchers in the Kaleidoscope NoE (www.noe-kaleidoscope.org), and the conclusions on indicators for collaborative processes were summarized in an article in which we also participated (Dimitricopoulus et al. 2004).

  iv. A method to automatically analyse a collaborative process log using a fuzzy model that infers qualitative indicators from a set of quantitative indicators (directly obtained by statistical methods from three sources: the log, the dialogue model, and the outcome schema) and from the expert knowledge of the teachers. The inference process was implemented using fuzzy methods that involved modelling the teachers’ criteria for the different values of each indicator when the learning activity was designed. This process was very tedious and not easily scalable, and so we redesigned the method to use Bayesian reasoning to analyse larger groups, such as small learning communities (Barros et al. 2007).
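The fuzzy inference step in the fourth contribution can be illustrated with a minimal sketch. It is not the DEGREE implementation: the two quantitative indicators (`participation` and `elaboration`, both normalized to the range 0 to 1), the trapezoidal membership functions, and the rule table are hypothetical stand-ins for the criteria that a teacher would supply when designing the activity.

```python
# Minimal fuzzy-inference sketch (hypothetical indicators, sets, and rules).

def trapezoid(x, a, b, c, d):
    """Membership degree: 0 outside [a, d], 1 on [b, c], linear on the slopes."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical fuzzy sets over a normalized quantitative indicator.
FUZZY_SETS = {
    "low": (0.0, 0.0, 0.2, 0.5),
    "medium": (0.2, 0.4, 0.6, 0.8),
    "high": (0.5, 0.8, 1.0, 1.0),
}

# Hypothetical teacher criteria: (participation, elaboration) -> collaboration.
RULES = [
    ("high", "high", "good"),
    ("high", "medium", "acceptable"),
    ("medium", "high", "acceptable"),
    ("medium", "medium", "acceptable"),
    ("low", "high", "acceptable"),
    ("low", "low", "poor"),
    ("low", "medium", "poor"),
    ("medium", "low", "poor"),
    ("high", "low", "poor"),
]

def fuzzify(value):
    """Map a quantitative value to its degree of membership in each fuzzy set."""
    return {label: trapezoid(value, *params) for label, params in FUZZY_SETS.items()}

def infer(participation, elaboration):
    """Return the strongest qualitative label and the strength of every label."""
    p = fuzzify(participation)
    e = fuzzify(elaboration)
    strengths = {}
    for p_label, e_label, conclusion in RULES:
        strength = min(p[p_label], e[e_label])  # fuzzy AND
        strengths[conclusion] = max(strengths.get(conclusion, 0.0), strength)  # fuzzy OR
    return max(strengths, key=strengths.get), strengths

label, strengths = infer(participation=0.9, elaboration=0.9)
print(label, strengths)
```

Every rule and every membership function has to be written by hand for each indicator, which gives a sense of why the text above describes the modelling effort as tedious and hard to scale.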

We now summarize our findings about how the collaborative learning community referred to our article (i.e., Barros and Verdejo 2000). Based on a search of Google academic (www.googleacademics.com), we defined this community according to the number of articles (276) that have cited this article up to the present time. We attempted to answer two questions: (i) What are the interests of the research community that cited the article?; and (ii) What aspects are appreciated by our colleagues that led them to cite the article?

To answer these questions, we studied the keywords and abstracts of the articles that cited our article, while noting the number of citations received by these articles. We only took into account the articles that cited Barros and Verdejo (2000) and that had been cited more than once. We obtained the main keywords that defined each article from the title and abstract. We also annotated the number of citations of each title and made a list with two values: (value1: keyword; value2: Σ(number-citations)). We then represented the list in a cloud of words (Fig. 1).
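The weighting procedure described above can be sketched as follows. The record layout (a dictionary with `keywords` and `citations` fields) and the sample data are assumptions made for illustration only; they are not the data used in our study.

```python
from collections import Counter

def keyword_weights(articles, min_citations=2):
    """Sum, for each keyword, the citation counts of the articles that use it.

    Articles cited fewer than `min_citations` times are skipped, which mirrors
    the "cited more than once" filter described in the text.
    """
    weights = Counter()
    for article in articles:
        if article["citations"] < min_citations:
            continue
        # Normalize case and count each keyword once per article.
        for keyword in {kw.lower() for kw in article["keywords"]}:
            weights[keyword] += article["citations"]
    return weights

# Hypothetical citing articles, for illustration only.
sample = [
    {"keywords": ["scripting", "interaction analysis"], "citations": 10},
    {"keywords": ["Interaction analysis"], "citations": 3},
    {"keywords": ["feedback"], "citations": 1},
]

weights = keyword_weights(sample)
for keyword, weight in weights.most_common():
    print(keyword, weight)
```

The resulting (keyword, weight) pairs are the kind of input that a word-cloud tool expects, with the weight controlling the size of each term.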

Fig. 1

Cloud of keywords/concepts of the articles that cite Barros and Verdejo (2000)

The community that responded to our article comprised the following groups: (i) researchers interested in instructional design for collaborative learning, scripting, and methodologies to define scenarios for group learning; and (ii) researchers interested in interaction analysis, regulation, or assessment and all related challenges, such as mirroring and guiding, interaction regulation, characterizing group behaviour, evaluation, providing individuals and teachers with support, or investigating effective feedback mechanisms to improve knowledge acquisition in groups.

We attempted to answer the second question by reading each article that had cited our article to see what result, idea, or conclusion they referred to. Although we did not review the articles in detail, we obtained a general overview of what they found to be of interest in our work. The sample was taken from the 276 articles mentioned and comprised 53 articles published in English. They were freely accessible and/or published by Elsevier or Springer, and had been cited by other articles at least 10 times. We found that the aspects referred to formed six groups: Group 1: “advice mechanism and feedback”, which included the advice mechanism and feedback on the quality of group interactions, group characteristics, and individual behaviour; Group 2: “indicators on collaboration”, which included references to a set of specific indicators, to the taxonomy of indicators, or to the indicators as a whole; Group 3: “semi-structured communications”, which referred to the use of sentence openers (tagged dialogues) to regulate communication in the DEGREE system; Group 4: “computer-based interaction analysis”, which addressed references to the methodology used to analyse collaborative processes, the analytical procedure, and the three aspects of the interaction analysis (by groups, by comparing an individual with the group, or by the evolution of the task); Group 5: “DEGREE as a framework for configuration, execution, and interaction analysis”; and Group 6: “DEGREE as a CSCL system”.

We mapped the relevant aspects and represented each group on a diagram so that all the ideas addressed could be seen as a whole (see Fig. 2). Computer-based interaction analysis is an approach that continues to grow and includes new aspects and challenges that have led to new approaches and results (see the next section). As shown in Fig. 2, DEGREE has been cited as an integrated approach to designing a collaborative experiment, conducting it, and analysing the process and results.

Fig. 2

Graphic representation of the number of articles (sample size 55 articles) that cite Barros and Verdejo (2000)

Group 3 (see Fig. 2) refers to works related to “semi-structured communication”. This approach to modelling communication was used to organize the argumentation process and facilitate the interaction analysis. When DEGREE was implemented, we decided not to address the automated tagging of discourse units and their textual content used in the collaborative analysis process, because the state of natural language processing (NLP) technology at that time would have prevented us from obtaining results reliable enough for use in our system. The decision to use or not use semi-structured approaches has been a source of debate in our research community. In general, participants prefer unstructured interaction and the current trend is to employ short sentences and poor language, which raises new challenges for automatic processing. However, the situation has improved due to the large amount of available data and advances in NLP, especially the paradigm shift towards statistical and machine learning approaches. This change represents a research area in its own right, with active groups of researchers reporting progress not only in the field of CSCL (e.g., Adamson et al. 2014; Wang et al. 2008), but also in other approaches, such as tutorial dialogue or recognizing domain content in student discussions (Dragon et al. 2010). Nevertheless, this research area is complex, more research effort is needed, and the area should be extended to include other multimedia communication modalities, such as speech and video.

Regarding Group 5 and Group 6 (see Fig. 2), one line of research was to investigate educational modelling languages and design a tool to automatically create educational environments. The work conducted with DEGREE influenced the PALO system (Rodríguez-Artacho and Verdejo 2004) and the Active Document system (Verdejo et al. 2002). Ultimately, IMS-LD (http://www.imsglobal.org/learningdesign/) was the most widely adopted initiative.

In summary, taking into account the aspects that have been cited by the community, we conclude that DEGREE has been viewed as a system with semi-structured communication (Group 3) that enables the analysis of collaborative learning interaction (Group 4) and the taxonomy of indicators used to characterize it (Group 2). In the next section, we analyse how computer-based interaction analysis (Group 4) has recently evolved.

Computer-Based Interaction Analysis

Since the end of the 1990s, technology-enhanced learning has advanced in several ways. In this section we present a timeline of the period divided into three stages and point out some of the main features of our perspective. A comprehensive overview is beyond the scope of this article and the references should be taken as illustrative.

Figure 3 shows three stages in three columns: 1995–2005, 2000–2010, and 2005–2015. For each column there is a row that characterizes each stage in terms of the size of the groups, sources of data for analysis, techniques, and some examples of representative systems and approaches used during each period.

Fig. 3

Computer-based interaction analysis from 2000 to 2015. By rows: size of groups, sources of data, techniques, and some examples of systems and approaches during each period

The first stage (1995–2005) represents early studies on computer-based interaction analysis, which centred on small groups that worked in an environment designed and implemented to enhance collaboration. Logs were created and used as a data source for the algorithms performing the interaction analysis. These tools enabled the collaborative process to be visualised, provided tools for raising awareness, and, in some cases, were used to diagnose group situations, such as impasse or leadership. During the same period, collaborative scripting arose as a relevant aspect for the design and management of collaborative learning processes. In general, these studies on interaction analysis were conducted using systems designed and implemented for a specific study. These systems were used to run experiments with students and subsequently used to analyse the interaction processes. Thus, the systems themselves acted as the data source for the analysis process.

The focus of later studies on interaction analysis changed from small groups to large groups or small learning communities. We decided to work at a small scale, i.e., no more than ten groups that included two to three students per group. During the second stage (2000–2010), other approaches addressed large-scale groups in unstructured collaboration settings (typical discussion forums) either to characterize the behaviour of groups with similar behaviour (Talavera and Gaudioso 2004) or the roles played by the individuals (Perera et al. 2009) to identify patterns of effective group practice.

When environments to support virtual learning communities appeared, it became easy to create such communities. Initially, these environments were simple applications for group work with low interoperability (e.g., Comtella). Subsequently, they evolved towards flexible and configurable frameworks (e.g., Moodle or BSCW). In this context, methods to collect interaction data (logs) and analysis algorithms could be implemented and integrated (Conejo et al. 2008; Barros et al. 2007; Daradoumis et al. 2006). Virtual learning communities comprise large groups of interacting users, and all this interaction data made it possible to study the learning process and the effects of collaboration on learners by using social network analysis (Martínez et al. 2003) or statistical or Bayesian methods (Barros et al. 2007). E-learning or collaborative environments have mainly been used to define and conduct educational activities and collect the logs created by these applications. Some of these interaction tools have been embedded in the e-learning environments and have taken advantage of open-source facilities. Using existing environments to collect data, and focusing research on the implementation of the analysis methods themselves, has given rise to a rich variety of methods and approaches to analyse interaction, communication, and collaboration for learning.
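As an illustration of how social network analysis can be applied to interaction logs of this kind, the following minimal sketch computes degree centrality from a list of forum replies. The log format (pairs of sender and recipient) and the user names are hypothetical, and degree centrality is only one standard SNA measure; the sketch does not reproduce the method of any of the studies cited above.

```python
from collections import defaultdict

def degree_centrality(replies):
    """Return, per user, the share of other users they exchanged replies with.

    `replies` is an iterable of (sender, recipient) pairs taken from a log.
    The network is treated as undirected, and self-replies are ignored.
    """
    neighbours = defaultdict(set)
    for sender, recipient in replies:
        if sender == recipient:
            continue
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    n = len(neighbours)
    if n < 2:
        return {}
    return {user: len(peers) / (n - 1) for user, peers in neighbours.items()}

# Hypothetical reply log, for illustration only.
log = [("ana", "ben"), ("ben", "ana"), ("carla", "ana"), ("dan", "ana")]

centrality = degree_centrality(log)
for user in sorted(centrality, key=centrality.get, reverse=True):
    print(user, round(centrality[user], 2))
```

A user with a value close to 1 has interacted with almost everyone in the community, whereas values close to 0 point to peripheral participants.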

The last stage shown in Fig. 3 (2005–2015) shows that there has been an explosion of data and user interactions due to social networks and other resources (such as blogs, common spaces in social networks, Twitter, and so on). It is now possible to use different data sources, including programs developed by different companies, programs not specifically designed for learning and, more recently, MOOCs. The common advantage of these applications is open access to data and free-of-charge use, whereas their common disadvantage is that the data is mainly unstructured text. The interaction analysis in this new setting requires searching for and pre-processing data. The techniques used for this purpose include information retrieval, data extraction, and classification. Visualization techniques play an important role in presenting the outcome of these processes.

Discussion

The majority of current research on interaction analysis has little in common with the approach used in DEGREE in 2000. Currently, tasks and collaboration are less explicitly defined and directed. In addition, large amounts of data are available from different sources and the challenge lies in how to organize it and to provide mechanisms to configure collaboration or to model communication given the huge variety of types of tasks, users, and environments, all of which require resources to manage the amount and variety of data. Thus, it remains a challenge to find algorithms and formulas to infer how collaboration emerges and identify the best conditions for knowledge acquisition in very large and spontaneously created learning groups. The trade-off between symbolic and machine-learning approaches applies to AI in general; however, current trends in scalability and the large amount of data to be handled favour machine-learning techniques.

As explained in this article, DEGREE focused on small-group interaction using a fine-grained method. Current approaches work with large groups that form virtual communities and study interaction at a coarse-grained level using massive datasets and machine learning or social network analysis (SNA) methods. These new approaches do not compare each utterance with the learning objectives of a task; rather, they study interaction “as a whole”, where learning is inferred from the communication level of the group and the interaction level of its members.

Nowadays, learning management systems and on-line educational software are rich in features to integrate a myriad of tools and to record user activity and interaction logs. Automatic methods are essential to this process given the radical change in the scale of people now accessing open courses. Educational data mining has become an established research field in its own right, and the quest for better methods is involving more researchers from a wide variety of backgrounds. Although the many studies available show that data is collected from a variety of sources, processed (mainly by counting), and visualized with appealing graphic interfaces, there is still a gap in interpreting and modelling this information in terms of understanding the learning process and how the actors interact. Thus, a pending issue is how to exploit the potential of the mined data in the learning cycle (raising awareness, formative assessment, feedback, evaluation, intervention, etc.) and its effectiveness in improving either the processes or products involved, or both in the case of a CSCL perspective. Several researchers (e.g., Charlton et al. 2012; Siemens 2012) have drawn attention to the gap between technologically driven and pedagogically driven approaches, a gap that seems to reappear every time a new trend emerges. As in the case of other experimental disciplines, it may be time to carefully define challenging analysis tasks, build data collections, and establish benchmarks through competitive evaluations, such that new insights from the research community can increase the impact of these techniques on educational practitioners.