
1 Introduction

Open source software (OSS) development is a knowledge-intensive activity in which software developers, who are usually geographically dispersed, use online forums to coordinate their work activities [1,2,3]. These online forums are communication channels where software developers express their emotions, including their degree of satisfaction [4] with a specific piece of software code (known as a patch) that is peer reviewed. Peer review is an important quality assurance mechanism in the OSS community but is less well understood than other aspects of OSS development [5].

As the online forums facilitate peer reviews and interactions between members of the open source community, they offer a rich source of insights into community practices and social norms [3]. Previous research on online forums focused on discovering knowledge sharing practices [6], information seeking behaviours among developers [7], identifying active contributors [8], and the sentiment of members within the community [4, 9,10,11,12]. Research has shown that sentiment affects quality, productivity, creativity, group rapport, and job satisfaction [13]. Understanding the sentiment of software developers is important for project managers as it provides a better understanding of the social factors that affect the project and the corrective actions required to improve sentiment [4, 5].

OSS development is also a highly collaborative activity [2], requiring creativity and problem-solving skills, which are influenced by emotion [14]. Further, the sustainability of open source communities requires software developers to maintain healthy relationships with their peers in order to ensure their input and support [15]. It would therefore seem logical that the sentiment of project members plays an important role in the success or failure of a project. However, project managers find it difficult to keep track of their people’s feelings [1].

As OSS projects are notoriously subject to contradictions (i.e. tensions, conflict, breakdown in communication), we use Activity Theory (AT) to examine them because AT anticipates contradictions [16]. Contradictions are “historically accumulating structural tensions within and between activity systems” and are a fundamental concept in AT [17, p. 137]. The identification of contradictions helps practitioners to focus their efforts on the root causes of problems. This collaborative analysis can lead to the creation of a shared vision for the solution of the contradictions [18]. [19] propose four distinct types of contradictions which they associate with discursive manifestations, namely, (i) double binds, (ii) conflicts, (iii) critical conflicts, and (iv) dilemmas. In this manner, discursive manifestations can be associated with a type of contradiction and with its resolution.

We argue that a greater scrutiny of discursive manifestations is necessary in the study of open source communities for three key reasons.

First, illuminating discursive manifestations of contradictions reveals rich insights into the social norms and practices of open source communities. This is important as organisations in the 21st century play an active role in shaping the structure and direction of open source communities [20].

Second, there is a noticeable absence of research that progresses from simply applying sentiment analytics [1, 4, 5] to advancing the cumulative body of knowledge via theoretical development. This lack of cumulative tradition [21, 22] resonates with the issue of ‘fragmented adhocracy’, which has previously overshadowed IS research [23,24,25]. By grounding the study in AT, we theorise how sentiment analytics can be used to provide a deeper understanding of contradictions.

Third, in the context of online forums that are used by open source communities, [26] calls for a serious expansion of our understanding of organisations, work, and learning. This study answers this call by examining sentiment in the context of collaborative work.

Using AT as the theoretical lens is pertinent in this study for three key reasons: (i) understanding the context in which words are used is important as it strongly influences the accuracy of sentiment analysis [27, 28]; (ii) AT is oriented towards understanding activity in context [29]; and (iii) AT acknowledges contradictions as a means of understanding and change [17, 30], a concept that is not explicit in other social theories [31]. Hence, we claim that it is more useful to integrate sentiment analytics with the analysis of discursive manifestations than to apply sentiment analytics alone. In doing so, rich insights are revealed into how emotions permeate work and into the contradictions that influence how people work on a daily basis. Therefore, through the lens of AT, the overarching aim of this study is to

“Explore how sentiment analytics can illuminate discursive manifestations of contradictions in the context of open source communities”.

The paper is structured as follows. First, a review of literature on contradictions from the perspective of AT is presented. Next, the method used to extract and clean data for the purpose of analysis is outlined. Then, key findings and analysis are presented, followed by discussion and implications for practice, academia, and society. The paper ends with conclusions, limitations and future action.

2 Activity Theory

Contemporary thinking on AT, known as third-generation AT, emerged from the seminal work of [32], who acknowledges the systemic relations between an individual and their environment by highlighting the influential nature and interrelatedness of the larger social context.

A fundamental concept of AT is the notion of contradictions, which occur within an activity and/or between multiple interrelated activities and promote dialectical transformation [17, 33]. While a ‘contradiction’ may be considered by some as a weakness, from the perspective of AT contradictions are a sign of richness and an opportunity for development in the activity system [33, 34]. Contradictions are seen as the sources of learning and can become the driving force for change and development in a system, if they are addressed [16]. Essentially, contradictions are ‘motors of change’ [35]. Contradictions can occur either inside the key constructs (e.g. community) or between them, or they may occur in networks of activity systems [17, 36]. Contradictions can be identified through their manifestations, which include disturbances, errors, problems, rupture of communication, breakdowns, and clashes [17, 37, 38]. However, contradictions may not be obvious or openly discussed, and may be culturally or politically challenging to confront [35, 39]. Researchers must therefore rely on indirect methods to make the contradictions visible and to explain the genesis of their development [40].

More recently, discursive manifestations of contradictions in organisational change efforts have been studied [19, 40]. Table 1 lists four distinct types of contradictions that [19] associate with discursive manifestations and their resolutions.

Table 1. Types of discursive manifestations of contradictions

A double bind is typically expressed “first by means of rhetorical questions indicating a cul-de-sac, a pressing need to do something and, at the same time, a perceived impossibility of action” [19]. It occurs when a person or group engages in interactions that raise paradoxical and contradictory demands, which make it difficult to step back from their current activities and consequently create feelings of helplessness. A double bind is typically a situation which cannot be resolved by an individual alone [19]. Resolution requires practical, transformative, and collective action that goes beyond words, and is often accompanied by expressions such as “let us do that” or “we will make it” [19, 40].

Critical conflicts are situations ‘in which people face inner doubts that paralyse them in front of contradictory motives unsolvable by the subject alone’ [19, p. 374]. These critical conflicts are highly emotionally and morally charged, which makes it difficult, or even impossible, for them to be resolved solely by the subjects involved (ibid). The discourse is also marked by vivid metaphors [40]. Resolution occurs ‘via a renegotiation of meaning for the subject who was accompanied by the collective in order to allow the former to gain critical distance from their experience and to give it new meaning’ (ibid, p. 282).

Conflict takes the form of resistance, disagreement, argument and criticism, and occurs “when an individual or a group feels negatively affected by another individual or group, i.e. because of a perceived divergence of interests, or because of another’s incompatible behaviour” [41, p. 1]. [19] observed that people engaged in a conflict tend to argue and to criticise each other. Conflicts are resolved through compromise or submitting to authority or the majority [40].

A dilemma is an ‘expression or exchange of incompatible evaluations, either between people or within the discourse of a single person’ and is most often expressed in the form of hesitations, such as “yes, but” [19]. It is typically reproduced rather than resolved, often with the help of denial or reformulation (e.g. “I didn’t mean that”).

3 Methodology

This section outlines the process we used to analyse sentiment and discursive manifestations in discussions on the Data Plane Development Kit (DPDK) community platform between 28th Feb and 4th May 2018. As sentiment analysis tools require customisation for the context of software development [42,43,44], we customised two popular sentiment analysis dictionaries, ‘Opinion Lexicon’ and ‘Comparative Words’. To analyse the sentiment in the message body content, we followed a similar approach to [9]: the message body is split into tokens and a rule-based algorithm, in combination with the two dictionaries, assigns each token a positive, neutral, or negative score. The assigned sentiment scores were ‘Strong negative’ (−20), ‘Weak negative’ (−10), ‘Neutral’ (0), ‘Positive’ (+10), and ‘Strong positive’ (+20). A token is assigned a score according to the matching word found in the dictionaries, and the overall sentiment of a message is computed as the sum of all scores assigned to the tokens contained in that message. The research method consists of three inter-related phases, namely, (i) data extraction, (ii) data pre-processing, and (iii) data analysis.

  • Phase 1 Data Extraction: Messages were extracted from the dpdk-dev mailing list archived at http://mails.dpdk.org/archives/dev/. A total of 13,461 messages were extracted in RAR file format.

  • Phase 2 Data Pre-processing: Using Python scripts, messages were converted from RAR file format into CSV file format, and messages dated outside the release cycle were removed. This resulted in 8,585 messages being included in this study. The message content was cleaned for analysis using regular expressions to ensure that only the natural language of the message body remained. All message headers, code, file paths, and non-alphanumeric symbols/characters were removed. This activity was critical to reduce instances of misclassification [1]. The remaining text was then converted into DataFrame format (a tabular data structure in Python) for compatibility with the sentiment analysis algorithm.

  • Phase 3 Data Analysis: As domain-specific terms influence sentiment analysis [1], the research team collaborated with members of the open source community to refine the dictionaries and data in an iterative manner. The natural language dictionary was augmented with domain-specific language of the open source community to include the terms ‘NIT’ (i.e. OK but a small problem), ‘NACK’ (i.e. not accepted by the community), and ‘LGTM’ (i.e. looks good to me). Also, as noted by [19], their categorisation of manifestations is not exhaustive. Therefore, the linguistic cues unique to the open source community studied are included in the analysis of discursive contradictions, namely, ‘NIT’ (associated with a dilemma) and ‘NACK’ (associated with a critical conflict). These findings are presented in the next section.
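The cleaning and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the scripts used in the study: the regular expressions, the miniature lexicon, the scores given to ‘NIT’, ‘NACK’, and ‘LGTM’, and the function names `clean` and `score_message` are all assumptions made for the example. Only the five-point scale (−20 to +20) and the sum-of-token-scores rule are taken from the text.

```python
import re

# Hypothetical miniature lexicon. The customised 'Opinion Lexicon' and
# 'Comparative Words' dictionaries used in the study are not reproduced here,
# and the scores given to the domain terms are illustrative assumptions.
LEXICON = {
    "broken": -20,
    "nack": -20,   # domain term: not accepted by the community
    "problem": -10,
    "nit": -10,    # domain term: OK but a small problem
    "good": 10,
    "better": 10,
    "lgtm": 20,    # domain term: looks good to me
    "excellent": 20,
}

# Simplified cleaning rules (assumptions, far cruder than a production script).
HEADER_LINE = re.compile(r"^(?:From|To|Cc|Subject|Date|Message-ID):.*$", re.MULTILINE)
DIFF_LINE = re.compile(r"^(?:\+|-|@@|diff |index ).*$", re.MULTILINE)
FILE_PATH = re.compile(r"\S*/\S+")  # also strips URLs and words such as "and/or"
NON_ALNUM = re.compile(r"[^A-Za-z0-9\s]")


def clean(body: str) -> str:
    """Drop headers, diff lines, file paths, and symbols, then lowercase."""
    for pattern in (HEADER_LINE, DIFF_LINE, FILE_PATH):
        body = pattern.sub(" ", body)
    return NON_ALNUM.sub(" ", body).lower()


def score_message(body: str) -> int:
    """Sum the lexicon scores of all tokens; unknown tokens count as neutral (0)."""
    return sum(LEXICON.get(token, 0) for token in clean(body).split())


if __name__ == "__main__":
    review = "So, as it is, it's a NACK from me, but let's work together on something better:)"
    print(score_message(review))
```

With this toy lexicon, the negative score for ‘nack’ is partly offset by the positive score for ‘better’, which illustrates how a rejection message can end up with a mild rather than a strongly negative total.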

4 Findings and Analysis

We investigate sentiment around ‘nack’ and analyse the underlying discursive manifestations of contradictions. A ‘nack’ is generally viewed by the community as wasted time and effort (i) of the developer who developed the patch, and (ii) of the community members who review the patch.

Sentiment Analysis:

Figure 1 illustrates the sentiment score plotted against time, during which activities (e.g. scoping, pre-merge code, bug fix, test, and release) are completed as part of the release cycle. The red bars mark the dates on which the 15 ‘nacks’ occurred during the release cycle: 5 in March, 8 in April, and 2 in May.

Fig. 1. Sentiment score plotted against time (Color figure online)

Analysis of the sentiment reveals that the overall sentiment is minimally positive (0.210). A number of positive and negative outliers are present at the start and end of the release cycle. The underlying reason is that a patch will initially have errors/defects but, following a series of reviews and revisions, the quality of the patch improves, as does the sentiment of the community. As overall sentiment is minimally positive, these findings challenge the assumption of the community that messages containing ‘nack’ should have strong negative sentiment. This indicates that ‘nack’ messages can also contain positive sentiment that has a neutralising effect on the overall sentiment score.

This finding is supported by the distribution of sentiment scores represented in Fig. 2 below. The sentiment scores are normally distributed, with a mode of zero. This indicates that the majority of discussions were neutral due to the technical nature of the conversations for each review.

Fig. 2. Frequency distribution of sentiment scores

Table 2 provides a summary of the statistics for the 18.05 release cycle. These findings support the previous analysis; for example, the mean progresses from −0.12 in Feb to +0.21 in April.

Table 2. Summary statistics of release cycle

To further investigate the underlying sentiment of ‘nack’ messages, the discursive manifestations of contradictions are analysed.
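Summary statistics of the kind reported above (monthly mean and mode of message-level sentiment scores) can be computed with the Python standard library. The sketch below is illustrative only: the `(date, score)` pairs are invented placeholder values, not data from the study, and the function name `monthly_summary` is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, mode

# Invented placeholder data: (message date, message-level sentiment score).
# These values are NOT taken from the DPDK mailing list.
scored_messages = [
    (date(2018, 2, 28), -10),
    (date(2018, 2, 28), 0),
    (date(2018, 2, 28), 0),
    (date(2018, 3, 2), 0),
    (date(2018, 3, 15), 0),
    (date(2018, 3, 20), 10),
    (date(2018, 3, 27), -10),
    (date(2018, 4, 3), -10),
    (date(2018, 4, 12), 0),
    (date(2018, 4, 18), 0),
    (date(2018, 4, 25), 20),
]


def monthly_summary(rows):
    """Group scores by (year, month) and report count, mean, and mode."""
    by_month = defaultdict(list)
    for day, score in rows:
        by_month[(day.year, day.month)].append(score)
    return {
        month: {"n": len(scores), "mean": mean(scores), "mode": mode(scores)}
        for month, scores in sorted(by_month.items())
    }


if __name__ == "__main__":
    for month, stats in monthly_summary(scored_messages).items():
        print(month, stats)
```

A mode of zero alongside a mean close to zero in each month is the pattern one would expect when most messages are technically neutral and a few positive and negative messages roughly cancel out.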

Analysis of Contradictions:

Table 3 below shows that a ‘nack’ can manifest as different types of contradictions (critical conflict, conflict, and dilemma), indicating that there are subtle differences around instances of ‘nack’ that require further investigation. For example, in the following excerpt from an email message (2nd Mar), “The proposed patch is a workaround that doesn’t address the underlying issue, thus NACK unless proven otherwise:)”, we start to understand why sentiment around ‘nack’ is not strongly negative. Firstly, the smiley at the end of the sentence indicates that the author does not intend the comment to be adversarial. Secondly, the author rejects the patch but leaves it to the community to prove that the patch is still useful for solving the “underlying issue”, which implies that this is a conditional ‘nack’ and that the author is willing to retract it. In another excerpt (12th Apr), “So, as it is, it’s a NACK from me, but let’s work together on something better:)”, the author displays positive sentiment by encouraging the community to work towards a better solution, despite rejecting the patch.

Table 3. Discursive manifestations of contradictions

5 Discussion and Implications

The study revealed that although ‘nack’ is considered by the community to be extremely negative, 7 cases of ‘self-nack’ occurred. Rather than categorise ‘self-nack’ as a critical conflict manifestation, in the context of this study it is categorised as a ‘dilemma’. The reasoning for this is that a person who contemplates a ‘self-nack’ is faced with the dilemma of being ridiculed or rewarded by their peers, depending on when and why the ‘self-nack’ is initiated. Collectively, the analyses of sentiment and contradictions challenge two assumptions of the open source community: that the community is overly negative due to the online platform that is used to communicate feedback on patch reviews, and that all instances of ‘nack’ are negative and considered ‘bad’. Further, from the perspective of AT, our analysis highlighted that events that are perceived as “bad” are in fact opportunities for innovation, improved dialogue within the community, and better collaboration between all stakeholders of the open source ecosystem. Also, rather than being viewed as a waste of time, resources, and finances, a ‘nack’ can be used as an opportunity to create events (on/offline) that build cohesion in the open source community and contribute to the overall health and sustainability of the community.

The findings from this study have important implications for software development research in academia, industry, and the wider society.

Implications for Industry:

First, understanding the pattern of communication is important because it provides an opportunity for management and project teams to stabilise the flow of work and patch reviews during the various activities (e.g. scoping) of a release cycle. Second, sentiment and contradictions provide insight into the emotional states of software developers and hold much promise for better management of people involved in software development projects in general. Third, it is a strategic advantage for organisations involved in open source projects to understand the circumstances of a ‘nack’ so that corrective action can be taken.

Implications for Academia:

First, as all data analysis tools have limitations, researchers need not only to assess the suitability of such tools for their research project, but also to carefully understand the social context of the research in order to draw meaningful and actionable insights that enable organisational change. A second implication, which is related to the first, is that sentiment analysis, by itself, does not provide rich contextual data to drive organisational change (e.g. at project level). Supplementing this approach with a robust theoretical framework such as AT provides researchers with the opportunity to analyse and conceptualise complex real-world situations where the interrelationship between communities of people (open source community), mediating tools (online forum), and a cultural-historical setting co-evolve (new members join or leave the open source community). Third, analysing the natural language used in the mailing list from the perspective of discursive manifestations provides rich insights into the internal dynamics of online communities, which we know are not well understood [cf. 6].

Implications for a Sustainable Society:

As social sustainability is a key dimension of sustainability [45], the role of big data and analytics can have positive and negative implications for society [46]. Remote working is recognised as a key strategy for a sustainable society as it reduces travel, which in turn reduces carbon emissions. The tools used in this study have a meaningful role to play in enabling sustainable work practices as part of a larger suite of technologies that enable and support distributed work. Combining sentiment analysis with analysis of contradictions provides useful indicators of the social well-being of individuals and teams, and helps to maintain the social structure of communities [47]. For example, these indicators can provide companies with opportunities to develop interventions that improve the quality of life and well-being of their employees and their families, which in turn would reduce health care costs, as prevention is better than cure [46].

6 Conclusion, Limitations, and Future Action

Obtaining accurate sentiment from mailing lists remains a key challenge [2] but can be mitigated by customising sentiment analysis tools for the context of the study [42]. The research demonstrates that it is feasible to extract and analyse data from mailing lists with high accuracy. We presented sentiment analysis as a mechanism for extracting (i) sentiment expressed in mailing list patch review comments, and (ii) the four types of discursive manifestations and their frequency during the release cycle. While a limitation of the study is that one release cycle is not representative of the DPDK online community, the study does present opportunities for future work to gain a deeper understanding of the relationship between discursive manifestations of contradictions and sentiment, as well as the propensity of individual reviewers over time. Future work will focus on multiple release cycles during a full year and/or compare sentiment across multiple projects. This study highlights the importance of not only considering sentiment as quantitative values but also taking into consideration the context of the sentiment values and how discourse can directly and indirectly have a positive or negative impact on people within the activity system.