1 Introduction

The essential activity in designing object-oriented programs is to identify class candidates and to assign responsibilities (i.e., data and operations) to them. An appropriate solution to this Class-Responsibility-Assignment (CRA) problem, on the one hand, intuitively reflects the problem domain and, on the other hand, exhibits acceptable quality measures [4]. In this context, refactoring has become a key technique for agile software development: productive program-evolution phases are interleaved with behavior-preserving code transformations that update CRA decisions in order to proactively maintain, or even improve, code-quality metrics [13, 29]. Each refactoring pursues a trade-off between two major, and generally contradicting, objectives: (1) maximizing code-quality metrics, including fine-grained coupling/cohesion measures as well as coarse-grained anti-pattern avoidance, and (2) minimizing the number of changes to preserve the initial program design as much as possible [8]. Manually searching for refactorings that sufficiently meet both objectives becomes impractical even for medium-sized programs, as it requires finding optimal sequences of interdependent code transformations with complex constraints [10]. The very large search space and the multiple competing objectives make the underlying optimization problem well suited for search-based optimization [15], and various semi-automated approaches for recommending refactorings have recently been proposed [18, 27, 28, 30, 34].

The validity of proposed refactorings is mostly assessed in terms of purely functional behavior preservation [24], whereas their impact on extra-functional properties like program security has so far received little attention [22]. However, applying elaborate information-flow metrics to identify security-preserving refactorings is computationally too expensive in practice [36]. As an alternative, we consider attack-surface metrics as a sufficiently reliable, yet easy-to-compute indicator of preserved program security [20, 41]. The attack surface of a program comprises all conventional ways in which users or attackers can enter the software (e.g., invoking API methods or inheriting from super-classes), such that an unnecessarily large surface increases the danger of vulnerabilities being exploited. Hence, a secure program design should grant least privileges to class members, reducing the extent to which data and operations are exposed to the world [41]. In Java-like languages, accessibility constraints expressed by the modifiers public, private and protected provide a built-in, low-level mechanism for controlling and restricting information flow within and across classes, sub-classes and packages [38]. Accessibility constraints introduce compile-time security barriers that protect trusted system code from untrusted mobile code [19]. As a downside, restricted accessibility privileges naturally obstruct refactoring opportunities, as CRA updates (e.g., moving members [34]) may either be rejected by those constraints or require relaxing accessibility privileges, thus increasing the attack surface [35].

In this paper, we present a search-based technique for finding optimal sequences of refactorings for object-oriented Java-like programs that explicitly takes accessibility constraints into account. To this end, we do not propose novel refactoring operations, but rather apply established ones and control their impact on attack-surface metrics. We focus on MoveMethod refactorings, which have proven effective for improving CRA metrics [34], in combination with operations for on-demand strengthening and relaxing of accessibility declarations [38]. As objectives, we consider (O1) elimination of design flaws, particularly (O1a) optimization of object-oriented coupling/cohesion metrics [5, 6] and (O1b) avoidance of anti-patterns, namely The Blob, (O2) preservation of the original program design (i.e., minimizing the number of change operations), and (O3) attack-surface minimization. Our model-based tool implementation, called GOBLIN, represents individuals (i.e., intermediate refactoring results) as program-model instances complying with an EMF meta-model for Java-like programs [33]. Hence, instead of regenerating source code after every single refactoring step, we apply and evaluate sequences of refactoring operations, specified as model-transformation rules in Henshin [2], on the program model. For this purpose, we use MOMoT [11], a generic framework for search-based model transformations. Our experimental results, gained from applying GOBLIN as well as the recent tools JDeodorant [12] and CODe-Imp [27] to a collection of real-world Java programs, provide in-depth insights into the subtle interplay between traditional code-quality metrics and attack-surface metrics. Our tool and all experimental results are available on the project's GitHub site (Footnote 1).

Fig. 1. UML class diagram of MailApp

2 Background and Motivation

We first introduce a running example to provide the necessary background and to motivate the proposed refactoring methodology.

Running Example. We consider a (simplified) e-mail client, called MailApp, implemented in Java. Figure 1 shows the UML class diagram of MailApp, where the security-critical extensions (in gray) are described below. We use the stereotype \(\langle \langle \mathsf {pkg:name}\rangle \rangle \) to annotate classes with package declarations. The central class MailApp is responsible for handling objects of classes Message and Contact, both of which encapsulate application data together with operations for accessing those attributes. The text of a message may be formatted as a plain String, or it may be converted into HTML using method plainToHtml().

Design Flaws in Object-Oriented Programs. The over-centralized architectural design of MailApp, consisting of a predominant controller class (MailApp) that intensively accesses inactive data classes (Message and Contact), is frequently referred to as The Blob anti-pattern [7]. As a consequence, method plainToHtml() in class MailApp frequently calls method getPlainText() in class Message across class and even package boundaries. The Blob and other design flaws are widely considered harmful with respect to software quality in general and program maintainability in particular [7]. For instance, assume that a developer extends MailApp by (1) adding further classes SecureMailApp and RsaAdapter for encrypting and signing messages, and (2) extending class Contact with public RSA key handling: method findKey() searches for public RSA keys of contacts by repeatedly calling method findKeyFromServer() with the URLs of available key servers. This program evolution further decays the already flawed design of MailApp, as class SecureMailApp may be considered a second instance of The Blob anti-pattern: method encryptMessage() of class SecureMailApp intensively calls method findKey() in class Contact. This example illustrates a well-known dilemma of agile program development in an object-oriented world: Class-Responsibility-Assignment decisions may become unbalanced over time due to unforeseen changes crosscutting the initial program design [31]. As a result, the majority of object-oriented design flaws like The Blob anti-pattern are mainly caused by low cohesion within classes and high coupling among classes and their members [5, 6].

Refactoring of Object-Oriented Programs. Object-oriented refactorings constitute an emerging and widely used counter-measure against design flaws [13]. Refactorings are systematic, semantics-preserving program transformations for continuously improving code-quality measures of evolving source code. For instance, the MoveMethod refactoring is frequently used to update CRA decisions after program changes by moving method implementations between classes [34]. Applied to our example, a developer may (manually) conduct two refactorings, R1 and R2, to counteract the aforementioned design flaws:

(R1): move method plainToHtml() from class MailApp to class Message, and

(R2): move method encryptMessage() from class SecureMailApp to class Contact.

However, for programs of realistic size and complexity, tool support for (semi-)automated program refactoring becomes indispensable. The major challenges in finding effective sequences of object-oriented refactoring operations consist in detecting flawed program parts to be refactored, and in recommending program transformations for those parts that yield an improved, yet behaviorally equivalent program design. The complexity of the underlying optimization problem stems from several phenomena.

  • Very large search-space due to the combinatorial explosion resulting from the many possible sequences of (potentially interdependent) refactoring-operation applications.

  • Multiple objectives including various (inherently contradicting) refactoring goals (e.g., (O1)–(O3)).

  • Many invalid solutions due to (generally very complicated) constraints to be imposed for ensuring behavior preservation.

Further research, especially on the last phenomenon, is required to understand to what extent a refactoring actually alters the original program (in a potentially critical way). For instance, refactoring R2 only yields a correct result if declared accessibility constraints are relaxed: after being moved into class Contact, method encryptMessage() has to become public instead of protected to remain accessible for method sendMessage(), and, conversely, method getPrivateKey() has to become public instead of private to remain accessible for encryptMessage(). Although these small changes do not affect the functionality of the original program, they may have a negative impact on extra-functional properties like program security. Therefore, the number of invalid solutions depends heavily on the interaction between constraints and repair mechanisms.

Attack Surface of Object-Oriented Programs. The attack surface of a program comprises all conventional ways of entering the software from outside, such that a larger surface increases the danger of vulnerabilities being exploited (either unintentionally by some user, or intentionally by an attacker) [20]. For Java-like programs in particular, explicit restrictions on the accessibility of class members provide an essential mechanism for controlling the attack surface. Hence, refactoring R2 should definitely be considered harmful, as the enforced relaxations of accessibility constraints, especially that of the security-critical method getPrivateKey(), unnecessarily widen the attack surface of the original program. In contrast, refactoring R1 should be welcomed, as it even narrows the attack surface by changing method plainToHtml() from public to private.

Challenges. As illustrated by our example, the attack surface of a program is a crucial, yet so far unexplored, factor when searching for reasonable object-oriented program refactorings. However, if not treated with special care, accessibility constraints may seriously obstruct program maintenance by eagerly suppressing any refactoring opportunity in advance. We therefore pursue a model-based methodology for automating the search for optimal sequences of program refactorings that explicitly takes accessibility constraints into account. We formulate the underlying problem as a constrained multi-objective optimization problem (MOOP) incorporating explicit control and minimization of attack-surface metrics. This formulation allows us to leverage search-based model-transformation capabilities for approximating optimal solutions.

3 Search-Based Program Refactorings with Attack-Surface Control

We now describe our model-based framework for identifying (presumably) optimal sequences of object-oriented refactoring operations. To explicitly control (and minimize) the impact of recommended refactorings on the attack surface, we extend an existing EMF meta-model for representing Java-like programs with accessibility information and respective constraints. Based on this model, refactoring operations are defined as model-transformation rules which allow us to apply search-based model-transformation techniques to effectively explore candidate solutions of the resulting MOOP.

3.1 Program Model

In the context of model-based program transformation, a program model serves as a unified program representation that (1) constitutes an appropriate level of abstraction, comprising only those (syntactic) program entities that are relevant for a given task, and (2) includes additional (static semantic) information required for that task [24]. For model-based object-oriented program refactorings in particular, the corresponding model-transformation operations are mostly applied at the level of classes and members, whereas more fine-grained source-code details can be neglected. Instead, program elements are augmented with additional (static semantic) dependencies to other entities that are crucial for refactoring operations to yield correct results [24,25,26]. Here, we employ and enhance the program model proposed by Peldszus et al. [33] for automatically detecting structural anti-patterns (cf. O1b) in Java programs. Their incremental detection process also evaluates coupling and cohesion metrics (cf. O1a), and both the metric values and the detected anti-patterns are added to the program model as additional information.

Fig. 2. Excerpt of the program-model representation of MailApp

Figure 2 shows an excerpt of the program-model representation of MailApp, including the classes MailApp, Message, SecureMailApp, and Contact together with a selection of their method definitions. Each program element is represented by a white rectangle labeled with name : type. The available types of program entities and the possible (syntactic and semantic) dependencies between them (represented by arrows) are defined by a program meta-model, which serves as a template for valid program models [26, 37]. The program model comprises as first-class entities the classes (type TClass) together with their members as declared in the program. The representation of methods is split into signatures (type TMethodSignature) and definitions (type TMethodDefinition) to capture overloading/overriding dependencies among method declarations (e.g., overriding method sendMessage() yields one shared method signature, but two different method definitions). Solid arrows correspond to syntactic dependencies between program elements, such as aggregation (unlabeled), inheritance (labeled extends), and the relations between method signatures and their definitions, whereas dashed arrows represent (static) semantic dependencies (e.g., arrows labeled call denote caller-callee relations between methods).

Fig. 3. Model-transformation rule for MoveMethod refactoring

Design-Flaw Information. The program model further incorporates information gained from design-flaw detection [33], to identify program parts to be refactored. In our example, design-flaw annotations (in gray) are attached to affected program elements, namely classes Message and Contact constitute data classes and classes MailApp and SecureMailApp constitute controller classes, which lead to two instances of the anti-pattern The Blob.

Accessibility Information. To reason about the impact of refactorings on the attack surface of programs, we extend the program model of Peldszus et al. by accessibility information. Our extensions include the attribute accessibility denoting the declared accessibility of entities as shown for method definitions in Fig. 2. In addition, our model comprises package declarations of classes (type TPackage) to reason about package-dependent accessibility constraints.
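To make the structure of Fig. 2 more concrete, the following Python sketch mirrors the meta-model entities named above (TPackage, TClass, TMethodSignature, TMethodDefinition) as plain data classes. It is a hypothetical, heavily simplified illustration and not the EMF meta-model of [33]: the field names signatures, definitions and calls are our own, the package names are invented, the modifiers of sendMessage() are assumed, and we assume, as the overriding of sendMessage() suggests, that SecureMailApp extends MailApp.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified mirror of the program meta-model (not the EMF meta-model).
# eq=False gives model nodes identity semantics and avoids recursive comparisons.

@dataclass(eq=False)
class TPackage:
    name: str

@dataclass(eq=False)
class TClass:
    name: str
    package: TPackage                      # package declaration (accessibility extension)
    extends: Optional["TClass"] = None     # solid arrow labeled 'extends'
    signatures: List["TMethodSignature"] = field(default_factory=list)  # containment

@dataclass(eq=False)
class TMethodDefinition:
    owner: TClass = field(repr=False)      # class providing this method body
    accessibility: str = "public"          # declared modifier (accessibility extension)
    calls: List["TMethodDefinition"] = field(default_factory=list, repr=False)  # 'call' arrows

@dataclass(eq=False)
class TMethodSignature:
    name: str
    definitions: List[TMethodDefinition] = field(default_factory=list)

# Invented package names; SecureMailApp is assumed to extend MailApp.
app_pkg, secure_pkg = TPackage("app"), TPackage("secure")
mail_app = TClass("MailApp", app_pkg)
secure_app = TClass("SecureMailApp", secure_pkg, extends=mail_app)

# Overriding sendMessage(): one shared signature, two different definitions.
send = TMethodSignature("sendMessage()")
send.definitions.append(TMethodDefinition(mail_app, "public"))    # modifier assumed
send.definitions.append(TMethodDefinition(secure_app, "public"))  # modifier assumed
mail_app.signatures.append(send)
secure_app.signatures.append(send)
```

Splitting signatures from definitions is what lets a single signature node be shared by both classes while each class keeps its own body.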

3.2 Model-Based Program Refactorings

Based on the program-model representation, refactoring operations, as semantics-preserving program transformations, can be concisely formalized in a declarative manner as model-transformation rules [26]. A model-transformation rule specifies a generic change pattern consisting of a left-hand side, which is matched in an input model when applying the rule, and a right-hand side, which replaces the occurrence of the left-hand side to yield an output model. Here, we focus on (sequences of) MoveMethod refactorings, as recent research has shown them to be considerably effective in improving CRA measures of flawed object-oriented program designs [34]. Figure 3 shows a (simplified) rule for MoveMethod refactorings defined on our program meta-model, using a compact visual notation that superimposes the left- and right-hand sides. The rule takes a source class srcClass, a target class trgClass and a method signature methodSig as parameters, deletes the containment arrow between source class and signature (red arrow annotated with --) and creates a new containment arrow from the target class (green arrow annotated with ++), provided that such an arrow does not already exist before rule application. The latter (pre-)condition is expressed by a forbidden (crossed-out) arrow. For a comprehensive list of all necessary pre-conditions (or, pre-constraints), we refer to [38].
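The effect of the rule in Fig. 3 can be mimicked on a much coarser model. The Python sketch below is our own simplification and not the Henshin rule: a model is a hypothetical dictionary from class names to the method signatures they contain, the left-hand-side match and the forbidden arrow become membership checks, and all further pre-constraints from [38] are omitted.

```python
from typing import Dict, Set

def move_method(model: Dict[str, Set[str]], src: str, trg: str, sig: str) -> bool:
    """Apply a simplified MoveMethod rule in place; return False if it does not match."""
    # Left-hand side: srcClass must contain methodSig via a containment arrow.
    if sig not in model.get(src, set()):
        return False
    # Forbidden (crossed-out) arrow: trgClass must exist and not already contain methodSig.
    if trg not in model or sig in model[trg]:
        return False
    model[src].discard(sig)   # '--': delete the containment arrow from srcClass
    model[trg].add(sig)       # '++': create the containment arrow from trgClass
    return True

model = {"MailApp": {"plainToHtml()", "sendMessage()"},
         "Message": {"getPlainText()"}}
print(move_method(model, "MailApp", "Message", "plainToHtml()"))  # True (refactoring R1)
print(move_method(model, "MailApp", "Message", "plainToHtml()"))  # False (no longer matches)
```

Returning a boolean instead of raising an error matches how non-matching steps are treated later in the search: they are simply not applied.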

Accessibility Post-constraints. Besides pre-constraints, refactoring operations must satisfy further post-constraints, evaluated after rule application, to yield correct results, especially concerning accessibility constraints as declared in the original program (i.e., member accesses like method calls in the original program must be preserved after refactoring [24]). As an example, a (simplified) post-constraint for the MoveMethod rule is shown on the right of Fig. 3 in OCL-like notation. Members refers to the collection of all class members in the program. The post-constraint uses the helper function reqAcc(m) to compute the required access modifier of class member \(\textsf {m}\) and checks whether the declared accessibility of \(\textsf {m}\) is at least as generous as required (based on the canonical ordering private < default < protected < public) [38].
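A possible reading of reqAcc(m) and of the post-constraint is sketched below in Python. It is an assumption-laden simplification of Java's access rules, and we do not claim that it matches the definition in [38]: a single access requires private within the same class, default within the same package, protected from a subclass in another package, and public otherwise, and reqAcc(m) takes the most generous requirement over all accessing classes. Class names follow the running example; the package names and the class hierarchy are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

ORDER = ["private", "default", "protected", "public"]  # private < default < protected < public

@dataclass(frozen=True)
class Cls:
    name: str
    package: str
    superclass: Optional[str] = None

def is_subclass(c: Cls, ancestor: str, classes: Dict[str, Cls]) -> bool:
    """True if `ancestor` appears on the superclass chain of `c`."""
    while c.superclass is not None:
        if c.superclass == ancestor:
            return True
        c = classes[c.superclass]
    return False

def required_for(caller: Cls, owner: Cls, classes: Dict[str, Cls]) -> str:
    """Least modifier letting code in `caller` access a member of `owner` (simplified)."""
    if caller.name == owner.name:
        return "private"
    if caller.package == owner.package:
        return "default"
    if is_subclass(caller, owner.name, classes):
        return "protected"
    return "public"

def req_acc(owner: Cls, callers: Iterable[Cls], classes: Dict[str, Cls]) -> str:
    """Most generous requirement over all accessing classes (private if there are none)."""
    return max((required_for(c, owner, classes) for c in callers),
               key=ORDER.index, default="private")

def post_constraint_holds(declared: str, owner: Cls, callers: Iterable[Cls],
                          classes: Dict[str, Cls]) -> bool:
    """Declared accessibility must be at least as generous as reqAcc(m)."""
    return ORDER.index(declared) >= ORDER.index(req_acc(owner, callers, classes))

# Invented packages: Message and Contact in "model", SecureMailApp in "secure".
classes = {c.name: c for c in [Cls("MailApp", "app"), Cls("Message", "model"),
                               Cls("Contact", "model"),
                               Cls("SecureMailApp", "secure", superclass="MailApp")]}

# R2: encryptMessage() now lives in Contact and is called from SecureMailApp.sendMessage().
print(req_acc(classes["Contact"], [classes["SecureMailApp"]], classes))  # public
print(post_constraint_holds("protected", classes["Contact"],
                            [classes["SecureMailApp"]], classes))         # False
```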

For instance, if refactoring R2 is applied to MailApp, method encryptMessage() violates this post-constraint, as the call from sendMessage() in another package requires accessibility public, whereas the declared accessibility is protected. Instead of immediately rejecting refactorings like R2, we introduce an accessibility-repair operation of the form m.accessibility := reqAcc(m) for each member violating the post-constraint, which consequently relaxes (i.e., widens) the attack surface. However, this repair is not always possible, as relaxations may lead to incorrect refactorings that alter the original program semantics (e.g., due to method overriding/overloading [38]). In contrast, refactoring R1 (i.e., moving plainToHtml() to class Message) satisfies the post-constraint, as the required accessibility of plainToHtml() becomes private, whereas its declared accessibility is public. In such cases, we may also apply the operation m.accessibility := reqAcc(m), which now leads to a reduction of the attack surface. Different strategies for attack-surface reduction are investigated in Sect. 4.
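The repair operation m.accessibility := reqAcc(m) can be sketched in Python as follows, assuming reqAcc(m) has already been computed for every member. The boolean overrides is a hypothetical stand-in for the overriding/overloading checks of [38], and the flag reduce models whether the optional attack-surface reduction is applied; neither is taken from GOBLIN.

```python
from dataclasses import dataclass
from typing import List, Optional

ORDER = ["private", "default", "protected", "public"]

@dataclass
class Member:
    name: str
    accessibility: str         # declared modifier
    required: str              # reqAcc(m), assumed to be computed beforehand
    overrides: bool = False    # hypothetical stand-in for the checks of [38]

def repair(members: List[Member], reduce: bool = True) -> Optional[List[str]]:
    """Apply m.accessibility := reqAcc(m); return a change log, or None if not repairable."""
    # Reject before mutating anything if some violation cannot be relaxed safely.
    for m in members:
        if ORDER.index(m.accessibility) < ORDER.index(m.required) and m.overrides:
            return None
    log = []
    for m in members:
        declared, required = ORDER.index(m.accessibility), ORDER.index(m.required)
        if declared < required:                                   # violation: relax (widens AS)
            log.append(f"relax {m.name}: {m.accessibility} -> {m.required}")
            m.accessibility = m.required
        elif declared > required and reduce and not m.overrides:  # optional reduction
            # Overriding members are conservatively left unchanged.
            log.append(f"reduce {m.name}: {m.accessibility} -> {m.required}")
            m.accessibility = m.required
    return log

r2 = [Member("encryptMessage()", "protected", "public"),
      Member("getPrivateKey()", "private", "public")]
print(repair(r2))
r1 = [Member("plainToHtml()", "public", "private")]
print(repair(r1))  # ['reduce plainToHtml(): public -> private']
```

Checking feasibility in a first pass keeps a rejected refactoring from leaving half-repaired members behind.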

3.3 Optimization Objectives

We now describe the evaluation of objectives (O1)–(O3) on the program model, to serve as fitness values in a search-based setting.

Coupling/Cohesion. Concerning (O1a), coupling and cohesion metrics are well-established quality measures for CRA decisions in object-oriented program design [4]. In our program model, coupling (COU) relates to the overall number of member accesses (e.g., call arrows) across class boundaries [5], and for measuring cohesion, we adopt the well-known LCOM5 metric, which quantifies the lack of cohesion among members within classes [17]. Other metrics also indicate good CRA decisions, such as Number of Children, but since their values cannot be changed by MoveMethod refactorings, we do not use them in this paper [9]. Good CRA decisions exhibit low values for both COU and LCOM5. Hence, refactorings R1 and R2 both improve the values of COU (i.e., by eliminating inter-class call arrows) and LCOM5 (i.e., by moving methods into the classes whose members they call).
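One simple reading of both metrics is sketched below in Python. COU counts call arrows whose caller and callee lie in different classes; for LCOM5 we assume the usual Henderson-Sellers formulation, (m − mean_j μ(A_j)) / (m − 1), with m methods, a attributes and μ(A_j) the number of methods accessing attribute A_j. How per-class LCOM5 values are aggregated at program level is not specified here, so the sketch stops at a single class. The single call arrow and the method and attribute names f(), g(), x, y are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Call = Tuple[str, str]  # (caller method, callee method), one per 'call' arrow

def cou(calls: List[Call], owner: Dict[str, str]) -> int:
    """Coupling: number of call arrows whose caller and callee lie in different classes."""
    return sum(1 for caller, callee in calls if owner[caller] != owner[callee])

def lcom5(methods: List[str], accesses: Dict[str, Set[str]]) -> float:
    """LCOM5 of one class: (m - mean_j mu(A_j)) / (m - 1).

    `accesses` maps each attribute A_j to the set of methods accessing it,
    so mu(A_j) is the number of the class's methods in that set."""
    m, a = len(methods), len(accesses)
    if m <= 1 or a == 0:
        return 0.0  # LCOM5 is undefined here; returning 0.0 is our own convention
    mean_mu = sum(len(users & set(methods)) for users in accesses.values()) / a
    return (m - mean_mu) / (m - 1)

# R1 on a single hypothetical call arrow plainToHtml() -> getPlainText().
calls = [("plainToHtml()", "getPlainText()")]
before = {"plainToHtml()": "MailApp", "getPlainText()": "Message"}
after = {**before, "plainToHtml()": "Message"}
print(cou(calls, before), cou(calls, after))                # 1 0
print(lcom5(["f()", "g()"], {"x": {"f()"}, "y": {"g()"}}))  # 1.0 (no shared attributes)
print(lcom5(["f()", "g()"], {"x": {"f()", "g()"}}))         # 0.0 (fully cohesive)
```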

Anti-patterns. Concerning (O1b), we limit our considerations to occurrences of The Blob anti-pattern for simplicity. We employ the detection approach of Peldszus et al. [33] and aim to minimize the number of The Blob instances (denoted #BLOB). For instance, for the original MailApp program (white parts in Fig. 1), we have #BLOB \(=1\), while for the extended version (white and gray parts), we have #BLOB \(=2\). Refactoring R1 may help to remove the first occurrence, and R2 potentially removes the second one.

Changes. Concerning (O2), real-life studies show that refactoring recommendations must not deviate too much from the original design to be accepted by users [8]. Here, we consider the number of MoveMethod refactorings to be performed in a recommendation (denoted #REF) as a further objective to be minimized. For example, solely applying R1 results in #REF \(=1\), whereas the sequence of R1 followed by R2 imposes more design changes (i.e., #REF \(=2\)). In contrast, accessibility-repair operations do not affect the value of #REF, but rather impact objective (O3).

Attack Surface. Concerning (O3), guidelines for secure object-oriented programming encourage developers to grant as few access privileges as possible to any accessible program element in order to minimize the attack surface [19]. In our program model, the attack-surface metric (denoted AS) is measured as

$$\mathbf{AS} = \sum_{m \in \textsf{Members}} \omega(m.\mathit{accessibility}), \tag{1}$$

where the weighting function \(\omega : \textit{Mod} \rightarrow \mathbb {N}_0\) on the set Mod of accessibility modifiers may, for instance, be defined as \(\omega (\texttt {private}) = 0\), \(\omega (\texttt {default}) = 1\), \(\omega (\texttt {protected}) = 2\), \(\omega (\texttt {public}) = 3\). Hence, a lower value corresponds to a smaller attack surface. For example, R1 enables an attack-surface reduction by changing plainToHtml() from public to private, which decreases AS by 3. In contrast, R2 involves a repair step changing encryptMessage() from protected to public, which increases AS by 1. Whether such negative impacts of refactorings on (O3) are outweighed by simultaneous improvements in other objectives depends, among other factors, on the actual weighting \(\omega \). For instance, every additional public modifier considerably widens the attack surface and should therefore be penalized by a higher weight than the other modifiers (cf. Sect. 4).
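Equation (1) and the two deltas discussed above can be checked with a few lines of Python. The weighting is the example ω from the text; the harsher weighting with ω(public) = 10 is a hypothetical value, chosen only to show how a larger penalty amplifies the cost of R2's repair step.

```python
from typing import Dict, Iterable

# Example weighting from the text; any other mapping can be passed instead.
OMEGA: Dict[str, int] = {"private": 0, "default": 1, "protected": 2, "public": 3}

def attack_surface(modifiers: Iterable[str], omega: Dict[str, int] = OMEGA) -> int:
    """AS: sum of omega(m.accessibility) over all class members (Eq. 1)."""
    return sum(omega[mod] for mod in modifiers)

def as_delta(old: str, new: str, omega: Dict[str, int] = OMEGA) -> int:
    """Change of AS caused by changing a single member's modifier from old to new."""
    return omega[new] - omega[old]

print(as_delta("public", "private"))             # -3 (R1 on plainToHtml())
print(as_delta("protected", "public"))           # 1 (R2 repair on encryptMessage())
heavier = {**OMEGA, "public": 10}                # hypothetical harsher penalty for public
print(as_delta("protected", "public", heavier))  # 8
```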

3.4 Search-Based Optimization Process

Our tool for recommending optimized object-oriented refactoring sequences, called GOBLIN (Footnote 2), combines search-based multi-objective optimization using genetic algorithms with model transformations on the basis of the MOMoT framework [11]. Figure 4 shows an overview of GOBLIN. First, the input Java program is translated into our program model [33]. This original program model, together with its objective values for (O1)–(O3) (i.e., its fitness values), serves as the baseline for evaluating the improvements obtained by candidate refactorings. The built-in genetic algorithm of MOMoT (NSGA-III) is initialized with a population of a fixed number of individuals serving as generation 0, where each individual constitutes a sequence of between 1 and a maximum number of MoveMethod rule applications (cf. Fig. 3) to the original program model. Thus, each individual corresponds to a refactored version of the original program model, on which the resulting fitness values are evaluated. The refactored program model is obtained by applying the given sequence of refactorings to the original program model. Steps within a sequence that are not applicable to an intermediate model (e.g., due to unsatisfied pre-conditions) are skipped, whereas steps producing infeasible results (e.g., due to unsatisfied and non-repairable post-conditions) render the entire individual invalid (thus removing it from the population).
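The treatment of individuals described above (skip inapplicable steps, invalidate on non-repairable results) can be sketched in Python as follows. The dictionary model and the callback post_ok are hypothetical: post_ok stands in for the post-constraint check combined with accessibility repair, and the MOMoT machinery itself is not shown.

```python
import copy
from typing import Callable, Dict, List, Optional, Set, Tuple

Model = Dict[str, Set[str]]  # class name -> contained method signatures
Step = Tuple[str, str, str]  # (srcClass, trgClass, methodSig)

def apply_sequence(original: Model, steps: List[Step],
                   post_ok: Callable[[Model], bool]) -> Optional[Model]:
    """Return the refactored model, or None if the individual is invalid."""
    model = copy.deepcopy(original)       # keep the original model as the baseline
    for src, trg, sig in steps:
        if sig not in model.get(src, set()) or trg not in model or sig in model[trg]:
            continue                      # not applicable (pre-condition fails): skip step
        model[src].remove(sig)
        model[trg].add(sig)
        if not post_ok(model):            # non-repairable post-constraint violation
            return None                   # the whole individual becomes invalid
    return model

original = {"MailApp": {"plainToHtml()"}, "Message": set(),
            "SecureMailApp": {"encryptMessage()"}, "Contact": set()}
individual = [("MailApp", "Message", "plainToHtml()"),             # R1
              ("MailApp", "Message", "plainToHtml()"),             # now inapplicable: skipped
              ("SecureMailApp", "Contact", "encryptMessage()")]    # R2
print(apply_sequence(original, individual, lambda m: True))
```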

Fig. 4. Architecture of the GOBLIN tool

For deriving generation \(i+1\) from generation i, NSGA-III first creates a set of new individuals using random crossover and mutation operators. As indicated in Fig. 4, a crossover splits and recombines two individuals into a new one, while a mutation generates a new individual by injecting small changes into an existing one. Afterwards, in the selection phase, individuals from the overall population (the original and newly created individuals) are selected into the next generation, depending on their fitness values. For more details on NSGA-III, we refer to [15, 28]. The search-process terminates when a maximum number of generations (or, individuals, respectively) has been reached, resulting in a Pareto-front of non-dominated individuals, each constituting a refactoring recommendation [11].
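The vocabulary of this paragraph (crossover, mutation, non-dominated individuals) can be illustrated with the Python sketch below. It is neither NSGA-III nor MOMoT's implementation: we assume a simple one-point crossover and a single-step replacement mutation on refactoring sequences, and we show only plain Pareto-dominance filtering with all objectives minimized (e.g., #BLOB, COU, LCOM5, #REF, AS), omitting NSGA-III's reference-point-based selection.

```python
import random
from typing import List, Sequence, Tuple

Step = Tuple[str, str, str]  # (srcClass, trgClass, methodSig)

def crossover(a: List[Step], b: List[Step], rng: random.Random,
              max_len: int = 10) -> List[Step]:
    """One-point crossover: a non-empty prefix of a joined with a suffix of b."""
    cut_a = rng.randint(1, len(a))            # keep at least one step of a
    cut_b = rng.randint(0, len(b))
    return (a[:cut_a] + b[cut_b:])[:max_len]  # respect the maximum sequence length

def mutate(ind: List[Step], candidates: Sequence[Step], rng: random.Random) -> List[Step]:
    """Replace one randomly chosen step by a randomly chosen candidate step."""
    child = list(ind)
    child[rng.randrange(len(child))] = rng.choice(candidates)
    return child

def dominates(f: Sequence[float], g: Sequence[float]) -> bool:
    """f Pareto-dominates g when all objectives are minimized."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))

def pareto_front(fitness: List[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated individuals."""
    return [i for i, f in enumerate(fitness)
            if not any(dominates(g, f) for j, g in enumerate(fitness) if j != i)]

rng = random.Random(42)
a = [("MailApp", "Message", "plainToHtml()")]
b = [("SecureMailApp", "Contact", "encryptMessage()")]
print(crossover(a, b, rng))
print(pareto_front([(1, 2), (2, 1), (2, 2)]))  # [0, 1]
```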

4 Experimental Evaluation

We now present experimental evaluation results gained from applying GOBLIN to a collection of Java programs. First, to investigate the impact of attack-surface reduction on the resulting refactoring recommendations, we consider the following reduction strategies, differing in when to perform attack-surface reduction during search-space exploration (where step means a refactoring step):

  • Strategy 1: A priori reduction. Before the first and after the last step.

  • Strategy 2: A posteriori reduction. Only after the last step.

  • Strategy 3: Continuous reduction. After every refactoring step.
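The three strategies differ only in where attack-surface reduction (the operation m.accessibility := reqAcc(m) applied to over-generous members, cf. Sect. 3.2) is interleaved with the refactoring steps. The Python sketch below records this ordering; apply_step and reduce_as are hypothetical callbacks, not GOBLIN functions, and the trace simply appends markers to a list.

```python
from typing import Callable, List, TypeVar

M = TypeVar("M")

def run_with_strategy(model: M, steps: List[str],
                      apply_step: Callable[[M, str], M],
                      reduce_as: Callable[[M], M], strategy: int) -> M:
    """Interleave refactoring steps with attack-surface reduction (Strategies 1-3)."""
    if strategy == 1:
        model = reduce_as(model)          # a priori: before the first step
    for step in steps:
        model = apply_step(model, step)
        if strategy == 3:
            model = reduce_as(model)      # continuous: after every step
    if strategy in (1, 2):
        model = reduce_as(model)          # after the last step
    return model

# Trace the ordering: "R" marks a reduction, "s1"/"s2" are refactoring steps.
for k in (1, 2, 3):
    print(k, run_with_strategy([], ["s1", "s2"], lambda m, s: m + [s], lambda m: m + ["R"], k))
# 1 ['R', 's1', 's2', 'R']
# 2 ['s1', 's2', 'R']
# 3 ['s1', 'R', 's2', 'R']
```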

We are interested in the impact of each strategy on the trade-off between attack-surface metrics and design-quality metrics (i.e., do the recommended refactoring sequences tend to favor optimizing the attack surface or the program design?). We quantify the attack-surface impact (ASI) and the design impact (DI) of a refactoring recommendation rr as follows:

$$\mathbf{ASI}(\mathsf{rr}) = \frac{\mathbf{AS}(\mathsf{rr}) - \mathbf{AS}(\mathsf{orig})}{\mathbf{AS}(\mathsf{orig})} \tag{2}$$

$$\mathbf{DI}(\mathsf{rr}) = \frac{\mathbf{COU}(\mathsf{rr}) - \mathbf{COU}(\mathsf{orig})}{\mathbf{COU}(\mathsf{orig})} + \frac{\mathbf{LCOM5}(\mathsf{rr}) - \mathbf{LCOM5}(\mathsf{orig})}{\mathbf{LCOM5}(\mathsf{orig})} \tag{3}$$

where orig refers to the original program. Second, we consider the impact of different weightings \(\omega \) on the attack-surface metric AS. As modifier public has a considerably negative influence on the attack surface, we study the impact of increasing the penalty for public in \(\omega \) relative to the other modifiers. We are especially interested in whether there exists a threshold beyond which any design-improving refactoring would be rejected as security-critical. Finally, we compare GOBLIN to the recent refactoring tools JDeodorant and CODe-Imp, neither of which explicitly considers attack-surface metrics as an optimization objective so far. To summarize, we aim to answer the following research questions:

  • (RQ1: Objective Trade-Off) Which attack-surface reduction strategy offers the best trade-off between attack-surface impact and design impact when taking the original program as a baseline?

  • (RQ2: Weighting of Attack Surface) Which weighting of public in the attack-surface metric constitutes a critical threshold obstructing any design-improving refactorings?

  • (RQ3: Tool Comparison) Which tool provides the best trade-off between attack-surface impact and design impact in refactoring recommendations?
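The measures ASI and DI defined in Eqs. (2) and (3) are sums of relative changes with respect to the original program, as the following Python sketch shows. Because lower AS, COU and LCOM5 values are better, negative results indicate improvements. The metric values in the usage lines are hypothetical and not taken from our experiments.

```python
def relative_change(value_rr: float, value_orig: float) -> float:
    """(value(rr) - value(orig)) / value(orig); undefined if value(orig) == 0."""
    return (value_rr - value_orig) / value_orig

def asi(as_rr: float, as_orig: float) -> float:
    """Attack-surface impact, Eq. (2)."""
    return relative_change(as_rr, as_orig)

def di(cou_rr: float, cou_orig: float, lcom5_rr: float, lcom5_orig: float) -> float:
    """Design impact, Eq. (3): relative change of COU plus relative change of LCOM5."""
    return relative_change(cou_rr, cou_orig) + relative_change(lcom5_rr, lcom5_orig)

# Hypothetical metric values; negative results mean the metric decreased (improved).
print(asi(97, 100))                   # -0.03
print(di(36, 40, 0.72, 0.80))         # approximately -0.2
```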

Fig. 5. Minimal ASI and DI values for different weightings of public

Table 1. Evaluation corpus

4.1 Experiment Setup and Results

We conducted our experiments on an established corpus of real-life open-source Java programs of various sizes [33, 39], as listed in Table 1 (with lines of code LOC, number of packages \(\#P\), number of classes \(\#C\) and number of methods \(\#M\)). For a compact presentation, we divide the corpus into three program-size categories (small, mid-sized, large), indicated by horizontal lines in Table 1. All experiments were executed on a Windows Server 2016 machine with a 2.4 GHz quad-core CPU, 32 GB RAM and JRE 1.8. We used the default genetic-algorithm configuration of MOMoT in all our experiments [11]: termination after 10,000 individual evaluations, a population size of 100, and at most 10 refactorings per individual. We applied the metrics for (O1)–(O3) (cf. Sect. 3.3) to compute fitness values. GOBLIN requires 25 min to compute a set of refactoring recommendations for the smallest program and up to several hours for large programs, which is acceptable for a search-based (off-line) optimization approach. We manually checked a representative selection of the computed recommendations for program correctness and impact.

Fig. 6. Measurement results

For (RQ1), we measured ASI and DI values for two runs of GOBLIN (cf. Figs. 6a–f). Figures 6a and b (first row, side by side) show a box-plot for each of Strategies 1–3 for the small programs of our corpus (\(\#iSj\) referring to program number i in Table 1 and Strategy j). The box-plots show the distribution of ASI (Fig. 6a) and DI (Fig. 6b) values over all refactoring recommendations of GOBLIN. The figure pairs 6c/6d and 6e/6f show the same data for mid-sized and large programs, respectively. For (RQ2), we used Strategy 3 from (RQ1) and varied function \(\omega \) to study different penalties for modifier public. Figure 5 plots the (minimal) values of ASI and DI depending on \(\omega (\textsf {public})\) (from 3 up to 100). Regarding (RQ3), we compare the results of GOBLIN to those of the state-of-the-art refactoring recommender tools JDeodorant [12] and CODe-Imp [27]. Refactorings proposed by JDeodorant have the single optimization objective of eliminating specific anti-patterns through heuristic refactoring strategies. In particular, JDeodorant employs ExtractClass [13] to eliminate The Blob (also called GodClass) by separating parts of the controller class into a freshly created class. Thus, each recommendation of JDeodorant subsumes multiple MoveMethod refactorings (into the fresh target class). In contrast, CODe-Imp pursues a search-based approach, including a variety of refactoring operations and design-quality metrics. For a comparison with GOBLIN, we used the MoveMethod refactoring of CODe-Imp, which produces one sequence of MoveMethod refactorings per run. Figures 6g and h compare ASI and DI values, respectively, for our corpus (excluding QuickUML due to its very high variance). For each program, the upper box-plot shows the results for GOBLIN and the lower one those for JDeodorant. CODe-Imp only produced results for QuickUML and JUnit (10 runs each), while terminating without any result for the other programs.

4.2 Discussion

Concerning (RQ1), Strategy 3 leads to the best attack-surface impact for small programs (with negligible execution-time overhead), while even slightly improving the design impact. Although this clear advantage dissolves for mid-sized and large programs, Strategy 3 still contributes to a reasonable trade-off, while attack-surface reductions tend to hamper design improvements, as expected. Calculating the Pearson correlation [32] between ASI and DI shows that (1) the strategy does not influence the correlation, and (2) for small programs, GOBLIN finds refactorings that are beneficial for both the attack surface and the program design.
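For reference, the Pearson correlation coefficient used above can be computed as in the following Python sketch. Since lower ASI and DI values both indicate improvements, a positive coefficient means that recommendations tend to improve (or worsen) both measures together. The four (ASI, DI) pairs are hypothetical and are not measured data.

```python
from math import sqrt
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if either sequence is constant

# Hypothetical (ASI, DI) values of four recommendations.
asi_values = [-0.05, -0.02, 0.01, 0.03]
di_values = [-0.20, -0.12, -0.05, 0.02]
print(round(pearson(asi_values, di_values), 3))
```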

Concerning (RQ2), Fig. 5 shows that a higher value of \(\omega (\textsf {public})\) leads to a better attack-surface impact, as attack-surface-critical refactorings are less likely to survive across generations. The increase in ASI is remarkably steep from \(\omega (\textsf {public}) = 3\) to \(\omega (\textsf {public}) = 7\), but turns into slow linear growth for higher values. Regarding the design impact, the best achieved DI also grows linearly up to \(\omega (\textsf {public}) = 10\), but no further DI improvements emerge afterwards. For higher values (>70), DI reaches a threshold and then degrades.
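To illustrate why raising \(\omega (\textsf {public})\) penalizes attack-surface-critical refactorings more strongly, the following sketch uses a deliberately simplified model. It assumes, for illustration only, that the attack surface of a program is the sum of \(\omega \) over the access modifiers of all members and that the attack-surface impact of a refactoring is the absolute reduction of this sum. The exact ASI definition used by GOBLIN may differ, and all weights except the varied public penalty, as well as the function names, are hypothetical.

```python
from typing import Dict, List


def make_omega(public_penalty: float) -> Dict[str, float]:
    """Hypothetical penalty function omega; only the weight of public is varied.

    "default" stands for Java's package-private access (no modifier).
    The weights for private, default and protected are illustrative assumptions.
    """
    return {"private": 0.0, "default": 1.0, "protected": 2.0, "public": public_penalty}


def attack_surface(modifiers: List[str], omega: Dict[str, float]) -> float:
    """Weighted attack surface: sum of omega over the modifiers of all members."""
    return sum(omega[m] for m in modifiers)


def attack_surface_impact(before: List[str], after: List[str],
                          omega: Dict[str, float]) -> float:
    """Reduction of the weighted attack surface; positive means a smaller surface."""
    return attack_surface(before, omega) - attack_surface(after, omega)


# A refactoring that makes one public member private ...
tightening = (["public", "public", "private", "default"],
              ["public", "private", "private", "default"])
# ... and one that has to relax a private member to public.
relaxing = (["private", "default"], ["public", "default"])

for w in (3, 10):
    omega = make_omega(w)
    print(w, attack_surface_impact(*tightening, omega),
          attack_surface_impact(*relaxing, omega))
```

Under these assumptions, raising \(\omega (\textsf {public})\) from 3 to 10 changes the impact of the relaxing refactoring from −3 to −10, so candidates that expose members publicly receive markedly worse fitness and are more likely to be discarded during selection.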

Regarding (RQ3), JDeodorant's strategy for eliminating The Blob necessarily increases attack surfaces, as calls to extracted methods have to access the new class, which requires raising accessibility at least to default. As also shown in Fig. 6g, JDeodorant proposes almost no refactorings with a positive attack-surface impact. Surprisingly, JDeodorant also achieves a less beneficial design impact than GOBLIN, with a strong correlation between ASI and DI. Our, unfortunately very limited, set of observations for CODe-Imp shows that, due to the similar search technique, the refactorings found by CODe-Imp and GOBLIN are quite similar. Nevertheless, due to its different focus of objectives, CODe-Imp tends to increase attack surfaces. Although the differences in metric definitions preclude any definite conclusions, CODe-Imp does not achieve any design improvements according to our metrics.
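The argument that extracting methods into a fresh class forces accessibility up to at least default follows directly from Java's access rules. The sketch below models these rules in a simplified form: it ignores nested classes and the finer restrictions on protected access described in the Java Language Specification (§6.6, Access Control). The `ClassRef` type and the helper names are hypothetical and serve only to illustrate the reasoning.

```python
from dataclasses import dataclass

# Java access levels ordered from least to most permissive;
# "default" denotes package-private access (no modifier).
LEVELS = ["private", "default", "protected", "public"]


@dataclass(frozen=True)
class ClassRef:
    name: str
    package: str
    superclasses: frozenset = frozenset()  # names of (transitive) superclasses


def minimal_modifier(owner: ClassRef, caller: ClassRef) -> str:
    """Least permissive modifier letting `caller` access a member of `owner`.

    Simplified: ignores nested classes and the finer rules for protected access.
    """
    if caller == owner:
        return "private"
    if caller.package == owner.package:
        return "default"
    if owner.name in caller.superclasses:
        return "protected"
    return "public"


def relax(current: str, required: str) -> str:
    """Return the more permissive of the two modifiers."""
    return max(current, required, key=LEVELS.index)


blob = ClassRef("Blob", "app")
part = ClassRef("BlobPart", "app")  # freshly extracted class in the same package
# A private helper moved from Blob into BlobPart but still called from Blob:
print(relax("private", minimal_modifier(part, blob)))  # prints "default"
```

Even in the most favorable case (extracted class in the same package), every formerly private member that is moved and still called from the original class must be relaxed to at least default, which enlarges the attack surface.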

To summarize, our experimental results demonstrate that attack-surface impacts of refactorings clearly deserve more attention in the context of refactoring recommendations, revealing a practically relevant trade-off (or even a contradiction) between traditional design-improvement efforts and extra-functional (particularly, security) aspects. Our experiments further reveal that existing tools are mostly unaware of the attack-surface impacts of the refactorings they recommend.

5 Related Work

Automating Design-Flaw Detection and Refactorings. Marinescu [21] proposes a metric-based design-flaw detection approach similar to that of Peldszus et al. [33], which is used in our work. However, neither work deals with the elimination of detected flaws. In contrast, the DECOR framework also includes recommendations for eliminating anti-patterns; unlike in our work, however, those recommendations remain rather atomic and local. More closely related to our approach, Fokaefs et al. [12] and Tsantalis et al. [40] consider (semi-)automatic refactorings to eliminate anti-patterns like The Blob in the tool JDeodorant. Nevertheless, they optimize a single objective and do not consider multiple objectives, especially extra-functional aspects such as the security metrics used in our approach.

Multi-objective Search-Based Refactorings. O’Keeffe and Ó Cinnéide use search-based refactorings in their tool CODe-Imp [28], including various standard refactoring operations and different quality metrics as objectives [27]. Seng et al. consider a search-based setting in which, similar to our approach, compound refactoring recommendations comprise atomic MoveMethod operations. Harman and Tratt also investigate a Pareto front of refactoring recommendations covering various design objectives [16], and, more recently, Ouni et al. conducted a large-scale real-world study on multi-objective search-based refactoring recommendations [30]. However, none of these approaches investigates the impact of refactorings on security-relevant metrics as in our approach.
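Pareto-based approaches, including the NSGA-III algorithm used by GOBLIN, compare refactoring candidates by Pareto dominance instead of collapsing all objectives into one score. The sketch below shows only the dominance test and the extraction of the first non-dominated front, assuming that all objectives (e.g., ASI and DI) are to be maximized; it is not NSGA-III itself, which additionally ranks further fronts and uses reference points to preserve diversity. The function names and sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # all objectives are maximized in this sketch


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the non-dominated points, i.e., the first front of non-dominated sorting."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (ASI, DI) values of four refactoring candidates.
candidates = [(0.3, 0.1), (0.1, 0.4), (0.2, 0.2), (0.1, 0.1)]
print(pareto_front(candidates))  # (0.1, 0.1) is dominated by (0.2, 0.2)
```

The remaining three candidates represent different trade-offs between attack-surface and design impact; none of them is better than another in both objectives.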

Security-Aware Refactorings. Steimann and Thies were the first to propose a comprehensive set of accessibility constraints for refactorings covering full Java [38]. Although their constraints are formally founded, they do not consider software metrics to quantify the attack-surface impact of (sequences of) refactorings. Alshammari et al. propose an extensive catalogue of software metrics for evaluating the impact of refactorings on the security of object-oriented programs [1]. Similarly, Maruyama and Omori propose a technique [22] and a tool [23] for checking whether a refactoring operation raises security issues. However, all these approaches are concerned with security and accessibility constraints of individual refactorings and do not investigate those aspects in a multi-objective program-optimization setting. The problem of measuring attack surfaces, serving as a metric for evaluating secure object-oriented programming policies, has been investigated by Zoller and Schmolitzky [41] and by Manadhata and Wing [20]. Nevertheless, these and similar metrics have not yet been used as an optimization objective for program refactoring. Finally, Ghaith and Ó Cinnéide consider a catalogue of security-relevant metrics to recommend refactorings using CODe-Imp, but they also treat security as a single objective [14].

6 Conclusion

We presented a search-based approach to recommend sequences of refactorings for object-oriented Java-like programs that takes the attack surface into account as an additional optimization objective. Our model-based methodology, implemented in the tool GOBLIN, utilizes the MOMoT framework including the genetic algorithm NSGA-III for search-space exploration. Our experimental results, gained from applying GOBLIN to real-world Java programs, provide detailed insights into the impact of attack-surface metrics on the fitness values of refactorings and into the resulting trade-off with competing design-quality objectives. As future work, we plan to incorporate additional domain knowledge about critical code parts to further control security-aware refactorings.