Transcompiling Firewalls
Abstract
Porting a policy from a firewall system to another is a difficult and error prone task. Indeed, network administrators have to know in detail the policy meaning, as well as the internals of the firewall systems and of their languages. Equally difficult is policy maintenance and refactoring, e.g., removing useless or redundant rules. In this paper, we present a transcompiling pipeline that automatically tackles both problems: it can be used to port a policy into an equivalent one, when the target firewall language is different from the source one; when the two languages coincide, transcompiling supports policy maintenance and refactoring. Our transcompiler and its correctness are based on a formal intermediate firewall language that we endow with a formal semantics.
1 Introduction
Firewalls are one of the standard mechanisms for protecting computer networks. Configuring and maintaining them is very difficult also for expert system administrators since firewall policy languages are varied and usually rather complex, they account for lowlevel system and network details and support non trivial control flow constructs. Additional difficulties come from the way in which packets are processed by the network stack of the operating system and further issues are due to Network Address Translation (NAT), the mechanism for translating addresses and performing port redirection while packets traverse the firewall.
A configuration is typically composed of a large number of rules and it is often hard to figure out the overall firewall behavior. Also, firewall rules interact with each other, e.g., some shadow others making them redundant or preventing them to be triggered. Often administrators resort to policy refactoring to solve these issues and to obtain minimal and clean configurations. Software Defined Network (SDN) paradigm has recently been proposed for programming the network as a whole at a high level, making network and firewall configuration simpler and less error prone. However, network administrators have still to face the porting of firewall configurations from a variety of legacy devices into this new paradigm.
 1.
decompile the policy in the source language into an intermediate language;
 2.
extract the meaning of the policy as a set of non overlapping declarative rules describing the accepted packets and their translations in logical terms;
 3.
compile the declarative rules into the target language.
Another key contribution of this paper is to formalize this pipeline and to prove that it preserves the meaning of the original policy (Theorems 1, 2 and 3). The core of our proposal is the intermediate language IFCL (Sect. 4), which offers all the typical features of firewall languages such as NAT, jumps, invocations to rulesets and stateful packet filtering. This language unveils the bipartite structure common to real firewall languages: the rulesets determining the destiny of packets and the control flow in which the rules are applied. The relevant aspects of IFCL are its independence from specific firewall systems and their languages, and its formal semantics (Sect. 5). Remarkably, stage 1 provides real languages, which usually have no formal semantics, with the one inherited by the decompilation to IFCL. In this way the meaning of a policy is formally defined, so allowing algorithmic manipulations that yield the rules of stage 2 (Sect. 6). These rules represent minimal configurations in a declarative way, covering all accepted packets and their transformations, with neither overlapping nor shadowing rules. These two stages are implemented in a tool appearing in a companion paper [1] and surveyed below, in the section on related work. The translation algorithm of stage 3 (Sect. 7) distributes the rules determined in the previous stage on the relevant points of the firewall where it decides the destiny of packets.
To show our transcompilation at work, we consider iptables [2] and pf [3] (Sect. 2), since they have very different packet processing schemes making policy porting hard. In particular, we apply the stages of our pipeline to port a policy from iptables to pf (Sect. 3). For brevity, we do not include an example of refactoring, which occurs when the source and the target languages coincide.
Related Work. Formal methods have been used to model firewalls and access control, e.g., [4, 5, 6]. Below we restrict our attention to languagebased approaches.
Transcompilation is a wellestablished technique to address the problem of code refactoring, automatic parallelization and porting legacy code to a new programming language. Recently, this technique has been largely used in the field of web programming to implement high level languages into JavaScript, see e.g., [7, 8]. We tackle transcompilation in the area of firewall languages to support porting and refactoring of policies.
To the best of our knowledge, the literature has no approaches to mechanically porting firewall policies, while it has some to refactoring. The proposal in [9] is similar to ours, in that it “cleans” rulesets, then analyzes them by an automatic tool. It uses a formal semantics of iptables (without NAT) and a semanticspreserving ruleset simplification. The tool FIREMAN [10] detects inconsistencies and inefficiencies of firewall policies (without NAT). The Margrave policy analyzer [11] analyzes IOS firewalls, and is extensible to other languages. However the analysis focuses on finding specific problems in policies rather then synthesizing a highlevel policy specification. Another tool for discovering anomalies is Fang [12, 13], which also synthesizes an abstract policy. Our approach differs from the above proposals mainly because at the same time it (i) is languageindependent; (ii) defines a formal semantics of firewall behavior; (iii) gives a declarative, concise and neat representation of such a behavior; (iv) supports NAT; (v) generates policies in a target language.
Among the papers that formalize the semantics of firewall languages, we mention [14, 15] that specify abstract filtering policies to be then compiled into the actual firewall systems. More generally, NetKat [16] proposes linguistic constructs for programming a network as a whole within the SDN paradigm. All these approaches propose their own high level language with a formal semantics, and then compile it to a specific target language (cf. our stage 3). Instead, IFCL intermediates between real source and target languages. It thus takes from real languages actions both for filtering/rewriting packets (notably NAT and MARK) and for controlling the inspection flow, widely used in practice.
Our companion paper [1] describes the design of an automated tool and its application to real cases. The tool implements the first two stages of our pipeline and supports system administrators in the verification of some properties of a given firewall policy. In particular, the user can ask queries to check implication, equivalence and difference of policies, and reachability among hosts. The tool uses the same syntax of Sect. 4 but only sketches how to obtain the declarative representation of a given policy, while here we fully formalize the process and prove it correct (Sect. 6.2). In detail, the present paper partially overlaps with [1] on Sect. 4, where the language is presented, and on Sect. 6.2, where the logical characterization is introduced. Besides the technical details and theorems, which support the semantics and the correctness of the whole approach missing in [1], here we also address the issue of compiling the declarative firewall representation to a target language, enabling transcompilation (cf. Sects. 3 and 7).
2 Background
Usually, system administrators classify networks into security domains. Through firewalls they monitor the traffic and enforce a predetermined set of access control policies (packet filtering), possibly performing some network address translation.
Firewalls are implemented either as proprietary, special devices, or as software tools running on general purpose operating systems. Independently of their actual implementations, they are usually characterized by a set of rules that determine which packets reach the different subnetworks and hosts, and how they are modified or translated. We briefly review iptables [2] and pf [3] that are two of the most used firewall tools in Linux and Unix.
\({{\texttt {\textit{iptables.}}}}\) It is the default in Linux distributions, and operates on top of Netfilter, the standard framework for packets processing of the Linux kernel [2]. This tool is based on the notions of tables and chains. Intuitively, a table is a collection of ordered lists of policy rules called chains. The most commonly used tables are: filter for packet filtering; nat for network address translation; mangle for packet alteration. There are five builtin chains that are inspected at specific moments of the packet life cycle [17]: PreRouting, when the packet reaches the host; Forward, when the packet is routed through the host; PostRouting, right before the packet leaves the host; Input, when the packet is routed to the host; Output, when the packet is generated by the host. Moreover, users can define additional chains, besides the builtin ones.
Each rule specifies a condition and a target. If the packet matches the condition then it is processed according to the specified target. The most common targets are: ACCEPT and DROP, to accept and discard packets; DNAT/SNAT, to perform destination/source NAT; MARK to mark a packet with a numeric identifier which can be used in the conditions of other rules, even placed in different chains; RETURN, to stop examining the current chain and resume the processing of a previous chain. When the target is a userdefined chain, two “jumping” modes are available: call and goto. They differ when a RETURN is executed or the end of the chain is reached: the evaluation resumes from the rule following the last matched call. Builtin chains have a userconfigurable default policy (ACCEPT or DROP): if the evaluation reaches the end of a builtin chain without matches, its default policy is applied.
\({{\texttt {\textit{pf.}}}}\) This is the standard firewall of OpenBSD [3] and is included in macOS since version 10.7. Similarly to iptables, each rule consists of a predicate which is used to select packets and an action that specifies how to process the packets satisfying the predicate. The most frequently used actions are pass and block to accept and reject packets, rdr and nat to perform destination and source NAT. Packet marking is supported also by pf: if a rule containing the tag keyword is applied, the packet is marked with the specified identifier and then processed according to the rule’s action.
Differently from other firewalls, the action taken on a packet is determined by the last matched rule, unless otherwise specified. pf has a single ruleset that is inspected both when the packet enters and exits the host. When a packet enters the host, DNAT rules are examined first and filtering is performed after the address translation. Similarly when a packet leaves the host: first its source address is translated by the relevant SNAT rules, and then the resulting packet is possibly filtered. Notice also that packets belonging to established connections are accepted by default, thus bypassing the filters.
3 Porting a Policy: An Example
Declarative representation of the configuration in Fig. 2.
4 The Intermediate Firewall Configuration Language
We now present our intermediate firewall configuration language (IFCL). It is parametric w.r.t. the notion of state and the steps performed to elaborate packets. For generality, we do not detail the format of network packets. In the following we only use \(sa(p)\) and \(da(p)\) to denote the source and destination addresses of a given packet p; additionally, tag(p) returns the tag m associated with p. An address a consists of an IP address ip(a) and possibly a port port(a). An address range n is a pair consisting of a set of IP addresses and a set of ports, denoted \( IP (n)\):\( port (n)\). An address a is in the range n (written \(a \in n\)) if \(ip(a) \in ip(n)\) and \(port(a) \in port(n)\), when port(a) is defined, e.g., for ICMP packets we only check if the IP address is in the range.
Firewalls modify packets, e.g., through network address translations. We write \(p[da \mapsto a]\) and \(p[sa \mapsto a]\) to denote a packet identical to p, except for the destination address da and source address sa, which is equal to a, respectively. Similarly, \(p[tag \mapsto m]\) denotes the packet with a modified tag m.
Here we consider stateful firewalls that keep track of the state s of network connections and use this information to process a packet. Any existing network connection can be described by several protocolspecific properties, e.g., source and destination addresses or ports, and by the translations to apply. In this way, filtering and translation decisions are not only based on administratordefined rules, but also on the information built by previous packets belonging to the same connection. We omit a precise definition of a state, but we assume that it tracks at least the source and destination ranges, NAT operations and the state of the connection, i.e., established or not. When receiving a packet p one may check whether it matches the state s or not. We left unspecified the match between a packet and the state because it depends on the actual shape of the state. When the match succeeds, we write \(p \vdash _{s} \alpha \), where \(\alpha \) describes the actions to be carried on p; otherwise we write \({p} \not \vdash _{s}\).
Definition 1
(Firewall rule). A firewall rule r is a pair \((\phi , t)\) where \(\phi \) is a logical formula over a packet, and t is the target action of the rule.
A packet p matches a rule r with target t whenever \(\phi \) holds.
Definition 2
(Rule match). Given a rule \(r =(\phi ,t)\) we say that p matches r with target t, denoted \(p\,{\models _r}{\,t}\), iff \(\phi (p)\). We write \(p\,{\not \models _{r}}{}\) when p does not match r.
We can now define how a packet is processed given a possibly empty list of rules (denoted with \(\epsilon \)), hereafter called ruleset. Similarly to real implementations of firewalls, we inspect the rules in the list, one after the other, until we find a matching one, which establishes the destiny (or target) of the packet. For sanity, we assume that no GOTO(R) and CALL(R) occur in the ruleset R, so avoiding selfloops. We also assume that rulesets may have a default target denoted by \(t_d \in \{ {\texttt {{ACCEPT}}}, {\texttt {{DROP}}}\}\), which accepts or drops according to the will of the system administrator.
Definition 3
In our model we do not explicitly specify the steps performed by the kernel of the operating system to process a single packet passing through the host. We represent this algorithm through a control diagram, i.e., a graph where nodes represent different processing steps and the arcs determine the sequence of steps. The arcs are labeled with a predicate describing the requirements a packet has to meet in order to pass to the next processing phase. Therefore, they are not finite state auomata. We assume that control diagrams are deterministic, i.e., that every pair of arcs leaving the same node has mutually exclusive predicates. For generality, we let these predicates abstract, since they depend on the specific firewall.
Definition 4

Q is the set of nodes;

\(A \subseteq Q \times \varPsi \times Q\) is the set of arcs, such that whenever \((q,\psi ,q'), (q,\psi ',q'') \in A\) and \(q' \ne q''\) then \( \lnot (\psi \wedge \psi ')\);

\(q_i\),\(q_f \in Q\) are special nodes denoting the start and the end of elaboration.
Definition 5
We can now define a firewall in IFCL.
Definition 6
(Firewall). A firewall \(\mathcal {F}\) is a triple \((\mathcal {C}, \rho , c)\), where \(\mathcal {C}\) is a control diagram; \(\rho \) is a set of rulesets; and \(c:Q \mapsto \rho \) is the correspondence mapping from the nodes of \(\mathcal {C}\) to the actual rulesets.
4.1 Decompiling Two Real Languages into IFCL
Here we encode the two de facto standard Unix firewalls iptables and pf as triples \((\mathcal {C}, \rho , c)\) of our framework (stage 1). An immediate fallout is a formal semantics for both iptables and pf defined in terms of that of IFCL (see Sect. 5).
Modelling pf. Differently from iptables, pf has a single ruleset and the rule applied to a packet is the last one matched, apart from the case of the socalled quick rules: as soon as one of these rules matches the packet, its action is applied and the remaining part of the ruleset is skipped.

\(R_{dnat}\) contains the rule \((state == 1, {\texttt {{CHECK\hbox {}STATE}}}({\rightarrow }))\) as the first one, followed by all the rules rdr of \(R_{\texttt {pf}}\);

\(R_{snat}\) contains the rule \((state == 1, {\texttt {{CHECK\hbox {}STATE}}}({\leftarrow }))\) as the first one, followed by all the rules nat of \(R_{\texttt {pf}}\);

\(R_{finp}\) contains the rule \((state == 1, {\texttt {{ACCEPT}}})\) followed by all the quick filtering rules of \(R_{\texttt {pf}}\) without modifier out, and finally the rule \((true, {{} \texttt {GOTO(}R_{finpr}{} \texttt {)}});\)

\(R_{finpr}\) contains all the no quick filtering rules of \(R_{\texttt {pf}}\) without modifier out, in reverse order;

\(R_{fout}\) contains the rule \((state == 1, {\texttt {{ACCEPT}}})\) followed by all the quick filtering rules of \(R_{\texttt {pf}}\) without modifier in, and \((true, {{} \texttt {GOTO(}{R_{foutr}}{} \texttt {)}})\) as last rule;

\(R_{ foutr }\) includes all the no quick filtering rules of \(R_{\texttt {pf}}\) without modifier in in reverse order.
5 Formal Semantics
Now, we formally define the semantics of a firewall through two transition systems operating in a masterslave fashion. The master has a labeled transition relation of the form \(s \xrightarrow {p, p'} s'\). The intuition is that the state s of a firewall changes to \(s'\) when a new packet p reaches the host and becomes \(p'\).
The configurations of the slave transition system are triples (q, s, p) where: (i) \(q \in Q\) is a control diagram node; (ii) s is the state of the firewall; (iv) p is the packet. A transition \({(q,s,p) \rightarrow (q',s,p')}\) describes how a firewall in a state s deals with a packet p and possibly transforms it in \(p'\), according to the control diagram \({\mathcal C}\). Recall that the state records established connections and other kinds of information that are updated after the transition.
Recall that actions \({\texttt {{CALL({R})}}}\), \({\texttt {{RETURN}}}\) and \({\texttt {{GOTO({R})}}}\) are similar to procedure calls, returns and jumps in imperative programming languages. To correctly deal with them, our predicate \({p},s \models ^{S}_{R} (t,p')\) uses a stack S to implement a behavior similar to the one of procedure calls. We will denote with \(\epsilon \) the empty stack and with \(\cdot \) the concatenation of elements on the stack. This stack is also used to detect and prevent loops in ruleset invocation, as it is the case in real firewalls.
Furthermore, in the definition of \({p},s \models ^{S}_{R} (t,p') \) we use the notation \(R_k\) to indicate the ruleset \([r_k, ..., r_n]\) (\(k \in [1,n]\)) resulting from dropping the first \(k1\) rules from the given ruleset \(R = [r_1, ..., r_n]\).
We also assume the function establ that, taken an action \(\alpha \) from the state, a packet p and the fields \({X \in \{\leftarrow ,\rightarrow , \leftrightarrow \}}\) to rewrite, returns a possibly changed packet \(p'\), e.g., in case of an established connection. Also this function depends on the specific firewall we are modeling, and so it is left unspecified.
The predicate \({p},s \models ^{S}_{R} (t,p')\).
Table 2 shows the rules defining \({p},s \models ^{S}_{R} (t,p')\). The first inference rule deals with the case when the packet p matches a rule that says \({\texttt {{ACCEPT}}}\) or \({\texttt {{DROP}}}\); in this case the ruleset execution stops returning the found action and leaving p unmodified. When a packet p matches a rule with action \({\texttt {{CHECK\hbox {}STATE}}}\), we query the state s: if p belongs to an established connection, we return \({\texttt {{ACCEPT}}}\) and a \(p'\) obtained rewriting p. If p belongs to no existent connection the packet is matched against the remaining rules in the ruleset. When a packet p matches a \(\texttt {{NAT}}\) rule, we return \({\texttt {{ACCEPT}}}\) and the packet resulting by the invocation of the function nat. There are two cases if a packet p matches a \({\texttt {{GOTO({\_})}}}\). If the ruleset \(R'\) is not already in the stack, we push the current ruleset R onto the stack overlined to record that this ruleset dictated a \({\texttt {{GOTO({\_})}}}\). Otherwise, if \(R'\) is in the stack, we detect a loop and discard p. The case when a packet p matches a rule with action \({\texttt {{CALL({\_})}}}\) is similar, except that the ruleset pushed on the stack is not overlined. When a packet p matches a rule with action \({\texttt {{RETURN}}}\), we pop the stack and match p against the top of the stack. Finally, when no rule matches, an implicit return occurs: we continue from the top of the stack, if non empty. The MARK rule simply changes the tag of the matching packet to the value m. If none of the above applies, we return the default action \(t_d\) of the current ruleset.
Example 1
6 From Operational to Declarative Descriptions
We now extract the meaning of a firewall written in our intermediate language by transforming it in a declarative, logical presentation that preserves the semantics (stage 2). This transformation is done in three steps: (i) generate an unfolded firewall with a single ruleset for each node of the control diagram; (ii)transform the unfolded firewall in a firstorder formula; (iii)determine a model for the obtained formula, through a SAT solver (the procedure for this step is described in [1] and is omitted here). The correctness of stage 2 follows from Theorem 1, which guarantees that the unfolded firewall is semantically equivalent to the original one, and from Theorem 2, which ensures that the derived formula characterizes exactly the accepted packets and their translations.
6.1 Unfolding Chains
Our intermediate language can deal with involved control flows, by using the targets \({\texttt {{GOTO({\_})}}}\), \({\texttt {{CALL({\_})}}}\) and \({\texttt {{RETURN}}}\) (see Example 1). The following unfolding operation \([\![\_ ]\!]\) rewrites a ruleset into an equivalent one with no control flow rules.
Hereafter, let r; R be a non empty ruleset consisting of a rule r followed by a possibly empty ruleset R; and let \(R_1 @ R_2\) be the concatenation of \(R_1\) and \(R_2\).
In the case of a return rule \((\phi , \texttt {{\texttt {{RETURN}}}})\) we generate no new rule, and we continue to recursively analyze the rest of the ruleset, by updating f with the negation of \(\phi \). For the rule Open image in new window we have two cases: if the callee ruleset \(R'\) is not in I, we replace the rule with the unfolding of \(R'\) with \(f \wedge \phi \) as predicate, and append \(\{R'\}\) to the traversed rulesets. If \(R'\) is already in I, i.e., we have a loop, we replace the rule with a \({\texttt {{DROP}}}\), with \(f \wedge \phi \) as predicate. In both cases, we continue unfolding the rest of the ruleset. We deal with the rule Open image in new window as the previous one, except that the rest of the ruleset has \(f \wedge \lnot \phi \) as predicate.
Example 2
An unfolded firewall is obtained by repeatedly rewriting the rulesets associated with the nodes of its control diagram, using the procedure above. Formally,
Definition 7
(Unfolded firewall). Given a firewall \(\mathcal {F} = (\mathcal {C}, \rho , c)\), its unfolded version \([\![\mathcal {F} ]\!]\) is \((\mathcal {C}, \rho ', c')\) where \(\forall q\in \mathcal {C}.\, c'(q) = [\![c(q) ]\!]\) and \(\rho ' = \{[\![c(q) ]\!] \mid q \in \mathcal {C} \}\).
We now prove that a firewall \(\mathcal {F}\) and its unfolded version \([\![\mathcal {F} ]\!]\) are semantically equivalent, i.e., they perform the same action over a given packet p in a state s, and reach the same state \(s'\). Formally, the following theorem holds:
Theorem 1
Translation of rulesets into logical predicates.
6.2 Logical Characterization of Firewalls
We construct a logical predicate that characterizes all the packets accepted by an unfolded ruleset, together with the relevant translations.
Given an unfolded ruleset R, we build the predicate \(P_{R}(p, \tilde{p})\) that holds when the packet p is accepted as \(\tilde{p}\) by R. Its definition is in Table 3 that induces on the rules in R. Intuitively, the empty ruleset applies the default policy dp(R) and does not transform the packet, encoded by the constraint \(p = \tilde{p}\). The rule \( (\phi , \texttt {{\texttt {{ACCEPT}}}})\) considers two cases: when \(\phi (p)\) holds and the packet is accepted as it is; when instead \(\lnot \phi (p)\) holds, p is accepted as \(\tilde{p}\) only if the continuation R accepts it. The rule \((\phi , \texttt {{\texttt {{DROP}}}})\) accepts p only if the continuation does and \(\phi (p)\) does not hold. The rule \((\phi , \texttt {{NAT}}(d_n,s_n))\) is like an \( (\phi , \texttt {{\texttt {{ACCEPT}}}})\): the difference is when \(\phi (p)\) holds, and it gives \(\tilde{p}\) by applying to p the NAT translations \(tr(p, d_n, s_n, \leftrightarrow )\). Finally, \((\phi , {\texttt {{CHECK\hbox {}STATE}}}(X))\) is like a NAT that applies all possible translations of kind X (written as \(tr(p, *:*, *:*, X)\)). The idea is that, since we abstract away from the actual established connections, we overapproximate the state by considering any possible translations. At runtime, only the connections corresponding to the actual state will be possible. The rule \((\phi , \texttt {{MARK}}(m))\) is like a NAT, but when \(\phi (p)\) holds it requires that the continuation accepts p tagged by m as \(\tilde{p}\).
Example 3
The predicate in Table 3 is semantically correct, because if a packet p is accepted by a ruleset R as \(p'\), then \(P_R(p,p')\) holds, and vice versa. Formally,
Lemma 1
 1.
\(\forall p, s.\ {p},s \models ^{\epsilon }_{R} ({\texttt {{ACCEPT}}},p') \implies P_R(p,p')\); and
 2.
\(\forall p, p'.\ P_R(p,p') \implies \exists s. {p},s \models ^{\epsilon }_{R} ({\texttt {{ACCEPT}}},p')\)
We eventually define the predicate associated with a whole firewall as follows.
Definition 8
Intuitively, in the final node \(q_f\) we accept p as it is. In all the other nodes, p is accepted as \(\tilde{p}\) if and only if there is a path starting from p in the control diagram that obtains \(\tilde{p}\) through intermediate transformations. More precisely, we look for an intermediate packet \(p'\), provided that (i) p is accepted as \(p'\) by the ruleset c(q) of node q; (ii) \(p'\) satisfies one of the predicates \(\psi \) labeling the branches of the control diagram; and (iii) \(p'\) is accepted as \(\tilde{p}\) in the reached node \(q'\). Note that we ignore paths with loops, because firewalls have mechanisms to detect and discard a packet when its elaboration loops. To this aim, our predicate uses the set I for recording the nodes already traversed.
We conclude this section by establishing the correspondence between the logical formulation and the operational semantics of a firewall. Formally, \(\mathcal {F}\) accepts the packet p as \(\tilde{p}\) if the predicate \(\mathcal {P}_{\mathcal {F}}(p, \tilde{p})\) is satisfied, and vice versa:
Theorem 2
 1.
\(s \xrightarrow {p, p'} s \uplus (p,p') \implies \mathcal {P}_\mathcal {F}(p,p')\)
 2.
\(\forall p,p'.\ \mathcal {P}_\mathcal {F}(p,p') \implies \exists s. s \xrightarrow {p, p'} s \uplus (p,p')\)
Recall that the logical characterization abstracts away the notion of state, and thus \(\mathcal {P}_\mathcal {F}(p,p')\) holds if and only if there exists a state s in which p is accepted as \(p'\). In particular, if the predicate holds for a packet p that belongs to an established connection, p will be accepted only if the relevant state is reached at runtime. This is the usual interpretation of firewall rules for established connections.
7 Policy Generation
The declarative specification extracted from a firewall policy (cf. Table 1) can be mapped to a firewall \(\mathcal {F}_{S}\) whose control diagram has just one node. The ruleset \(R_S\) associated with this node only contains ACCEPT and NAT rules, each corresponding to a line of the declarative specification. In Sect. 3 we showed that each line is disjoint from the others. Hence, the ordering of rules in \(R_S\) is irrelevant.
Here we compile \(\mathcal {F}_S\) into an equivalent firewall \(\mathcal {F}_C\). First, we introduce an algorithm that computes the basic rulesets of \(\mathcal {F}_C\). Then, we map these rulesets to the nodes of the control diagram of a real system. Finally, we prove the correctness of the compilation.
7.1 Compiling a Firewall Specification
Our algorithm takes as input the ruleset \(R_S\) derived from a synthesized specification and yields the rulesets \(R_{fil}\), \(R_{dnat}\), \(R_{snat}\) (with default \({\texttt {{ACCEPT}}}\) policy) containing filtering, DNAT and SNAT rules. This separation reflects that all the real systems we have analyzed impose constraints on where NAT rules can be placed, e.g., in iptables, DNAT is allowed only in rulesets \(R_{{\textsc {Pre}}}^{\textit{nat}}\) and \(R_{{\textsc {Out}}}^{\textit{nat}}\), while SNAT only in \(R_{{\textsc {Inp}}}^{\textit{nat}}\) and \(R_{{\textsc {Post}}}^{\textit{nat}}\).
Intuitively, Algorithm 1 produces rules that assign different tags to packets that must be processed by different NAT rules (lines 6 and 7). Each NAT rule is split in a DNAT (line 8) and an SNAT (line 9), where the predicate \(\phi \) becomes a check on the tag of the packet. Filtering rules are left unchanged (line 4). Packets subject to NAT are accepted in \(R_{fil}\) while the others are dropped (line 12). We prepend \(R_{mark}\) to all rulesets making sure that packets are always marked, independently of which ruleset will be processed first (line 13). We use \(\bullet \) to denote the empty tag used when a packet has never been tagged.
Recall that the \( \, @ \, \) operator combines rulesets in sequence. Note that \(R_{fil}\) drops by default and shadows any ruleset appended to it. In practice, the only interesting rulesets are \(\mathcal {R} = \{ R_{\epsilon }, R_{fil}, R_{dnat},\) \( R_{snat}, R_{dnat} \, @ \, R_{fil} , R_{snat} \, @ \, R_{fil} \}\) where \(R_{\epsilon }\) is the empty ruleset with default \({\texttt {{ACCEPT}}}\) policy. Since here we do not discuss ipfw [18] and other firewalls with a minimal control diagram, we neither use \(R_{dnat} \, @ \, R_{fil} \) nor \(R_{snat} \, @ \, R_{fil}\).
We now introduce the notion of compiled firewall.
Definition 9

\(c(q_i) = c(q_f) = R_{\epsilon }\)

\(c(q) \in \mathcal {R}\) for every \(q \in Q \setminus \{q_i, q_f\}\)

every path \(\pi \) from \(q_i\) to \(q_f\) in the control diagram \(\mathcal {C}\) traverses a node q such that \(c(q) \in \{ R_{fil}, R_{dnat} \, @ \, R_{fil}, R_{snat} \, @ \, R_{fil} \}\)
Intuitively, the above definition requires that only rulesets in \(\mathcal {R}\) are associated with the nodes in the control diagram and that all paths pass at least one through a node with the filtering ruleset.
Example 4
7.2 Correctness of the Compiled Firewall
We start by showing that a compiled firewall \(\mathcal {F}_C\) accepts the same packets as \(\mathcal {F}_S\), possibly with a different translation.
Lemma 2
Let be \(\mathcal {T} = \{id,dnat,snat,nat\}\) the set of translations possibly applied to a packet while it traverses a firewall. The first, id, represents the identity, dnat and snat are for DNAT and SNAT, while nat represents both DNAT and SNAT. Also, let \((\mathcal {T}, <)\) be the partial order such that \(id < dnat\), \(id < snat\), \(dnat < nat\) and \(snat < nat\). Finally, given a packet p and a firewall \(\mathcal {F}\), let \(\pi _{\mathcal {F}}(p)\) be the path in the control diagram of \(\mathcal {F}\) along which p is processed. Note that there exists a unique path for each packet because the control diagram is deterministic.
The following function computes the translation capability of a path \(\pi \), i.e., which translations can be performed on packets processed along \(\pi \).
Definition 10
Theorem 3
Example 5
Consider again Example 4. Any path \(\pi \) in iptables has \(tc(\pi ) = nat\), which implies \(p' \approx p''\), i.e., \(\mathcal {F}_C\) behaves exactly as \(\mathcal {F}_S\). Interestingly, paths \(\pi _1 = \langle q_i, Inp ^n, \) \( Inp ^f, \) \( q_o \rangle \) and \(\pi _2 = \langle q_i, Out ^n, Out ^f, q_o \rangle \) in pf have \(tc(\pi )\) equal to dnat and snat, respectively. In fact, pf cannot perform snat and dnat on packets directed to and generated from the host, respectively.
8 Conclusions
We have proposed a transcompling pipeline for firewall languages, made of three stages. Its core is IFCL, an intermediate language equipped here with a formal semantics. It has the typical actions of real configuration languages, and it keeps them apart from the way the firewall applies them, represented by a control diagram. In stage 1, a real firewall policy language can be encoded in IFCL by simply instantiating the state and the control diagram. As a byproduct, we give a formal semantics to the source language, which usually has none. In stage 2, we have built a logical predicate that describes the flow of packets accepted by the firewall together with their possible translations. From that, we have synthesized a declarative firewall specification, in the form of a table that succinctly represents the firewall behavior. This table is the basis for supporting policy analysis, like policy implication and comparison, as described in our companion paper [1]. The declarative specification is the input of stage 3, which compiles it to a real target language. To illustrate, we have applied these stages on two among the most used firewall systems in Linux and Unix: iptables and pf. We have selected these two systems because they exhibit very different packet processing schemes, making the porting of configurations very challenging. All the stages above have been proved to preserve the semantics of the original policy, so guaranteeing that our transcompilation is correct. As a matter of fact, we have proposed a way to mechanically implement policy refactoring, when the source and the target languages coincide. This is because the declarative specification has no anomalies, e.g., rule overlapping or shadowing, so helping the system administrator also in policy maintenance. At the same time, we have put forward a manner to mechanically port policies from one firewall system to another, when their languages differ. We point out that, even though [1] intuitively presents and implements the first two stages of our transcompiling pipeline, the overlap with this paper is only on Sects. 4 and 6.2. Indeed, the theory, the semantics, the compilation of stage 3 and the proofs of the correctness of the whole transcompilation are original material.
As a future work, we intend to further experiment on our proposal by encoding more languages, e.g., from specialized firewall devices, like commercial Cisco IOS, or within the SDN paradigm. We plan to include a (more refined) policy generator of stage 3 in the existing tool [1] that implements the stages 1 and 2, and can deal with configurations made of hundreds of rules. Also testing and improving the performance of our transcompiler, as well as providing it with a friendly interface would make it more appealing to network administrators. For example, readability can be improved by automatically grouping rules and by adding comments that explain the meaning of refactored configurations. Finally, it would be very interesting to extend our approach to deal with networks with more than one firewall. The idea would be to combine the synthesized specifications based on network topology and routing.
References
 1.Bodei, C., Degano, P., Focardi, R., Galletta, L., Tempesta, M., Veronese, L.: Languageindependent synthesis of firewall policies. In: Proceedings of the 3rd IEEE European Symposium on Security and Privacy (2018)Google Scholar
 2.The Netfilter Project. https://www.netfilter.org/
 3.Packet Filter (PF). https://www.openbsd.org/faq/pf/
 4.Cuppens, F., CuppensBoulahia, N., Sans, T., Miège, A.: A formal approach to specify and deploy a network security policy. In: Dimitrakos, T., Martinelli, F. (eds.) Formal Aspects in Security and Trust. IFIP, vol. 173, pp. 203–218. Springer, Boston, MA (2005). https://doi.org/10.1007/0387240985_15CrossRefGoogle Scholar
 5.Gouda, M., Liu, A.: Structured firewall design. Comput. Netw. 51(4), 1106–1120 (2007)CrossRefGoogle Scholar
 6.Foley, S.N., Neville, U.: A firewall algebra for openstack. In: 2015 IEEE Conference on Communications and Network Security, CNS 2015, pp. 541–549 (2015)Google Scholar
 7.Babel: The compiler for writing next generation JavaScript. https://babeljs.io
 8.Runtime converter. http://www.runtimeconverter.com
 9.Diekmann, C., Michaelis, J., Haslbeck, M.P.L., Carle, G.: Verified iptables firewall analysis. In: Proceedings of the 15th IFIP Networking Conference, Vienna, Austria, 17–19 May 2016, pp. 252–260 (2016)Google Scholar
 10.Yuan, L., Mai, J., Su, Z., Chen, H., Chuah, C., Mohapatra, P.: FIREMAN: a toolkit for firewall modeling and analysis. In: IEEE Symposium on Security and Privacy (S&P 2006), May 2006, Berkeley, California, USA, pp. 199–213 (2006)Google Scholar
 11.Nelson, T., Barratt, C., Dougherty, D.J., Fisler, K., Krishnamurthi, S.: The Margrave tool for firewall analysis. In: Uncovering the Secrets of System Administration: Proceedings of the 24th Large Installation System Administration Conference, LISA 2010, San Jose, CA, USA, 7–12 November 2010 (2010)Google Scholar
 12.Mayer, A.J., Wool, A., Ziskind, E.: Fang: a firewall analysis engine. In: 2000 IEEE Symposium on Security and Privacy, Berkeley, California, USA, 14–17 May 2000, pp. 177–187 (2000)Google Scholar
 13.Mayer, A.J., Wool, A., Ziskind, E.: Offline firewall analysis. Int. J. Inf. Secur. 5(3), 125–144 (2006)CrossRefGoogle Scholar
 14.Adão, P., Bozzato, C., Rossi, G.D., Focardi, R., Luccio, F.L.: Mignis: a semantic based tool for firewall configuration. In: IEEE 27th Computer Security Foundations Symposium, CSF 2014, pp. 351–365 (2014)Google Scholar
 15.Bartal, Y., Mayer, A.J., Nissim, K., Wool, A.: Firmato: a novel firewall management toolkit. ACM Trans. Comput. Syst. 22(4), 381–420 (2004)CrossRefGoogle Scholar
 16.Anderson, C.J., Foster, N., Guha, A., Jeannin, J.B., Kozen, D., Schlesinger, C., Walker, D.: NetKAT: semantic foundations for networks. In: Proceedings of the 41st ACM SIGPLANSIGACT Symposium on Principles of Programming Languages (POPL 2014). ACM (2014)Google Scholar
 17.The Netfilter Project: Traversing of tables and chains. http://www.iptables.info/en/structureofiptables.html
 18.The IPFW Firewall. https://www.freebsd.org/doc/handbook/firewallsipfw.html
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.