A joint optimization method for NoC topology generation

  • Yonghui Li
  • Kun Wang
  • Huaxi Gu
  • Yintang Yang
  • Nan Su
  • Yawen Chen
  • Haibo Zhang

Abstract

The increasing demand on efficient intra-chip communication of multicore systems has driven the interconnection structure to evolve from bus/ring to Network-on-Chip (NoC). NoC design is fundamentally based on network topology generation and floorplanning. This paper proposes a joint optimization method to generate network topologies based on the floorplanning of heterogeneous IP cores for a given application-specific NoC. This method starts with clustering the heterogeneous IP cores according to their communication workloads using fuzzy clustering. It proceeds to apply a genetic algorithm to optimize the floorplanning by minimizing power consumption and/or chip area. By adding a router to each cluster of IP cores, network topologies are further generated via connecting routers based on the principles of scale-free networks. Experiments with a video processing application show that the optimized floorplanning of IP cores can be achieved by either minimizing the power consumption or chip area. An OPNET simulator is used to evaluate the performance of the NoC designed based on the proposed method. Experimental results demonstrate that the performance requirements of the application-specific NoC can be satisfied.

Keywords

Application-specific Network-on-Chip Floorplanning Topology generation Optimization 

1 Introduction

The advances of semiconductor technology enable an increasing number of computation IP cores integrated in a chip and simultaneously support multiple applications. This provides the opportunities for cyber-physical systems (CPS) to achieve the Industry 4.0 key features such as system connectivity and autonomy in a cost-efficient way. Guaranteed performance (e.g., throughput and latency) and limited power budgets are the main constraints of executing CPS computational applications on a multicore chip [1, 2]. To meet the power budget with reasonable power consumption, there is a trend toward heterogeneous multicore systems where important functions of the applications are accelerated in hardware. This leads to heterogeneous IP cores with different sizes and diverse communication traffic between the IP cores [3]. Though the former reduce the power consumption, the latter consume a substantial amount of power. Moreover, the communication between IP cores plays an important role in determining the performance of the applications.

Network-on-Chip has been applied for achieving scalable communication between a large number of IP cores [4, 5, 6]. There are many promising research directions in NoC area, such as Optical Network-on-Chip (ONoC) [7, 8, 9] and Wireless Network-on-Chip (WiNoC) [10, 11, 12]. Among these research topics, NoC topology generation is fundamentally important, since the topologies have significant impact on the performance of NoCs [13, 14, 15]. Though much research has been performed on regular NoC topologies such as mesh and torus [16], they do not take into account the characteristics of specific applications, resulting in a waste of on-chip resources [17, 18]. The other issue that highly impacts NoC performance is floorplanning, which places the on-chip resources (e.g., IP cores, memories, NoC routers and links) on a limited chip area. Since topology generation and floorplanning are strongly correlated, they need to be jointly designed to maximize the NoC performance in terms of latency, power consumption, throughput and chip area. Due to the increasing number of diverse applications running on a single system [19], heterogeneous IP cores are promising to satisfy the various application requirements and to be more power-efficient. These on-chip heterogeneous IP cores are equipped with different sizes and functionalities, and more communication traffic exists between the IP cores. If the IP cores can be grouped based on the communication workload between them, network topology and floorplan can be generated to reduce communication delay and power consumption. Hence, application-specific NoC can be designed by exploiting the features of both heterogeneous IP cores and the specific applications, so that efficient communications can be achieved with less chip area and power consumption.

To optimize NoC systems for specific applications, a lot of work has been done on topology generation and floorplanning. To cater for different application requirements, customized NoC design methodologies focus on different optimization objectives and design constraints. Dumitriu et al. [20] proposed two topology generation algorithms aiming at meeting the communication requirements with minimum network resources. In contrast to regular topologies, the method was shown to offer similar or better performance with far fewer resources. Leary et al. [21] proposed to use a genetic algorithm (GA) for application-specific NoC topology generation. The results showed that both power consumption and chip area were reduced. In [22], subject to the varying communication bandwidth requirements of different IP cores, Choudhary et al. applied GA for NoC design, and showed that both the communication load and energy consumption can be well distributed on chip. Some simulation-based approaches that exploit the actual traffic trace of specific applications are presented in [23, 24], and the simulation results showed remarkable improvement in performance.

Application-specific topologies have been recently investigated. A fault-tolerant irregular topology generation method for application-specific NoC designs is presented in [25]. Murali et al. [26] designed an application-specific NoC with floorplanning, where the wiring complexity during the synthesis was considered so that timing violations can be detected at an early stage. In [27], based on a message-dependent graph, an approach to customize NoC topology without contention was proposed. For a specific application, this approach produced a topology along with the set of routing paths. It turned out that the proposed solution can offer low latency and power consumption. A prohibitive greedy iterative strategy was proposed by Chan et al. [28] to design NoC topologies which support both point-to-point and packet-switched networks. With such hybrid networks, more energy saving and lower latency can be achieved. In [29], Tino presented a multi-objective tabu search-based approach to take floorplanning and contention into account during topology generation, resulting in a reduction in power dissipation. In [30], a transitive closure graph (TCG) based post-floorplanning repacking algorithm was proposed for power optimization of multiple supply voltages (MSV)-driven application-specific 2D NoC. The total communication power can be optimized without greatly changing the original floorplan. A two-phase floorplanning approach arranging each IP core into a fixed-outline rectangle was proposed for application-specific NoC design [31], resulting in average reductions of 30% in power consumption and 5% in chip area compared with random floorplanning.

To the best of our knowledge, in previous works topology was synthesized with existing floorplanners. In this paper, we propose a joint optimization method of topology generation and floorplanning, where the topology is generated based on the floorplanning. Our scheme will give a complete and detailed design of NoC architecture. The main contributions of this work are summarized as follows.
  1. We propose a system-level design flow of application-specific NoC topology generation. According to this flow, new NoCs satisfying the application requirements can be obtained. The flow consists of IP core clustering, floorplanning, topology generation and system performance evaluation.
  2. A scale-free network model is established for application-specific NoC, which has the ability to tolerate random faults [32]. As a result, if random faults occur, the network has a high probability of continuing to operate normally. To design a scale-free network, an initial network structure is created by exploiting the characteristics of small-world networks [33, 34]. Then, we use a scale-free network model to generate the topology that meets the requirements of an application.
  3. A comprehensive simulation of the proposed topology generation method is performed, where a video processing application, the video object plane decoder (VOPD), is used. The results demonstrate that the proposed topology generation method provides a high-performance network architecture, which satisfies the requirements of the application.

The rest of this paper is organized as follows. Section 2 describes the problem, followed by building the models of floorplanning and topology generation. Section 3 illustrates the detailed algorithms to obtain the optimized floorplanning and topology generation. Finally, the experimental results are given in Sect. 4, before concluding in Sect. 5.

2 Modeling of floorplanning and topology generation

Topology generation and floorplanning are two basic steps of designing customized NoC. The order of floorplanning and topology generation makes a difference in NoC design. In [35], the authors stated that floorplanning should be firstly carried out because of the larger area occupied by IP cores on chip. However, NoC topology has more influence on delay and power consumption of the network. If the objective is to save power consumption instead of chip area, it is suggested to first perform topology generation and then carry out floorplanning of IP cores and other network components. Due to the scarce resources on a chip, generating topology according to the floorplanning is able to optimize the NoC design, where the features of specific applications can be taken into account. We proceed by modeling the floorplanning and topology generation in detail.

2.1 Definitions

Definition 1

A core communication graph (CCG) is a weighted directed graph CCG(V, E), where V is the set of IP cores and E represents the set of communication links between the IP cores in V. Each directed edge e i,j ∈ E represents a communication link from IP core v i ∈ V to IP core v j ∈ V, and the weight w i,j for edge e i,j denotes the communication bandwidth for the link from v i to v j .

Definition 2

A topology graph is an undirected graph TG(V, R, L). V is the set of IP cores. R represents the set of routers, and L defines the set of links that connect network nodes including IP cores and routers.

2.2 Floorplanning model

Floorplanning is the process by which all IP cores are arranged on a two-dimensional plane. Each IP core v i is assigned a two-dimensional coordinate (x i , y i ), which denotes the position of the lower left corner of v i .

Let w i and h i be the width and height of the IP core v i , respectively. To guarantee that there is no overlapping between any two IP cores v i and v j , their coordinates (x i , y i ) and (x j , y j ) should satisfy:
$$ \left\{ {\begin{array}{*{20}l} {\hbox{min} \{ x_{i} ,x_{j} \} + W \le \hbox{max} \{ x_{i} ,x_{j} \} } \hfill \\ {\hbox{min} \{ y_{i} ,y_{j} \} + H \le \hbox{max} \{ y_{i} ,y_{j} \} } \hfill \\ \end{array} } \right. $$
(1)
where W = w i × δ xi + w j × δ xj , H = h i × δ yi + h j × δ yj , and
$$ \begin{aligned} & \delta_{xi} = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\,x_{i} = \hbox{min} \{ x_{i} ,x_{j} \} ;} \hfill \\ {0,\quad {\text{otherwise}}.} \hfill \\ \end{array} } \right.\quad \delta_{xj} = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\,x_{j} = \hbox{min} \{ x_{i} ,x_{j} \} ;} \hfill \\ {0,\quad {\text{otherwise}}} \hfill \\ \end{array} } \right.. \\ & \delta_{yi} = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\,y_{i} = \hbox{min} \{ y_{i} ,y_{j} \} ;} \hfill \\ {0,\quad {\text{otherwise}}.} \hfill \\ \end{array} } \right.\quad \delta_{yj} = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\,y_{j} = \hbox{min} \{ y_{i} ,y_{j} \} ;} \hfill \\ {0,\quad {\text{otherwise}}.} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(2)
Let \( {\text{Total}}\_{\text{Area}} \) be the total chip area that all IP cores occupy. Then, \( {\text{Total}}\_{\text{Area}} = \sum\nolimits_{i = 1}^{\left| V \right|} {h_{i} \times w_{i} } \). The shape of commercial chips is mostly rectangular. The width and height of the chip should satisfy:
$$ \begin{aligned} {\text{Width}} - \Delta \delta & \le \sqrt {{\text{Total}}\_{\text{Area}}} \\ {\text{Height}} - \Delta \delta & \le \sqrt {{\text{Total}}\_{\text{Area}}} \\ \end{aligned} $$
(3)
where Width and Height are the width and height of the chip, and ∆δ is an adjustable factor which can be flexibly specified by designers.
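As an illustration of the two constraints, the following minimal Python sketch checks Eqs. (1)-(3) for placed cores. The names `Core`, `no_overlap` and `fits_outline` are hypothetical and not taken from the paper. The sketch also assumes that the two inequalities of Eq. (1) are alternatives, i.e., that separation along either axis is sufficient for two rectangles not to overlap; this is an interpretation rather than something the text states.

```python
from math import sqrt
from typing import NamedTuple


class Core(NamedTuple):
    x: float  # x coordinate of the lower left corner
    y: float  # y coordinate of the lower left corner
    w: float  # width
    h: float  # height


def no_overlap(a: Core, b: Core) -> bool:
    """Check Eqs. (1)-(2) for one pair of cores."""
    x_min, y_min = min(a.x, b.x), min(a.y, b.y)
    # W and H use the delta indicators of Eq. (2): a core contributes its
    # width (height) only if it owns the smaller x (y) coordinate.
    W = a.w * (a.x == x_min) + b.w * (b.x == x_min)
    H = a.h * (a.y == y_min) + b.h * (b.y == y_min)
    separated_x = x_min + W <= max(a.x, b.x)
    separated_y = y_min + H <= max(a.y, b.y)
    # Assumption: separation along one axis is enough.
    return separated_x or separated_y


def fits_outline(cores, width: float, height: float, slack: float) -> bool:
    """Check Eq. (3); `slack` plays the role of the adjustable factor."""
    side = sqrt(sum(c.w * c.h for c in cores))  # sqrt(Total_Area)
    return width - slack <= side and height - slack <= side
```

A floorplan is feasible in this sketch when `no_overlap` holds for every pair of cores and `fits_outline` holds for the chosen chip dimensions.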
Based on these two constraints, floorplanning is carried out with the objective of minimizing power consumption and chip area. The objective model is described as:
$$ { \hbox{min} }\left\{ {\alpha \times \sum\limits_{i,j} { ( {\text{Dis(}}i , { }j )\times E_{\text{link}} \times {\text{Bandwidth(}}i , { }j ) )} + \beta \times {\text{Area}}} \right\} $$
(4)
where E link denotes the unit power consumed by links, and Dis(i, j) and Bandwidth(i, j) denote the distance and communication bandwidth between IP core v i and IP core v j , respectively. Area is the area of the chip. α and β denote the weighting factors of power consumption and area, which are determined by users, with α + β = 1.

2.3 Topology generation model

Topology generation aims at connecting as many IP cores within the same cluster as possible through a local router, subject to the router's maximum number of ports. Since more communication traffic exists among the IP cores in the same cluster, the communication delay and power consumption can be effectively reduced by clustering. The router positions are calculated in two steps. Firstly, the position of each router in the initial network is chosen according to the connections between the router and its connected IP cores. Secondly, during the dynamic evolution process of the scale-free network, each new router position is assessed so that the router is placed as close as possible to its IP cores. The process is described as follows.

With CCG and floorplanning, topology generation needs to place all the routers on the plane and specify the links L in TG. In our proposed scheme, we firstly specify the positions of necessary routers by minimizing the average length of links. The objective function is
$$ \text{aver}\_\text{dis} = \hbox{min} \left\{ {\sum\limits_{i = 0}^{n} {\sum\limits_{j = 0}^{n} {\frac{{\text{dis}(i,j) \times \text{con}\_\text{sign}(i,j)}}{{\text{link}\_\text{no}}}} } } \right\} $$
(5)
where n is the number of routers. con_sign(i, j) denotes whether router i and j are directly connected as defined by Eq. (6). dis(i, j) given by Eq. (7) represents the Euclidean distance between router i and j. link_no is the number of links existing in the network, and it can be calculated by Eq. (8), where node[i].x, node[i].y, node[j].x and node[j].y denote x and y coordinates of router i and j.
$$ {con\_sign}(i,j) = \left\{ {\begin{array}{*{20}l} {\text{0},\quad \text{no}\,\,\text{direct}\,\,\text{connection}\,\,\text{between}\,\,\text{router}\,\,i\,\,\text{and}\,\,j} \hfill \\ {\text{1},\quad \text{direct}\,\,\text{connection}\,\,\text{between}\,\,\text{router}\,\,i\,\,\text{and}\,\,j} \hfill \\ \end{array} } \right. $$
(6)
$$ \text{dis}(i,j) = \sqrt {(\text{node}[i] . x - \text{node}[j] . x)^{\text{2}} + (\text{node}[i] . y - \text{node}[j] . y)^{\text{2}} } $$
(7)
$$ {link\_no = }\sum\limits_{i = 0}^{n} {\sum\limits_{j = 0}^{n} {{con\_sign(}i\text{,}j\text{)}} } $$
(8)
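The quantity inside the minimization of Eq. (5) can be evaluated for one candidate set of router positions as in the minimal sketch below. The function name `average_link_length` and the data layout (a list of (x, y) tuples and a 0/1 connection matrix) are assumptions made for illustration.

```python
from math import hypot


def average_link_length(nodes, con_sign):
    """Average Euclidean length of the direct links, following Eqs. (5)-(8).

    nodes    -- list of (x, y) router coordinates
    con_sign -- square 0/1 matrix, con_sign[i][j] == 1 for a direct link
    """
    n = len(nodes)
    # Eq. (8): number of entries marked as connected.
    link_no = sum(con_sign[i][j] for i in range(n) for j in range(n))
    if link_no == 0:
        return 0.0  # no links, nothing to average
    # Numerator of Eq. (5): dis(i, j) from Eq. (7), masked by con_sign (Eq. 6).
    total = sum(
        hypot(nodes[i][0] - nodes[j][0], nodes[i][1] - nodes[j][1]) * con_sign[i][j]
        for i in range(n)
        for j in range(n)
    )
    return total / link_no
```

With a symmetric connection matrix every link is counted twice in both the numerator and link_no, so the average is unaffected.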

The position of each router on the chip determines the distances between routers in the final network. At the system level, a router is much smaller than an IP core [36]; thus, the effect of routers can be ignored in the layout process. We assume that each router is located at a corner of an IP core, so that every IP core offers at least four candidate positions. Hence, the router position must be estimated according to its connections with all IP cores. The basic principle is that the router is as close as possible to its connected IP cores, and only one IP core exists in the same place.
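One possible reading of this corner-based placement is sketched below. It is only an illustration under stated assumptions: the distance from a candidate corner to a core is measured to the core's center, candidate corners are taken from the cores of the router's own cluster, and `occupied` excludes positions already taken by another router. The names `corners` and `best_router_position` are hypothetical.

```python
from math import hypot


def corners(x, y, w, h):
    """Four corners of a core given its lower left corner and size."""
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]


def best_router_position(cluster, occupied=frozenset()):
    """Pick the free corner with the smallest summed distance to the cluster.

    cluster  -- list of (x, y, w, h) tuples for the cores served by the router
    occupied -- corner positions that are not available (assumption)
    """
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in cluster]
    candidates = {p for core in cluster for p in corners(*core)} - set(occupied)
    # sorted() makes tie-breaking deterministic.
    return min(
        sorted(candidates),
        key=lambda p: sum(hypot(p[0] - cx, p[1] - cy) for cx, cy in centers),
    )
```

For two cores placed side by side, the sketch selects a corner on their shared edge, which is the position closest to both cores.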

3 Algorithms description

In the proposed method, floorplanning is carried out firstly. Based on the physical information such as wire lengths obtained from floorplanning, this scheme can achieve a more optimized topology. Furthermore, during the topology generation stage, we cluster IP cores and merge scale-free characteristic into NoC topology for fault tolerance and low power consumption. Note that a scale-free network is able to tolerate random faults.

3.1 Floorplanning

Floorplanning can be viewed as a searching problem on the plane for positioning IP cores. In this stage, genetic algorithm (GA) is employed to obtain the optimized result. There are four basic steps in GA for floorplanning.
  1. Population initialization

    Each IP core is assigned a vector (sequence, form). The sequence, which ranges from 1 to N (N is the number of IP cores), denotes the order in which the IP core is floorplanned. The form is either 0 or 1 and denotes the orientation of the IP core: if form equals 0, the IP core is placed horizontally; if form equals 1, it is placed vertically. The vectors of all IP cores constitute one chromosome in GA, and each chromosome yields one floorplanning. The initial population is obtained by generating a number of chromosomes.

     
  2. Fitness calculation

    Taking power consumption and chip area into account in the floorplanning process, the fitness is calculated with Eq. (9).
    $$ \begin{aligned} \text{fitness} & = \alpha \times l + \beta \times \frac{{\text{area}}}{{\text{total}\_\text{area}}}, \\ l & = \sum\limits_{i,j \in [1,N]} {\frac{{\text{wire}\_\text{length}(i,j) \times \text{traffic}(i,j) \times {unit\_power}}}{{{max\_power}}}} \\ \end{aligned} $$
    (9)
    where wire_length(i, j) and traffic(i, j) denote the wire length and communication traffic between IP cores v i and v j . The unit_power represents the unit power consumed by links, while the max_power is the power consumption under maximum communication distance and traffic. The total_area and area, respectively, denote the total area of all IP cores and the chip area. The α and β are weighting factors of power consumption and area, where α + β = 1. These values can be obtained according to the floorplanning result.
     
  3. Selection

    To construct a new population, we use roulette-wheel selection to choose individuals from the current population. Whether an individual is selected depends on its selection probability, which is calculated as
    $$ p = \frac{1}{{\text{fitness}}} $$
    (10)
    From this equation, it follows that a higher fitness results in a lower selection probability. This favors individuals whose power consumption and area are relatively small.
     
  4. Crossover and mutation

    Crossover and mutation are used to create new individuals. Crossover chooses a crossover point, and two chromosomes are exchanged at this point to create two new individuals. Mutation chooses mutation points and directly changes the genes at these points to obtain a new individual. Note that the points are chosen randomly.
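The four GA steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the selection weights of Eq. (10) are normalized implicitly by `random.choices`, and equal sequence values that may arise after crossover or mutation are assumed to be resolved by core index.

```python
import random


def random_chromosome(n, rng):
    """One (sequence, form) gene per IP core; sequence is a shuffled 1..n."""
    seq = list(range(1, n + 1))
    rng.shuffle(seq)
    return [(s, rng.randint(0, 1)) for s in seq]


def placement_order(chrom):
    """Core indices sorted by sequence; ties broken by index (assumption)."""
    return sorted(range(len(chrom)), key=lambda i: (chrom[i][0], i))


def roulette_select(population, fitnesses, rng):
    """Roulette-wheel selection with weight 1 / fitness, as in Eq. (10)."""
    weights = [1.0 / f for f in fitnesses]  # requires fitness > 0
    return rng.choices(population, weights=weights, k=len(population))


def crossover(a, b, rng):
    """Single-point crossover producing two children (needs len(a) >= 2)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom, rate, rng):
    """Replace each gene by a random one with probability `rate`."""
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = (rng.randint(1, len(out)), rng.randint(0, 1))
    return out
```

One generation then consists of evaluating the fitness of every chromosome, calling `roulette_select`, and applying `crossover` and `mutate` to the selected individuals.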

     

To calculate the fitness of a population, the floorplanning order and the orientation of each IP core encoded in a chromosome must be decoded, the position of every IP core on the chip must be computed, and the wire length between IP cores must be estimated from the Euclidean distance. The most important requirement in this process is that no two IP cores overlap, while as much area as possible is saved. To this end, the following rules are defined for floorplanning the given IP cores: (1) the location of each IP core on the chip is recorded as the position of its lower left corner; (2) floorplanning proceeds from left to right and from bottom to top; (3) after an IP core is placed, the next valid floorplanning points are recorded and added to the set of feasible floorplanning points, while the floorplanning point that was used is removed from this set and added to a separate “set of disappeared floorplanning points”; (4) each valid floorplanning point has a feasible floorplanning region, which is determined by the closest valid floorplanning points. Moreover, a new IP core is not allowed to occupy the “disappeared floorplanning points” nearest to its floorplanning point.
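Once the positions of all IP cores are known, Eq. (9) can be evaluated as in the sketch below. Several choices here are assumptions rather than statements from the paper: wire length is measured between core centers, the chip area is taken as the bounding box of the placed cores measured from the origin, and `max_power` is passed in by the caller. The function name `fitness` and the dictionary layout are hypothetical.

```python
from math import hypot


def fitness(cores, traffic, unit_power, max_power, alpha):
    """Evaluate Eq. (9) for one floorplan.

    cores   -- {core_id: (x, y, w, h)} with (x, y) the lower left corner
    traffic -- {(i, j): traffic between core i and core j}
    """
    beta = 1.0 - alpha  # the weighting factors sum to 1
    center = {k: (x + w / 2, y + h / 2) for k, (x, y, w, h) in cores.items()}
    # Normalized link power term l of Eq. (9).
    l = sum(
        hypot(center[i][0] - center[j][0], center[i][1] - center[j][1])
        * t * unit_power / max_power
        for (i, j), t in traffic.items()
    )
    total_area = sum(w * h for _, _, w, h in cores.values())
    # Assumption: chip area is the bounding box of all placed cores.
    chip_w = max(x + w for x, _, w, _ in cores.values())
    chip_h = max(y + h for _, y, _, h in cores.values())
    return alpha * l + beta * (chip_w * chip_h) / total_area
```

A perfectly packed floorplan gives an area ratio of 1, so in this sketch the area term can never fall below β.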

To design floorplanning with GA above, the key problem is to specify the position of each IP core according to one chromosome. To achieve this goal, we generate one floorplanning result according to vectors of all IP cores with the following steps.
  1. Generate floorplanning points

    Floorplanning points are used to indicate the valid positions. After an IP core is floorplanned at a floorplanning point, this point is marked as disappeared, and new floorplanning points are specified. For example, if the current IP core is placed on (x c , y c ), the new floorplanning points will be (x c , y c + height), (x c + width, y c + height) and (x c + width, y c ), while (x c , y c ) is marked as disappeared. However, we must check whether each new floorplanning point is valid, that is, that it does not fall within the region of an IP core that has already been floorplanned.

     
  2. Check previous floorplanning points

    Previous floorplanning points may be covered after the current IP core is placed on the plane. In this situation, the covered floorplanning points should be marked as disappeared to update the set of valid floorplanning points. If a floorplanning point (x f , y f ) satisfies:
    $$ \left\{ {\begin{array}{*{20}l} {x_{c} \le x_{f} \le x_{c} + {\text{width}}} \hfill \\ {y_{c} \le y_{f} \le y_{c} + {\text{height}}} \hfill \\ \end{array} } \right. $$
    (11)
    where (x c , y c ) is the floorplanning point on which the current IP core is placed, then (x f , y f ) is marked as disappeared.
     
  3. Add virtual floorplanning points

    For all previous floorplanning points such as (x a , y a ), if a new floorplanning point (x0, y0) satisfies the condition x0 ≥ x a , y0 ≥ y a , then (x0, y0) will be added as a virtual floorplanning point. Then, the virtual floorplanning point is added to the set of valid floorplanning points.

     
  4. Traverse floorplanning points

    To specify the position of the current IP core, all the valid floorplanning points should be traversed. The floorplanning points are arranged in ascending order of the traffic between the current IP core and the IP core which is closest to each floorplanning point. Finally, the floorplanning points are checked in order until a qualified position is determined.

     

Applying the steps above to each IP core in CCG according to the vectors, the floorplanning result relevant to the specific chromosome will be achieved.
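The bookkeeping of steps 1 and 2 can be sketched as follows. The function name `place_core` and the use of Python sets are assumptions; the sketch covers only the generation of new points and the disappearance test of Eq. (11), and omits the virtual points of step 3, the traffic-ordered traversal of step 4 and the validity check against previously placed cores.

```python
def place_core(points, disappeared, xc, yc, width, height):
    """Update the floorplanning point sets after placing a core at (xc, yc).

    points      -- set of currently valid floorplanning points
    disappeared -- set of points already marked as disappeared
    Returns the updated (points, disappeared) pair as new sets.
    """
    points = set(points)
    disappeared = set(disappeared)
    # Step 2, Eq. (11): previous points covered by the new core disappear.
    # The bounds are inclusive, so (xc, yc) itself is covered as well.
    covered = {
        (xf, yf)
        for (xf, yf) in points
        if xc <= xf <= xc + width and yc <= yf <= yc + height
    }
    points -= covered
    disappeared |= covered
    # Step 1: the other three corners of the core become new points,
    # unless they were marked as disappeared earlier.
    new_points = {
        (xc, yc + height),
        (xc + width, yc + height),
        (xc + width, yc),
    }
    points |= new_points - disappeared
    return points, disappeared
```

Starting from the single point (0, 0) and calling `place_core` once per IP core in chromosome order reproduces the left-to-right, bottom-to-top growth of the candidate set described above.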

3.2 Topology Generation

During topology generation, IP cores are divided into several clusters, followed by adding routers and connections to generate the topology. For application-specific NoC, routers need to be positioned to achieve low power consumption and short network distance, while being subject to the maximum number of ports per router. The selection of connections between routers generally relies on the communication traffic. For example, a connection is added between two routers with high probability if the traffic between them is heavy. Therefore, we establish a connection probability model based on the small-world principle to create an initial topology. Then, new routers are gradually positioned at the places nearest to their IP cores during the dynamic evolution process of the scale-free network. The specific steps are as follows.
  1. Clustering

    For an optimized topology, IP cores that communicate heavily with each other should be connected to the same router as a cluster. A clustering algorithm is therefore applied first in topology generation. In our scheme, the fuzzy clustering algorithm is employed. We define the communication between different IP cores as the properties of IP cores; for example, the communication from IP i to IP j is the jth property of IP i. Note that the communication workflow between IP cores is determined by the application (e.g., VOPD), which is characterized by a core communication graph. The fuzzy matrix is defined as:
    $$ X = \left\{ {\begin{array}{*{20}l} {X_{1} } \hfill \\ {X_{2} } \hfill \\ \vdots \hfill \\ {X_{n} } \hfill \\ \end{array} } \right\} = \left\{ {\begin{array}{*{20}l} {x_{11} ,} \hfill & {x_{12} ,} \hfill & \cdots \hfill & {x_{1m} } \hfill \\ {x_{21} , } \hfill & {x_{22} ,} \hfill & \cdots \hfill & {x_{2m} } \hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \vdots \hfill \\ {x_{n1} ,} \hfill & {x_{n2} ,} \hfill & \cdots \hfill & {x_{nm} } \hfill \\ \end{array} } \right\} $$
    (12)
    Based on the fuzzy matrix, cosine distance is used to measure the distance between the properties of different IP cores in the network. To remove the large dispersion among different properties, the properties are first normalized as:
    $$ x_{ik}^{{\prime }} = \frac{{x_{ik} - \overline{{x_{ik} }} }}{{\sigma_{k} }} $$
    (13)
    where \( \overline{{x_{ik} }} \) and σ k are the mean value and standard deviation of the kth property of all the IP cores and can be calculated as:
    $$ \overline{{x_{ik} }} = \frac{1}{n}\sum\limits_{i = 1}^{n} {x_{ik} } ,\quad \sigma_{k} = \sqrt {\frac{1}{n - 1}\sum\limits_{i = 1}^{n} {(x_{ik} - \overline{{x_{ik} }} )^{2} } } $$
    (14)
    After normalization, the cosine distance between IP i and IP j is quantified as:
    $$ r_{ij} = \frac{{\sum\nolimits_{k = 1}^{m} {x_{ik}^{{\prime }} x_{jk}^{{\prime }} } }}{{\sqrt {\left( {\sum\nolimits_{k = 1}^{m} {x_{ik}^{{{\prime }2}} } } \right)} \sqrt {\left( {\sum\nolimits_{k = 1}^{m} {x_{jk}^{{{\prime }2}} } } \right)} }} $$
    (15)
    A shorter distance implies a higher possibility for two IP cores to be in the same cluster. Since the fuzzy matrix is obtained from the communication adjacency matrix, IP cores in the same cluster communicate with each other more and the relationship between them is closer. This characteristic helps to improve the performance. In this scheme, we use the netting method [37] to cluster IP cores directly. For different targets, different intercepts λ should be chosen as the confidence level for netting to get different optimized results. The detailed steps for clustering IP cores are shown in Fig. 1.
    Fig. 1

    Pseudocode for clustering

     
  2. Scale-free topology generation

    In scale-free networks [38], the probability p(k) that a router connects to k nodes is proportional to \( k^{ - \gamma } \), where γ depends on the system. Based on this principle, there are many more low-degree nodes than high-degree nodes in the network. Therefore, a random fault is less likely to hit a high-degree node than a low-degree node, which helps to construct a topology with high fault tolerance. In our scheme, we propose to construct the NoC topology with the scale-free characteristic.
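Returning to the clustering step (item 1 above), Eqs. (13)-(15) and the λ-based grouping can be sketched in plain Python as follows. This is not the pseudocode of Fig. 1. It assumes that the netting method amounts to grouping IP cores that are linked, directly or through a chain of other cores, by pairs with r ij ≥ λ; note that r ij in Eq. (15) grows as two cores become more similar. The function names are hypothetical, and properties with zero spread are mapped to 0 to avoid division by zero.

```python
from math import sqrt


def normalize(X):
    """Column-wise normalization of the fuzzy matrix, Eqs. (13)-(14)."""
    n, m = len(X), len(X[0])
    out = [[0.0] * m for _ in range(n)]
    for k in range(m):
        col = [X[i][k] for i in range(n)]
        mean = sum(col) / n
        sigma = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][k] = (col[i] - mean) / sigma if sigma > 0 else 0.0
    return out


def cosine(a, b):
    """r_ij of Eq. (15) for two normalized property vectors."""
    num = sum(p * q for p, q in zip(a, b))
    den = sqrt(sum(p * p for p in a)) * sqrt(sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def cluster(X, lam):
    """Group cores whose r_ij reaches the intercept lam (union-find)."""
    Z = normalize(X)
    n = len(Z)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(Z[i], Z[j]) >= lam:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Raising λ splits the cores into more and smaller clusters, which is how the port limit of a router can be respected in this sketch.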

     
To achieve such a topology, an initial topology will be created first and other IP cores will be added to the initial topology for evolving.
  (a) Create initial topology

    The initial topology generation is based on the clustering result. Therefore, during the clustering process, the value of the intercept λ should ensure that the number of ports in one router is enough to connect the IP cores in one cluster and other necessary routers.

    IP cores in the specific application will be chosen to constitute the initial topology. Routers u and v are linked together with the probability:
    $$ p(u,v) = \alpha e^{{{{ - d(u,v)} \mathord{\left/ {\vphantom {{ - d(u,v)} {\beta L}}} \right. \kern-0pt} {\beta L}}}} $$
    (16)
    where d(u, v) is the Euclidean distance between routers u and v, and L is the longest distance between two routers. α and β denote the average degree of routers and the average length of links in the topology. We can adjust α and β for specified initial topologies with different characteristics. If the parameter α increases, the connection probability of each edge also becomes larger, which increases the number of edges in the generated graph. If the parameter β increases, longer edges become more likely relative to short edges. By configuring these parameters flexibly, random graphs with a small-world effect can be obtained, which expands the search space for finding an optimal topology.

    After initial topology generation, the positions of routers should be determined. Considering that the area of routers is much smaller than the area of IP cores, routers are ignored during floorplanning and will be placed at the corners of IP cores during topology generation by minimizing average link length.

     
  (b) Add new nodes to the initial topology

    The generation of a scale-free network mainly depends on the rules of network growth, which are captured by growth and preferential connection. Growth determines how additional nodes connect to the existing nodes, such as the number of connections, while the priority of connectivity gives the probability of a connection between a newly added node and an existing node. All the nodes are added to the current network in this way until the entire network is generated. A network obtained in this way exhibits certain scale-free characteristics.
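For step (a), the link probability of Eq. (16) can be turned into a random initial topology as sketched below. The function name `initial_topology` is hypothetical. The sketch clamps the probability to 1, which the paper does not mention, and it enforces neither router port limits nor connectivity of the resulting graph.

```python
import random
from math import exp, hypot


def initial_topology(routers, alpha, beta, rng):
    """Sample undirected router links with the probability of Eq. (16).

    routers -- list of (x, y) router positions (at least two distinct ones)
    Returns a set of (u, v) index pairs with u < v.
    """
    n = len(routers)
    d = [
        [hypot(routers[u][0] - routers[v][0], routers[u][1] - routers[v][1])
         for v in range(n)]
        for u in range(n)
    ]
    L = max(max(row) for row in d)  # longest distance between two routers
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            p = min(1.0, alpha * exp(-d[u][v] / (beta * L)))  # clamp: assumption
            if rng.random() < p:
                edges.add((u, v))
    return edges
```

Generating several graphs with different α and β, and keeping those that satisfy the port constraints, is one way to explore the search space mentioned above.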

     
In this stage, IP cores which are not included in the initial topology will be added to the network. For one IP core i in cluster A, if there are valid ports in the router which connects other IP cores in cluster A, connect IP core i to that router. Otherwise, connect IP core i to a new router, and connect the new router to other routers in the current topology with the probability:
$$ p(i,j) = \frac{{{{\alpha \times k_{i} } \mathord{\left/ {\vphantom {{\alpha \times k_{i} } {\sum\nolimits_{i = 1}^{n} {k_{i} } }}} \right. \kern-0pt} {\sum\nolimits_{i = 1}^{n} {k_{i} } }} + {{\beta \times Tr_{i,j} } \mathord{\left/ {\vphantom {{\beta \times Tr_{i,j} } {\sum\nolimits_{i = 1}^{n} {Tr_{i,j} } }}} \right. \kern-0pt} {\sum\nolimits_{i = 1}^{n} {Tr_{i,j} } }}}}{{{{\lambda \times {\text{Dis}}(i,j)} \mathord{\left/ {\vphantom {{\lambda \times {\text{Dis}}(i,j)} {\sum\nolimits_{i = 1}^{n} {{\text{Dis}}(i,j)} }}} \right. \kern-0pt} {\sum\nolimits_{i = 1}^{n} {{\text{Dis}}(i,j)} }} + {{\gamma \times {\text{Phop}}(i,j)} \mathord{\left/ {\vphantom {{\gamma \times {\text{Phop}}(i,j)} {\sum\nolimits_{i = 1}^{n} {{\text{Phop}}(i,j)} }}} \right. \kern-0pt} {\sum\nolimits_{i = 1}^{n} {{\text{Phop}}(i,j)} }}}} $$
(17)
where n is the number of routers in the current topology, k i denotes the degree of router i, Tr i,j , Dis(i, j) and Phop(i, j) are the traffic, communication distance and potential communicating hops between router i and router j, and α, β, λ and γ are the relevant weighting factors with α + β + λ + γ = 1. After all other IP cores have been added into the initial topology, the final optimized topology is achieved.
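Eq. (17) can be evaluated for one new router against every existing router as in the sketch below. The function name `attach_scores` and the list-based inputs are assumptions. The value is returned without clamping, because the ratio in Eq. (17) can exceed 1 and the text does not state how such values are treated; all four sums are assumed to be nonzero.

```python
def attach_scores(degree, traffic, dist, phop, alpha, beta, lam, gamma):
    """Evaluate Eq. (17) for each existing router i and one new router j.

    degree[i]  -- degree k_i of existing router i
    traffic[i] -- traffic Tr(i, j) between router i and the new router
    dist[i]    -- communication distance Dis(i, j)
    phop[i]    -- potential communicating hops Phop(i, j)
    The weights are expected to satisfy alpha + beta + lam + gamma = 1.
    """
    sum_k, sum_tr = sum(degree), sum(traffic)
    sum_d, sum_h = sum(dist), sum(phop)
    scores = []
    for k, tr, d, h in zip(degree, traffic, dist, phop):
        # Numerator rewards high degree and heavy traffic.
        numerator = alpha * k / sum_k + beta * tr / sum_tr
        # Denominator penalizes long distance and many hops.
        denominator = lam * d / sum_d + gamma * h / sum_h
        scores.append(numerator / denominator)
    return scores
```

For equal distances and hop counts, a router with three times the degree and traffic of another receives three times the score, which reflects the preferential connection behind the scale-free growth.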

4 Experiments and results

For evaluation, we apply the proposed method to the application VOPD under 45 nm technology and simulate the system with Visual C++. The unit link power unit_power is set to be 40.4 nW/Mbps/mm [39]. Note that the characteristics of VOPD can be found in [40].

Whether the genetic algorithm has found an optimized result is judged by whether the results have become stable. If the best individual fitness, power consumption and area remain stable as the population evolves, the solution is taken to have converged. To observe how the population evolves during floorplanning, we set the population size to 100 and iterate the operations for 1000 generations. From the statistics recorded during these runs, we obtain the trends of fitness, area and power consumption over the GA iterations. Details are shown in Figs. 2, 3 and 4.
Fig. 2

Tendency of fitness

Fig. 3

Tendency of chip area

Fig. 4

Tendency of power consumption

Figure 2 shows the tendency of fitness over time as the power and area weighting factors vary. Regardless of the weight configuration, the fitness is small initially and eventually stabilizes at a higher value, indicating that the iterative search converges to a floorplanning result. The larger the weight of the power consumption term, the larger the fitness becomes. This indicates that treating power consumption as the main optimization goal yields a larger fitness and a floorplanning result with lower power consumption.

Figures 3 and 4 show the best area and power consumption recorded over time. Both curves generally decline and eventually stabilize at a smaller value, which shows that the iterative genetic algorithm also finds a floorplanning result with small power consumption and area. The curves fluctuate in the middle of the process. The main reason is the randomness of the population: the genetic algorithm aims to generate a better population, but each population is selected with a certain degree of randomness. An intermediate population may therefore perform poorly, yet after crossover and mutation a reasonable population is produced and a good solution is obtained. Note that a genetic algorithm can reach the optimal result only in theory. The experimental results in this paper are practically optimal in the sense that they satisfy the requirements of floorplanning and of the specific application.
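The fluctuation described above can be reproduced with a minimal genetic algorithm sketch. It minimises a toy cost (a sum of squares) rather than the floorplanning fitness, which is not restated in this section, and the tournament selection, one-point crossover and mutation rate are illustrative assumptions rather than the operators used in the paper. Only the default population size of 100 and the 1000 generations come from the text.

```python
import random


def toy_cost(individual):
    """Hypothetical stand-in for the weighted power/area cost."""
    return sum(g * g for g in individual)


def run_ga(pop_size=100, generations=1000, genes=8, seed=0):
    """Return (best cost per generation, best cost found so far)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-10, 10) for _ in range(genes)]
           for _ in range(pop_size)]
    gen_best, best_so_far = [], []
    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=toy_cost)
            p2 = min(rng.sample(pop, 2), key=toy_cost)
            # One-point crossover.
            cut = rng.randrange(1, genes)
            child = p1[:cut] + p2[cut:]
            # Occasional random mutation of a single gene.
            if rng.random() < 0.2:
                child[rng.randrange(genes)] += rng.gauss(0, 1)
            new_pop.append(child)
        pop = new_pop  # no elitism, so the generation best may get worse
        current = min(toy_cost(ind) for ind in pop)
        gen_best.append(current)
        best_so_far.append(
            min(current, best_so_far[-1]) if best_so_far else current)
    return gen_best, best_so_far
```

Because the sketch replaces the whole population each generation, the per-generation best can rise temporarily, while the best-so-far record can only fall or stay flat. Calling `run_ga(pop_size=20, generations=50)` gives a quick run for inspection.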

Figures 2, 3 and 4 show the tendency of fitness, power consumption and chip area of the floorplanning results under different weighting factors. These figures indicate that the population becomes steady and optimized over time and that the degree of optimization of each metric depends on the weighting factors. When power consumption is the main concern, that is, when the weighting factors of power consumption and chip area are 0.7 and 0.3, the power consumption decreases to 0.6 mW and the chip area decreases to 193.44 mm². When chip area is the main concern, the power consumption of the final topology is 667,526.1513 nW (about 0.67 mW), while the corresponding chip area is 186.03 mm². Hence, during the floorplanning stage, suitable weighting factors must be chosen to achieve a trade-off between the two optimization goals.

During topology generation, the number of IP cores in the initial topology is searched over the range from 4 to the number of clusters. The complexity and power consumption of a router grow rapidly as its number of ports increases. To avoid this extra overhead, the maximum number of router ports is restricted to between 5 and 8. The simulation results show how the topology performance varies over time, and the tendency of power consumption is shown in Fig. 5. This trend shows that the topology improves as the search proceeds.
Fig. 5

Variation of topology power consumption with time

System-level simulation tests the performance of the network design and identifies a reasonable network configuration that meets the requirements of the application. The key configuration parameters are the input buffer length and the number of virtual channels. For a given configuration, we find the network saturation point by varying the injection rate, which determines whether the operating range of the network meets the application requirements. We use the OPNET simulator to simulate the generated VOPD on-chip network and verify the final performance of the proposed NoC topology generation algorithm based on scale-free network theory. Note that the OPNET simulator is decoupled from topology generation: it takes the generated topology as an input. The simulation outputs are mainly network latency and throughput, which are used to evaluate performance and cost. The average delay is the mean time that packets take to travel from their source node to their destination node during the simulation, which gives the data transmission delay of the configured network. The throughput is defined as the bit rate of received packets once the network is stable, in bits/s or flits/cycle. On this basis, we analyze the overall network performance for different buffer lengths and numbers of virtual channels.
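The two metrics defined above can be sketched as follows. This is an illustration of the definitions only, not the OPNET implementation: the packet record layout, the function names and the sample records are hypothetical, and the steady-state window is assumed to be chosen by the user.

```python
from typing import List, Tuple

# Hypothetical record per delivered packet:
# (inject_cycle, receive_cycle, size_bits)
Packet = Tuple[int, int, int]


def average_delay(packets: List[Packet]) -> float:
    """Mean source-to-destination time in cycles over delivered packets."""
    return sum(rx - tx for tx, rx, _ in packets) / len(packets)


def throughput(packets: List[Packet], start: int, end: int) -> float:
    """Bits received per cycle inside the steady-state window [start, end)."""
    bits = sum(size for _, rx, size in packets if start <= rx < end)
    return bits / (end - start)


# Hypothetical trace of three 32-bit packets.
trace = [(0, 10, 32), (5, 25, 32), (10, 40, 32)]
mean_delay = average_delay(trace)          # (10 + 20 + 30) / 3 cycles
window_throughput = throughput(trace, 20, 40)  # one packet in 20 cycles
```

Restricting the throughput to a window after the warm-up phase reflects the requirement that the network be stable before the received bit rate is measured.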

Figure 6 shows the delay and throughput curves of the VOPD network under different input buffer lengths. The input buffer length strongly affects the communication performance between a single pair of nodes and therefore the overall network performance. Within a certain range, increasing the input buffer length clearly improves network performance and raises the saturation point, which extends the operating range of the network. Beyond that range, other factors such as router processing speed, arbitration and scheduling limit performance instead of buffer length. The buffer length therefore needs to be set within reasonable limits so that the network meets the configuration required by VOPD. When the buffer length increases from 1 to 5 flits, network performance improves greatly: the saturation point rises to 0.12 packets/cycle/IP, that is, 0.6 flits/cycle/IP, where the network delay is 286.7 cycles and the throughput is 137.9 bits/cycle. The maximum communication rate defined for VOPD is 500 Mbps, so the VOPD application is ensured to operate within its normal range. Fig. 6 also shows that the improvement is small when the buffer length increases from 4 to 5 flits: the delay curves coincide over a certain range and the gain in throughput is relatively small. A buffer length of 4 or 5 flits is therefore sufficient for the VOPD application.
Fig. 6

Variation of delay and throughput with offered load

Figure 7 shows the latency and throughput curves of the VOPD network for different numbers of virtual channels when the buffer length is 1 or 4 flits. Changing the number of virtual channels hardly improves the communication performance of the network. When the buffer length is 1 flit, the network saturation point is 0.048 packets/cycle/IP (0.24 flits/cycle/IP), the average network delay is 389 cycles and the network throughput is 54.7 bits/cycle. When the buffer length is 4 flits, network performance improves, but changing the number of virtual channels still does not enhance it effectively, as can be seen from Fig. 7. In this case the network saturation point of VOPD is about 0.12 packets/cycle/IP (0.6 flits/cycle/IP), the average network delay is 286.9 cycles and the network throughput is 137.9 bits/cycle. For the VOPD application, an effective way to improve network performance is therefore to increase the buffer length within reasonable limits rather than to increase the number of virtual channels. This suggests that minimizing the number of virtual channels can reduce router complexity and save chip resources.
Fig. 7

Variation of delay and throughput with offered load
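The two unit conversions quoted above are consistent with a packet length of 5 flits. That packet length is inferred from the quoted figures and is an assumption here, not a value stated in this section:

$$ 0.048\ \text{packets/cycle/IP} \times 5\ \text{flits/packet} = 0.24\ \text{flits/cycle/IP}, \qquad 0.12\ \text{packets/cycle/IP} \times 5\ \text{flits/packet} = 0.6\ \text{flits/cycle/IP} $$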

The OPNET simulation results illustrate the performance of the VOPD application-specific NoC and identify a network configuration, including the input buffer length and the number of virtual channels, that satisfies the application requirements. The results also show that the proposed topology generation method meets the requirements of an application-specific NoC architecture under a reasonable network configuration.

5 Conclusion

In this paper, we present a new optimization method for application-specific NoC topology generation. Floorplanning is carried out first with a GA to determine the positions of the IP cores, targeting low overhead in terms of power consumption and chip area. During topology generation, a fuzzy clustering algorithm is employed to divide the IP cores into several clusters according to their communication. Based on the floorplanning information and the clusters, a topology with scale-free characteristics is generated. Simulation results indicate that power consumption is highly optimized by the proposed method and that the generated topology meets the requirements of an application-specific NoC architecture under a reasonable network configuration.

In future work, the proposed joint optimization method will be applied to generate topologies for multiple applications, particularly those executed on a large number of IP cores.

Notes

Acknowledgements

This work was supported by the National Science Foundation of China under Grants 61634004 and 61472300, the Fundamental Research Funds for the Central Universities under Grants No. JB180309 and No. JB170107, and the key research and development plan of Shaanxi province under Grant No. 2017ZDCXL-GY-05-01.

References

  1. Li Y, Akesson B, Goossens K (2016) Architecture and analysis of a dynamically-scheduled real-time memory controller. Real-Time Syst 52(5):675–729
  2. Adyanthaya S, Ara H et al (2015) xCPS: a tool to explore cyber physical systems. In: Proceedings of the WESE'15: workshop on embedded and cyber-physical systems education, Amsterdam, Netherlands. https://doi.org/10.1145/2832920.2832923
  3. Chen X et al (2014) Variation-aware layer assignment with hierarchical stochastic optimization on a multicore platform. IEEE Trans Emerg Top Comput 2(4):488–500
  4. Deligiannidis L, Arabnia HR (2014) Parallel video processing techniques for surveillance applications. In: International Conference on Computational Science and Computational Intelligence (CSCI), pp 183–189
  5. Choche A, Arabnia HR (2011) A methodology to conceal QR codes for security applications. In: Proceedings of the International Conference on Information and Knowledge Engineering (IKE’11)
  6. Ding H, Gu H, Li B, Du K (2012) Configuring algorithm for reconfigurable Network-on-Chip architecture. In: International Conference on Consumer Electronics, Communications and Networks (CECNet), pp 222–225
  7. Ji R, Xu J, Yang L (2013) Five-port optical router based on microring switches for photonic networks-on-chip. IEEE Photon Technol Lett 25(5):492–495
  8. Ye Y, Xu J, Huang B et al (2013) 3-D mesh-based optical Network-on-Chip for multiprocessor system-on-chip. IEEE Trans Comput Aided Des Integr Circuits Syst 32(4):584–596
  9. Wu X, Xu J, Ye Y et al (2014) SUOR: sectioned undirectional optical ring for chip multiprocessor. ACM J Emerg Technol Comput Syst 10(4):228–239
  10. Majumder T, Pande PP, Kalyanaraman A (2014) Wireless NoC platforms with dynamic task allocation for maximum likelihood phylogeny reconstruction. IEEE Des Test 31(3):54–64
  11. Murray J, Tang N, Pande PP et al (2015) DVFS pruning for wireless NoC architectures. IEEE Des Test 32(2):29–38
  12. Kulkarni VV, Lim WY et al (2016) A 5.1 Gb/s 60.3 fJ/bit/mm PVT tolerant NoC transceiver. In: 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp 141–144
  13. Li C-L, Yoo J-C, Han TH (2016) Energy-efficient custom topology generation for link-failure-aware Network-on-Chip in voltage-frequency island regime. J Semicond Technol Sci 16(6):832–841
  14. Li KSM (2013) CusNoC: fast full-chip custom NoC generation. IEEE Trans Very Large Scale Integr VLSI Syst 21(4):692–705
  15. Yu B, Dong S, Chen S, Goto S (2010) Floorplanning and topology generation for application-specific Network-on-Chip. In: 15th Asia and South Pacific Design Automation Conference (ASP-DAC), pp 535–540
  16. Yu Z, Xiang D, Wang X (2015) Balancing virtual channel utilization for deadlock-free routing in torus networks. J Supercomput 71(8):3094–3115
  17. Soohyun K, Pasricha S, Jeonghun C (2011) POSEIDON: a framework for application-specific Network-on-Chip synthesis for heterogeneous chip multiprocessors. In: Proceedings of 2011 12th International Symposium on Quality Electronic Design (ISQED), pp 1–7
  18. Soumya J, Kumar KN, Chattopadhyay S (2015) Integrated core selection and mapping for mesh based Network-on-Chip design with irregular core sizes. J Syst Archit 61(9):410–422
  19. Khoshkbarforoushha A, Ranjan R, Gaire R, Abbasnejad E, Wang L, Zomaya AY (2017) Distribution based workload modelling of continuous queries in clouds. IEEE Trans Emerg Top Comput 5(1):120–133
  20. Dumitriu V, Khan GN (2009) Throughput-oriented NoC topology generation and analysis for high performance SoCs. IEEE Trans Very Large Scale Integr VLSI Syst 17(10):1433–1446
  21. Leary G, Srinivasan K, Mehta K et al (2009) Design of Network-on-Chip architectures with a genetic algorithm-based technique. IEEE Trans Very Large Scale Integr VLSI Syst 17(5):674–687
  22. Choudhary N, Gaur MS, Laxmi V et al (2011) GA based congestion aware topology generation for application specific NoC. In: Proceedings of 6th IEEE International Symposium in Electronic Design, Test and Application (DELTA), pp 93–98
  23. Wang Z, Liu W, Xu J et al (2014) A systematic Network-on-Chip traffic modeling and generation methodology. In: Proceedings of 2014 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp 675–678
  24. Murali S, Benini L, De Micheli G (2007) An application-specific design methodology for on-chip crossbar generation. IEEE Trans Comput Aided Des Integr Circuits Syst 26(7):1283–1296
  25. Tosun S, Ajabshir V, Mercanoglu O et al (2015) Fault-tolerant topology generation method for application-specific Network-on-Chips. IEEE Trans Comput Aided Des Integr Circuits Syst 34(9):1495–1508
  26. Murali S, Meloni P, Angiolini F et al (2006) Designing application-specific Networks on Chips with floorplan information. In: Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp 355–362
  27. Deniziak S, Tomaszewski R (2009) Contention-avoiding custom topology generation for Network-on-Chip. In: Proceedings of 12th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pp 234–237
  28. Chan J, Parameswaran S (2008) NoCOUT: NoC topology generation with mixed packet-switched and point-to-point networks. In: Proceedings of Asia and South Pacific in Design Automation Conference (ASPDAC), pp 265–270
  29. Tino A, Khan GN (2011) Multi-objective Tabu Search based topology generation technique for application-specific Network-on-Chip architectures. In: Proceedings of Design, Automation and Test in Europe Conference & Exhibition (DATE), pp 1–6
  30. Wang K, Dong S (2014) Post-floorplanning power optimization for MSV-driven application specific NoC design. In: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp 994–997
  31. Yu S, Ge F, Feng G et al (2013) A two-phase floorplanning approach for application-specific Network-on-Chip. In: Proceedings of IEEE 10th International Conference on ASIC, pp 1–4
  32. Hao J, Yin J, Zhang B (2007) Structural fault tolerance of scale-free networks. Tsinghua Sci Technol 12(S1):246–249
  33. Duraisamy K, Lu H, Pande P et al (2015) High performance and energy efficient wireless NoC-enabled multicore architecture for graph analytics. In: Proceedings of International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES), pp 147–156
  34. Hollis SJ, Jackson C, Bogdan P et al (2014) Exploiting emergence in on-chip interconnects. IEEE Trans Comput 63(3):570–582
  35. Wan-Yu L, Jiang IHR (2008) Topology generation and floorplanning for low power application-specific Network-on-Chips. In: Proceedings of IEEE International Symposium on VLSI design, automation and test (VLSI-DAT), pp 283–286
  36. Dally WJ, Towles B (2001) Route packets, not wires: on-chip interconnection networks. In: Proceedings of Design Automation Conference, pp 684–689
  37. Crespo F, Weber R (2005) A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets Syst 150(2):267–284
  38. Albert-laszlo B, Reka A (1999) Emergence of scaling in random networks. Science 286(5439):509–512
  39. Park S, Qazi M, Peh L-S et al (2013) 40.4 fJ/bit/mm low-swing on-chip signaling with self-resetting logic repeaters embedded within a mesh NoC in 45 nm SOI CMOS. In: Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), pp 1637–1642
  40. Bertozzi DJ, Srinivasan A, Tamhankar M, Stergiou R, Benini S, De Micheli LG (2005) NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans Parallel Distrib Syst 16(2):113–129

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of ISN, Xidian University, Xi’an, China
  2. School of Computer Science, Xidian University, Xi’an, China
  3. Institute of Microelectronics, Xidian University, Xi’an, China
  4. Department of Computer Science, University of Otago, Dunedin, New Zealand
