# How Complex Is a Fractal? Head/tail Breaks and Fractional Hierarchy

## Abstract

A fractal bears a complex structure that is reflected in a scaling hierarchy, indicating that there are far more small things than large ones. This scaling hierarchy can be effectively derived using head/tail breaks, a clustering and visualization tool for data with a heavy-tailed distribution, and quantified by a head/tail breaks-induced integer, called the ht-index, indicating the number of clusters or hierarchical levels. However, this integral ht-index has been found to be less precise for many fractals at their different phases of development. This paper refines the ht-index as a fraction to measure the scaling hierarchy of a fractal more precisely within a coherent whole, and further assigns a fractional ht-index (the fht-index) to each individual data value of the data series that represents the fractal. We developed two case studies to demonstrate the advantages of the fht-index in comparison with the ht-index. We found that the fractional ht-index, or fractional hierarchy in general, can help characterize a fractal set or pattern in a much more precise manner. The index may help create intermediate map scales between two consecutive map scales.

## Keywords

Ht-index, Fractal, Scaling, Complexity, Fht-index

## Introduction

All fractals bear a complex structure with far more small things than large ones. This notion of far more small things than large ones, being recursive in nature, can be expressed as a scaling hierarchy of numerous smallest things, a very few largest, and some in between the smallest and the largest. The scaling hierarchy can be revealed by head/tail breaks, which is a clustering and visualization tool for data with a heavy-tailed distribution (Jiang 2013, 2015a). More specifically, a data series is ranked from the largest to the smallest, and its average is then used to partition the data series into two parts: those greater than the average as the head, accounting for the minority of the data series, and those less than the average as the tail, accounting for the majority. This partition process continues for the head iteratively until the head is no longer a minority (for example, when it exceeds 40% of the values being partitioned). This recursive partition or head/tail breaks process leads to a number of clusters or hierarchical levels that are measured by the ht-index (Jiang and Yin 2014). To illustrate, given ten numbers that follow Zipf’s law (Zipf 1949) exactly, 1, 1/2, 1/3, …, 1/10, as a whole, their average is 0.29, which partitions the ten numbers into the largest three in the head and the remaining seven in the tail. For the three in the head, 1, 1/2, and 1/3, as a sub-whole, their average is 0.61, which again partitions these three numbers into one (1) in the head and two (1/2, 1/3) in the tail. Thus, the scaling pattern of far more small numbers than large ones recurs twice, so the ht-index of the ten numbers, being one more than the number of recurrences, is three.
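The recursive partition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `ht_index` and the `head_limit` parameter are hypothetical, and the 40% cut-off is taken from the example threshold mentioned in the text.

```python
def ht_index(values, head_limit=0.4):
    """Count the hierarchical levels produced by head/tail breaks.

    head_limit is an assumed threshold: partitioning stops once the head
    holds more than this share of the values currently being split.
    """
    data = sorted(values, reverse=True)  # rank from largest to smallest
    levels = 1                           # an unpartitioned series has one level
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]  # values above the average
        # Stop when there is no head or the head is no longer a minority.
        if not head or len(head) / len(data) > head_limit:
            break
        levels += 1
        data = head                      # continue with the head only
    return levels


zipf10 = [1 / r for r in range(1, 11)]   # 1, 1/2, ..., 1/10
print(ht_index(zipf10))                  # 3
```

For the ten Zipf numbers the loop splits twice (three values in the first head, one value in the second), which reproduces the ht-index of three given in the text.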

The head/tail breaks or ht-index has been used to re-define fractal, leading to the so-called third definition of fractal: *a set or pattern is fractal if the scaling pattern of far more small things than large ones recurs multiple times or with the ht-index being at least three* (Jiang and Yin 2014; Jiang 2015a). Under the new definition, a fractal is simply characterized by a data series of a heavy-tailed distribution, with the ht-index indicating its scaling hierarchy or complexity—the higher the ht-index of a fractal, the more complex the fractal. The ht-index is an integer, so complexity or scaling hierarchy is measured by the integral ht-index. In this paper, we will develop a real number or fraction to measure the scaling hierarchy in a more precise manner. The extended ht-index or fractional hierarchy in general may have a variety of applications related to geospatial data, including map generalization.

## Motivation

Scaling hierarchy cannot always be an integer. For example, a data series of ten numbers {1, 1/2, 1/3, …, 1/10} has the ht-index of three, as seen above. If we append five small numbers such as 1/11, 1/12, …, and 1/15 into the data series to become {1, 1/2, 1/3, …, 1/10, 1/11, 1/12, …, 1/15}, the ht-index remains unchanged. This implies that ht-index is not sensitive to some small changes (Gao et al. 2016a, b, 2017), although it has been used for characterizing fractal cities and hierarchical scaling (e.g., Long 2016). The ht-index, as previously defined, is likely to be rounded from a fractional ht-index (fht-index). In other words, scaling hierarchy could be a fraction. The present paper aims to seek a more precise ht-index—namely, the fht-index—for characterizing hierarchy of a data series with a heavy-tailed distribution. This paper further assigns an fht-index to an individual data value of a data series indicating its appropriate hierarchical level.

*x*, where 0 < *x* < 1. However, the ht-index as previously defined (Jiang and Yin 2014) captures scaling hierarchy only approximately and is therefore less sensitive to some small changes. This is what motivates us to develop the fht-index.

## Wholes and Sub-wholes

A fundamental concept of this paper is that of a whole and its sub-wholes. Assuming that the above ten numbers constitute a complete whole, the first three numbers, or the first head, would be a sub-whole. In other words, given a data series as a whole, its head and the head of the head (in a recursive fashion) would be the sub-wholes. This is only an informal description; the formal definition and methods that follow state the notions of whole and sub-whole precisely. It is important to realize that the curve shown in panel e of Fig. 1 is not a whole, but part of a whole, namely the curve shown in panel c of the same figure. In this paper, a whole is defined as a data series of *n* values that is ranked from the largest to the smallest and meets the following condition: ht-index(*n*) − ht-index(*n* − 1) = 1. For example, the 52 segments constitute a whole because ht-index(52) − ht-index(51) = 1. This definition of whole applies to sub-wholes as well. For example, the first 13 values of the 52 segments constitute a sub-whole because ht-index(13) − ht-index(12) = 1. According to the definition of whole or sub-whole, a Koch curve is not a whole, but the seemingly incomplete Koch curves shown in panels a and c are a sub-whole or whole. In other words, the curve in panel e of Fig. 1 is a whole according to the strict definition of the Koch curve, but it is not a whole according to the definition based on head/tail breaks.

Given the 52 segments as a whole, ranking all its segments from the longest (of scale 1) to the shortest (of scale 1/27) creates a data series shown in panel g of Fig. 1—the row named “whole”—where data and its whole are shown together with its index in the first three rows. We have already derived the sub-whole of the 13 segments in the previous paragraph with the ht-index of 3. We further determine other sub-wholes or sub-data: the first three segments {1, 1/3, 1/3} with the ht-index of 2 and the first segment {1} with the ht-index of 1. All these sub-wholes (or sub-data series) are with integral ht-indexes as shown in panel g of Fig. 1. These indexes with integral ht-indexes are called anchors for each sub-whole or whole. Note that the sub-whole and whole constitute a nested relationship; that is, the first sub-whole is within the second sub-whole, the first two sub-wholes are within the third sub-whole, and all the three sub-wholes are within the whole.

## Methods—fht-Index for a Data Series and Its Individual Data

In order to determine the fht-index of the first 21 segments, we divided the data series range between the 13th and the 52nd (or the range between the third and fourth anchors) equally into 39 intervals and converted the equal intervals from a linear scale to a nonlinear scale using a power function $f(\text{interval}_j) = (j \cdot \text{interval})^{2}$, where $j$ is the index of each interval. The 21st segment lies 8 intervals beyond the 13th, and each interval is 1/39 of the range, so $(8/39)^{2} \approx 0.042$. This provides us with the fht-index of the first 21 segments: 3.042 (or *x* = 0.042 in panel g of Fig. 1).
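The arithmetic behind the value 3.042 can be checked with a short Python sketch. The function name and its parameters are hypothetical, and the reading of "interval" as 1/39 of the range between the two anchors is an assumption, chosen because it reproduces the figure quoted in the text.

```python
def fht_between_anchors(position, lower_anchor, upper_anchor, lower_level, exponent=2):
    """Interpolate a fractional ht-index between two consecutive anchors.

    position, lower_anchor and upper_anchor are 1-based ranks in the data
    series; lower_level is the integral ht-index at lower_anchor. The
    exponent of 2 follows the power function quoted in the text.
    """
    span = upper_anchor - lower_anchor   # number of equal intervals (39 here)
    j = position - lower_anchor          # index of the interval (8 here)
    return lower_level + (j / span) ** exponent


# First 21 segments, between the anchors at ranks 13 (ht-index 3) and 52 (ht-index 4).
print(round(fht_between_anchors(21, 13, 52, 3), 3))  # 3.042
```

At the anchors themselves the function returns the integral values 3 and 4, so the fractional part grows from 0 to 1 across the range.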

To summarize the calculation of the fht-index in general, given a data series, we first seek its whole by appending new data values up to the next hierarchical level and sub-wholes by shrinking the data series to previous levels recursively. A whole is obtained from a data series by appending small values at its smallest end until the ht-index is increased to the next level exactly. In a similar vein, starting from the first value as the first sub-whole, more sub-wholes are obtained by adding values one by one until ht-index is increased to a next level exactly. A whole and its sub-wholes constitute nesting relationships. As a rule for determining the whole and sub-wholes or the anchors, the ht-index at index *k* must meet the condition of ht-index(*k*) – ht-index(*k*-1) = 1. Next, the range between two largest anchors, representing the largest sub-whole and the whole, respectively, should be equally interpolated and the equal intervals are then converted into a nonlinear scale to get the fht-index of a data series.
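The anchor rule ht-index(*k*) − ht-index(*k* − 1) = 1 can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used in FHTCalculator: `ht_index` and `find_anchors` are hypothetical names, the 40% head limit is assumed, and the exact Zipf series is used as input rather than the Koch segments, because the stopping rule behind the anchors reported for the Koch curves is not spelled out here.

```python
def ht_index(values, head_limit=0.4):
    """Head/tail breaks level count; head_limit is an assumed threshold."""
    data = sorted(values, reverse=True)
    levels = 1
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        if not head or len(head) / len(data) > head_limit:
            break
        levels += 1
        data = head
    return levels


def find_anchors(ranked_values):
    """Return the 1-based ranks k where ht-index(k) - ht-index(k-1) == 1."""
    anchors = []
    previous = 0  # treat the empty prefix as level 0, so rank 1 is an anchor
    for k in range(1, len(ranked_values) + 1):
        current = ht_index(ranked_values[:k])  # ht-index of the first k values
        if current - previous == 1:
            anchors.append(k)
        previous = current
    return anchors


zipf40 = [1 / r for r in range(1, 41)]
print(find_anchors(zipf40))  # [1, 3, 9, 39]
```

Under these assumptions, the ten Zipf numbers discussed earlier sit between the anchors at ranks 9 and 39, so their whole would be the first 39 values of the series.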

*y* = 0.63 in panel g of Fig. 1). The above procedure for a whole can be packed as a function of the fht-index:

For a data series that does not happen to be a whole, it is necessary to append some smallest values in order to make it a whole. While this is simple for the Koch curves, for real-world data it is important to obtain the trend line that best fits the data series. In this regard, it is recommended to use trend line functions such as power law, logarithmic, polynomial, and exponential. As a rule, the best-fitting trend line must be chosen for a specific data series. The fht-index (e.g., 3.*x*) of the data series is obtained by interpolating the range between the largest sub-whole and the whole. The anchors have integral ht-indexes, but in the opposite order: the largest anchors have the smallest integral ht-index and the smallest anchors have the largest integral ht-index. Those data values between anchors, or between the largest anchors and the whole, must be obtained through interpolation. To this point, we have relied on the Koch curves to illustrate the ideas of the fht-index in order to make it more accessible to experts as well as non-experts.
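As an illustration of appending values along a trend line, the sketch below fits only the power-law case, using ordinary least squares on the logarithms of rank and value, and then extrapolates to further ranks. The function names are hypothetical, and the choice of fitting method is an assumption; the text does not say how the trend line is estimated, and a real application would compare several candidate functions before choosing one.

```python
import math


def fit_power_law(ranked_values):
    """Fit value = c * rank**a by least squares in log-log space.

    Expects at least two positive values ranked from largest to smallest.
    Returns the pair (c, a).
    """
    xs = [math.log(r) for r in range(1, len(ranked_values) + 1)]
    ys = [math.log(v) for v in ranked_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


def extend_with_trend(ranked_values, extra):
    """Append `extra` values predicted by the fitted power law."""
    c, a = fit_power_law(ranked_values)
    n = len(ranked_values)
    return list(ranked_values) + [c * (n + i) ** a for i in range(1, extra + 1)]


zipf10 = [1 / r for r in range(1, 11)]
c, a = fit_power_law(zipf10)
print(round(c, 6), round(a, 6))        # 1.0 -1.0
print(len(extend_with_trend(zipf10, 5)))  # 15
```

For the exact Zipf series the fit recovers an exponent of −1, so the appended values continue the series as 1/11, 1/12, and so on. Values would be appended in this way until the ht-index of the extended series increases to the next level.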

## Case Studies and FHTCalculator

$P = 5.03\,r^{-2.1}$ of the 8016 city sizes. Unlike the ht-indexes, which are discrete, the fht-indexes for individual data values, as shown in panels b and d, are continuous and thus capture scaling hierarchy more precisely than the discrete ht-indexes. The fht-indexes of these two data sets are 3.81 and 7.04, respectively, based on the methods introduced above or obtained by running the data series through FHTCalculator (2017). The fht-index of each individual value within the two data sets is plotted in Fig. 2. Note that the fht-indexes are not simply interpolated from the discrete ht-index, but are recalculated from their wholes as described above. We developed a small program for computing the fht-index, called FHTCalculator (2017). The computation for the two case studies can be done within a few seconds. This program has been made available on GitHub, and interested readers can try it with their own data.

## Implications

Existing fractals, both classic and statistical, are essentially defined from the top down, i.e., either a strict or statistical fractal can be generated by following a rule endlessly, such as the Koch curve or the statistical Koch curve; see the literature on the theory (Mandelbrot 1967, 1982) and its applications in geography (e.g., Batty and Longley 1994; Frankhauser 1994; Chen 2011). The new relaxed definition or the third definition of fractal is imposed from the bottom up, capturing the underlying scaling hierarchy of far more small things than large ones through the ht-index. The fht-index takes a step further because it can more precisely measure the degree of hierarchy from a previously discrete value to continuous value, or from a previous integer to real number.

This continuous value is sensitive enough to capture the different phases of a fractal from its initial stage to its mature stage, with the fht-index increasing slowly or gradually (rather than rapidly, as the ht-index does). It is in this sense that we believe that the new fractal geometry focuses not only on statics but also on dynamics. This new fractal geometry is very much in line with the living geometry developed by Alexander (2002–2005). The living geometry aims not only at understanding fractal structure but also at making complex or living structure. In this connection, the fht-index provides an excellent means for judging living structure, for example, when it is applied to measuring the degree of livingness (Jiang 2015b). In addition, the fht-index can more precisely characterize spatial heterogeneity, which is pervasive in geography (Jiang 2015c). By focusing on the fractal structure of far more small things than large ones, in addition to spatially auto-correlated things (Tobler 1970), the fht-index represents a new approach to geospatial analysis.

## Conclusion

This paper refines the ht-index to be a fraction to better characterize the scaling hierarchy of a fractal or data series with a heavy-tailed distribution. The existing integral ht-index is implicitly based on the assumption that any given data series of a heavy-tailed distribution is always a whole. This assumption does not always hold true. In many cases, a data series is likely to be part of a whole rather than a whole itself. Based on this new perception, we put a data series within a whole and seek its sub-wholes or anchors in order to derive its fht-index. This fht-index is always greater than or equal to the integral ht-index. We further assign an fht-index to each data value of the data series. More precisely, the anchors have integral ht-indexes, while the other data values, or non-anchors, have fractional ones. The fht-index may help measure the degree of living structure, or more efficiently and effectively visualize fractal urban structure and nonlinear dynamics, since the structure and dynamics have first been captured by the fht-index. In the future, we will seek applications of the fht-index to better characterize geographic forms and processes, or urban structure and dynamics in particular, and to move beyond understanding towards making: how to better heal and design built environments.

## References

- Alexander C (2002–2005) The nature of order: an essay on the art of building and the nature of the universe. Center for Environmental Structure, Berkeley
- Batty M, Longley P (1994) Fractal cities: a geometry of form and function. Academic Press, London
- Chen Y (2011) Modeling fractal structure of city-size distributions using correlation functions. PLoS One 6(9):e24791. https://doi.org/10.1371/journal.pone.0024791
- FHTCalculator (2017) https://github.com/dingmartin/FHTCalculator
- Frankhauser P (1994) La fractalité des structures urbaines [the fractals of urban structure]. Economica, Paris
- Gao PC, Liu Z, Xie MH, Tian K, Liu G (2016a) CRG index: a more sensitive ht-index for enabling dynamic views of geographic features. Prof Geogr 68(4):533–545
- Gao PC, Liu Z, Tian K, Liu G (2016b) Characterizing traffic conditions from the perspective of spatial-temporal heterogeneity. ISPRS Int J Geo-Informat 5(3):34. https://doi.org/10.3390/ijgi5030034
- Gao P, Liu Z, Liu G, Zhao H, Xie X (2017) Unified metrics for characterizing the fractal nature of geographic features. Ann Am Assoc Geogr:1–17. https://doi.org/10.1080/24694452.2017.1310022
- Jiang B (2013) Head/tail breaks: a new classification scheme for data with a heavy-tailed distribution. Prof Geogr 65(3):482–494
- Jiang B (2015a) Head/tail breaks for visualization of city structure and dynamics. Cities 43:69–77
- Jiang B (2015b) Wholeness as a hierarchical graph to capture the nature of space. Int J Geogr Inf Sci 29(9):1632–1648
- Jiang B (2015c) Geospatial analysis requires a different way of thinking: the problem of spatial heterogeneity. GeoJournal 80(1):1–13
- Jiang B, Miao Y (2015) The evolution of natural cities from the perspective of location-based social media. Prof Geogr 67(2):295–306. Data source available at: https://www.researchgate.net/publication/303895757_BrightkiteCheckinLocation
- Jiang B, Yin J (2014) Ht-index for quantifying the fractal or scaling structure of geographic features. Ann Assoc Am Geogr 104(3):530–541
- Long Y (2016) Redefining Chinese city system with emerging new data. Appl Geogr 75:36–48
- Mandelbrot BB (1967) How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science 156(3775):636–638
- Mandelbrot BB (1982) The fractal geometry of nature. W. H. Freeman and Co., New York
- Tobler W (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(2):234–240
- Zipf GK (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.