1 Introduction

This paper details the selection of a small set of scales that will be used in further research to tease apart the psychological effects of the features that have been ascribed to the diatonic scale. We first collect appropriate features from a review of the literature and define some more. We then calculate the value of each feature for the set of 7-note scales of 22-TET, choose an appropriate subset of features, and conduct a k-medoids clustering of the scales in this feature space. The result is a small set of exemplar scales, each representing a distinct cluster in an optimal clustering of the set, which we take to best represent the entire space. Outside the scope of this paper, this exemplar set is to be used in a perceptual experiment testing for the effect of these features on the cognition of harmonic tonality, using the scales as stimuli.

The features may be divided into six groups: generator complexity, R-ad entropy, redundancy, coherence and evenness, consonance, and tetrachordality. Given that almost all of these features were defined with the diatonic scale in mind, the diatonic scale shows extreme values for all of them. Among 7-note scales in 12-TET it has the equal lowest generator complexity and R-ad entropy and the equal highest redundancy. It is also the maximally even 7-note scale in 12-TET, and 12-TET’s only omnitetrachordal scale. The diatonic scale also maximises the number of constituent consonant triads. Since in 12-TET the diatonic scale holds a monopoly on many of these features, we need to look elsewhere if we are to tease them apart. We choose 22-TET as it is the simplest tuning in which a single scale no longer holds a monopoly, but in which all the features we define take an appropriate range of values across scales. We also limit our analysis to scales of 7 notes, to minimise the size of our set and simplify our analysis. Beginning our review with redundancy, we first introduce how we will be discussing scales in this article.

2 Review

In this paper a scale is considered to be an equivalence class by rotation of ordered sets of specific intervals called steps, where the different rotations of a scale are its modes. Scales in equal temperaments (ETs) are written in their “brightest” mode (the mode in which the larger steps are most concentrated towards the beginning, i.e., the lexicographically highest mode) unless otherwise indicated, with step sizes written in degrees of the equal temperament. For example, the diatonic scale in 12-TET is represented as 2221221.
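As a minimal sketch of this representation, the following Python code enumerates every scale of a given cardinality in an equal temperament, keeping one representative (the brightest mode) per rotation class. The function names `brightest_mode` and `enumerate_scales` are hypothetical and are not taken from the analysis itself.

```python
from itertools import combinations


def brightest_mode(steps):
    """Return the lexicographically highest rotation of a step pattern."""
    steps = tuple(steps)
    return max(steps[i:] + steps[:i] for i in range(len(steps)))


def enumerate_scales(edo, n_notes):
    """List all n-note scales of an ET, one (brightest) mode per rotation class."""
    scales = set()
    # Choosing n_notes - 1 cut points in 1..edo-1 fixes a composition of the
    # period into n_notes positive steps (the first note sits at degree 0).
    for cuts in combinations(range(1, edo), n_notes - 1):
        bounds = (0,) + cuts + (edo,)
        steps = tuple(b - a for a, b in zip(bounds, bounds[1:]))
        scales.add(brightest_mode(steps))
    return sorted(scales, reverse=True)


if __name__ == "__main__":
    print(brightest_mode([2, 2, 1, 2, 2, 2, 1]))  # the 12-TET diatonic scale
    print(len(enumerate_scales(22, 7)))
```

Under these assumptions the enumeration for 7 notes in 22-TET yields 7752 rotation classes, which agrees with the number of scales reported for the supplementary data.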

2.1 Redundancy

Carey suggests that a pitch class set can be considered a scale, ‘when its generic intervals efficiently organize and encode its specific intervals. Put simply, a scale is that kind of pitch-class set in which it makes sense to think about intervals generically’ [4].

Redundancy and coherence concern the relationship between specific and generic intervals. Redundancy concerns the certainty with which a generic interval implies a specific interval, while coherence concerns the inverse: the certainty with which a specific interval implies a generic interval.

Considering redundancy, Rothenberg defines the variety of a generic interval as the number of specific sizes it comes in. Mean variety [19] and maximum variety follow directly from this, considering all the generic intervals of the scale (up to N-1, where N is the cardinality of the scale). Wilson notes that some generated scales – scales that can be produced from the iterated addition of a specific interval modulo the period [11] – possess the property that the maximum variety is two. He calls these scales Moment of Symmetry or MOS scales [21, 22].
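The variety measures can be illustrated with a short Python sketch. It assumes a scale is given as a list of step sizes in degrees of an equal temperament; the function names are illustrative only.

```python
def interval_sizes(steps, k):
    """Set of specific sizes taken by the generic interval spanning k steps."""
    n = len(steps)
    doubled = list(steps) * 2  # unrolls the cyclic step pattern
    return {sum(doubled[i:i + k]) for i in range(n)}


def varieties(steps):
    """Variety of each generic interval from 1 to N-1."""
    return [len(interval_sizes(steps, k)) for k in range(1, len(steps))]


def mean_variety(steps):
    v = varieties(steps)
    return sum(v) / len(v)


def max_variety(steps):
    return max(varieties(steps))


if __name__ == "__main__":
    print(varieties([2, 2, 2, 1, 2, 2, 1]))  # 12-TET diatonic
    print(varieties([4, 3, 4, 2, 4, 3, 2]))  # a three-step-size scale in 22-TET
```

For the 12-TET diatonic scale every generic interval has variety two, so both the mean and the maximum variety are two.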

Clough and Douthett defined maximally even (ME) scales as scales in which each generic interval comes in either one specific size or two adjacent (consecutive) specific sizes, meaning that the scale is ‘distributed as evenly as possible’ [9]. ME scales are a subset of distributionally even (DE) scales, in which each generic interval comes in either one or two specific sizes [11].

Similarly, Carey and Clampitt [6] define a well-formed (WF) scale as a generated scale in which the generator is of invariant generic size. They divide WF scales into two types: degenerate, the set of equal temperaments, and non-degenerate, the set of scales that possess Myhill’s property – that each generic interval comes in exactly two specific sizes [12]. In non-equal scales of prime cardinality, WF, DE and MOS are equivalent. We refer henceforth to these scales as WF.

After Myhill’s property for well-formed scales, trivalent scales are defined such that each generic interval comes in three specific sizes. Consider the JI major scale 9/8 5/4 4/3 3/2 5/3 15/8 2/1. With steps of 9/8 10/9 16/15 9/8 10/9 9/8 16/15, it is trivalent [5, 7]. In meantone temperament, the minor and major tones (10/9 and 9/8) are tempered to equivalence (tempering out their difference, 81/80). This leads us back to the well-formed meantone diatonic, which may be described, in the major mode, as LLsLLLs. If we take any other pair of step sizes to be equivalent, we are also led to well-formed scales. For example, taking 10/9 to be equivalent to 16/15 (tempering out 25/24) leads to LssLsLs, and taking 9/8 to be equivalent to 16/15 (tempering out 135/128) leads to sLssLss. This property is described by Clampitt as pairwise well-formedness [8].
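A test for pairwise well-formedness can be sketched in Python as follows. The sketch assumes a scale with exactly three step sizes, merges each pair of sizes in turn, and checks the resulting two-letter pattern for Myhill’s property by counting the unmerged step in every window. The function names are hypothetical, and this is one reading of the definition rather than the procedure used in the analysis.

```python
from itertools import combinations


def has_myhill_property(word):
    """True if every generic interval of a 0/1 cyclic word has exactly two sizes."""
    n = len(word)
    doubled = list(word) * 2
    for k in range(1, n):
        sizes = {sum(doubled[i:i + k]) for i in range(n)}
        if len(sizes) != 2:
            return False
    return True


def is_pairwise_well_formed(steps):
    """True if equating any two of the three step sizes gives a well-formed pattern."""
    sizes = sorted(set(steps))
    if len(sizes) != 3:
        return False
    for pair in combinations(sizes, 2):
        # Steps in the merged pair become 0; the remaining step size becomes 1.
        word = [0 if s in pair else 1 for s in steps]
        if not has_myhill_property(word):
            return False
    return True


if __name__ == "__main__":
    # 22-TET step pattern of the JI major scale (9/8 -> 4, 10/9 -> 3, 16/15 -> 2).
    print(is_pairwise_well_formed([4, 3, 2, 4, 3, 4, 2]))
```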

Carey later introduces the concept of strong n-valence as a generalisation to a consequence of Myhill’s property: ‘Let n represent the number of distinct step sizes per span. If the set of (n)(n-1)/2 (positive) differences between the n step sizes is the same for each span, the set has strong n-valence’ [5]. He conjectures that ‘iff a set of odd cardinality has strong trivalence then it is pairwise well-formed’.

An instance of a pair of intervals of the same generic size which differ in specific size is called a difference. Carey’s sameness quotient gives a continuous measure of the infrequency of differences in a scale [4].

Another similar feature, which will here be called n-chord entropy, introduced recently by Milne and Dean [17], considers the entropy of the distribution of n-chords, which are n-note factors/segments of the scale (we are most familiar with n-chords when n is 4; i.e., tetrachords). The probability mass function $$P_{i}(n)$$ is the number of occurrences of each different n-chord, divided by the number of notes in the scale. Then the n-chord entropy in bits is as follows:

$$\begin{aligned} E(P) = -\sum _{i} P_{i} \log _{2} P_{i} \end{aligned} \qquad (1)$$

n-chord entropy is defined in a scale of N notes for $$2 \le n \le N-1$$.
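The n-chord entropy of Eq. (1) can be sketched in Python as below. The sketch assumes that an n-chord is a run of n consecutive scale notes, identified by its n-1 consecutive steps, and that one n-chord starts on each of the N notes of the cyclic scale. The function name is illustrative.

```python
from collections import Counter
from math import log2


def n_chord_entropy(steps, n):
    """Entropy in bits of the distribution of n-chords (n consecutive notes)."""
    size = len(steps)
    if not 2 <= n <= size - 1:
        raise ValueError("n-chord entropy is defined for 2 <= n <= N-1")
    doubled = list(steps) * 2
    # One n-chord starts on each note; it is identified by its n-1 steps.
    chords = Counter(tuple(doubled[i:i + n - 1]) for i in range(size))
    return -sum((c / size) * log2(c / size) for c in chords.values())


if __name__ == "__main__":
    diatonic = [2, 2, 2, 1, 2, 2, 1]
    print(round(n_chord_entropy(diatonic, 4), 3))  # tetrachord entropy
```

Under this reading, a 7-note scale whose seven pentachords are all distinct has pentachord entropy log2(7), approximately 2.81 bits.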

2.2 Coherence and Evenness

Rothenberg [18] introduced the concept of propriety, where a scale is considered proper if no specific interval of generic size n is larger than any specific interval of generic size n+1. The diatonic scale in 12-TET is proper, but not strictly proper, where a scale is considered to be strictly proper if no specific interval of generic size n is equal to or larger than any specific interval of generic size n+1. A pair of intervals for which strict propriety fails is called a failure. Failures may be contradictions, where propriety also fails, or ambiguities, where only strict propriety fails.
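The two binary conditions can be checked with a few lines of Python. This is a sketch under the assumption that the scale is given as steps in degrees of an equal temperament; the names `spans` and `is_proper` are hypothetical.

```python
def spans(steps):
    """(smallest, largest) specific size of each generic interval 1..N-1."""
    n = len(steps)
    doubled = list(steps) * 2
    result = []
    for k in range(1, n):
        sizes = [sum(doubled[i:i + k]) for i in range(n)]
        result.append((min(sizes), max(sizes)))
    return result


def is_proper(steps, strict=False):
    """Rothenberg propriety; with strict=True, ambiguities also count as failures."""
    s = spans(steps)
    # Comparing neighbouring generic sizes suffices, since the spans are ordered.
    for (_, high), (low, _) in zip(s, s[1:]):
        if high > low or (strict and high == low):
            return False
    return True


if __name__ == "__main__":
    diatonic = [2, 2, 2, 1, 2, 2, 1]
    print(is_proper(diatonic), is_proper(diatonic, strict=True))
```

For the 12-TET diatonic scale the only failure is the ambiguity between the augmented fourth and the diminished fifth, both of 6 degrees, so the scale is proper but not strictly proper.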

Balzano [3] independently introduced the concept of coherence, equivalent to strict propriety. He then defined a weaker version of coherence, which the diatonic scale in 12-TET satisfies, in which ambiguity is allowed at the interval of half an octave (the tritone).

When tuned to Pythagorean intonation, as it was for centuries, the diatonic scale is improper, as the augmented fourth is sharper than the diminished fifth. With meantone tempering it is strictly proper. Clearly a scale does not need to be strictly proper, or even proper, to be tonal, and accordingly we do not include binary coherence features in our analysis. Non-binary measures of coherence have also been defined, by which the various tunings of the diatonic scale receive extreme values.

Similar to his sameness quotient, in the same paper Carey introduced a coherence quotient as a continuous measure for the infrequency of failures of coherence (ambiguity or contradiction) [4].

Along with propriety, Rothenberg introduced stability, with which proper scales can be compared, defined as the proportion of unambiguous intervals out of all N(N-1) possible intervals [18]. Unlike Carey’s coherence quotient, which considers both ambiguities and contradictions, Rothenberg stability concerns only ambiguities (of any degree). Given that it is only defined for proper scales, we do not include it in our analysis.

Thus far no feature directly concerns the relative size of intervals in the scale. Lumma introduces two concepts intended to take this into account. The first of these – Lumma stability – is an extension of Rothenberg’s stability. Lumma stability is the portion of the octave that is not covered with the spans of each generic interval class. The portion of the octave more than singly covered by the spans of each generic interval class is defined as Lumma impropriety [13].
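One possible reading of these two definitions is sketched below in Python. It is an assumption of this sketch, not a statement of Lumma’s own formulation, that the span of a generic interval class is the closed range from its smallest to its largest specific size, and that coverage is measured in degrees of the equal temperament as a fraction of the period. The function name is hypothetical.

```python
def lumma_measures(steps):
    """Return (stability, impropriety) under the span-coverage reading above."""
    period = sum(steps)
    n = len(steps)
    doubled = list(steps) * 2
    # coverage[j] counts the spans that contain the unit segment [j, j+1].
    coverage = [0] * period
    for k in range(1, n):
        sizes = [sum(doubled[i:i + k]) for i in range(n)]
        for j in range(min(sizes), max(sizes)):
            coverage[j] += 1
    covered = sum(1 for c in coverage if c >= 1)
    overlapped = sum(1 for c in coverage if c >= 2)
    return 1 - covered / period, overlapped / period


if __name__ == "__main__":
    print(lumma_measures([2, 2, 2, 1, 2, 2, 1]))  # 12-TET diatonic
    print(lumma_measures([4, 4, 4, 1, 4, 4, 1]))  # 22-TET scale 4441441
```

Under this reading the spans of 4441441 for generic fourths and fifths overlap by two degrees, giving a non-zero impropriety, whereas the spans of the 12-TET diatonic scale only touch at the tritone.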

Evenness also directly concerns the relative sizes of the intervals of a scale, measuring the similarity of the scale to an ET of the same cardinality. For more thorough definitions and formulae see [1, 2]. Evenness can be seen as a continuous generalization of the binary measure of maximal evenness.

2.3 R-ad entropy

We define R-ad entropy as the entropy of the distribution of subsets of R notes (“R-ads”) from a scale of cardinality N, where R ranges from 2 to N-1. We consider, however, only R-values of 2 and 3, corresponding to dyads and triads, as we consider larger subsets of notes to be less important to tonality. The entropy in bits is calculated as for n-chord entropy above, using the probability mass function given by the number of occurrences of each different R-ad, divided by the total number of R-ads.
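A Python sketch of R-ad entropy follows. The text does not state when two R-ads count as the same; this sketch assumes, as a labelled guess, that R-ads are identified up to transposition, so that each R-ad is reduced to the cyclic pattern of intervals between its notes. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def rad_counts(steps, r):
    """Count R-ads by type, assuming types are transposition classes."""
    period = sum(steps)
    pitches = [sum(steps[:i]) for i in range(len(steps))]
    counts = Counter()
    for notes in combinations(pitches, r):
        # Cyclic intervals between successive notes of the subset.
        gaps = tuple((notes[(i + 1) % r] - notes[i]) % period for i in range(r))
        # The smallest rotation is a canonical label for the transposition class.
        counts[min(gaps[i:] + gaps[:i] for i in range(r))] += 1
    return counts


def rad_entropy(steps, r):
    """Entropy in bits of the distribution of R-ad types."""
    counts = rad_counts(steps, r)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    diatonic = [2, 2, 2, 1, 2, 2, 1]
    print(rad_entropy(diatonic, 2), rad_entropy(diatonic, 3))
```

With this choice the dyad counts of the 12-TET diatonic scale reproduce its familiar interval-class content, with each of the six interval classes occurring a different number of times.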

2.4 Generator Complexity

Generator complexity considers the compactness with which the scale can be represented in a minimum number of dimensions. Where the Graham complexity, after Graham Breed, is the number of generators needed to reach an interval in a scale or a 2-dimensional tuning system, we define scalar Graham complexity (SGC) as the minimum number of generators of a given size needed to cover a scale, across all possible sizes of generator. It follows that the SGC of any generated scale of n notes is n-1. Carey [4] suggests that both the minimum number of different generators for which a scale may be considered a generated scale (for which we were unable to build an algorithm) and the acoustic dissonance of the generators affect its scale candidacy.
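One way to compute SGC for a scale in an equal temperament is sketched below in Python. It assumes that "covering" means that every note of the scale lies on a single chain of stacked generators taken modulo the period, and it tries every generator size and every scale note as the start of the chain. The function name is illustrative, and this is not claimed to be the implementation used for the analysis.

```python
from math import gcd


def scalar_graham_complexity(steps):
    """Fewest stacked generators of one size whose chain contains every scale note."""
    period = sum(steps)
    pitches = {sum(steps[:i]) for i in range(len(steps))}
    best = None
    for g in range(1, period):
        chain_len = period // gcd(g, period)  # distinct pitches reachable with g
        for start in pitches:
            remaining = set(pitches)
            for k in range(chain_len):
                remaining.discard((start + k * g) % period)
                if not remaining:
                    # k generators were stacked to reach the last missing note.
                    if best is None or k < best:
                        best = k
                    break
    return best


if __name__ == "__main__":
    print(scalar_graham_complexity([2, 2, 2, 1, 2, 2, 1]))  # generated: N - 1
    print(scalar_graham_complexity([2, 1, 2, 2, 1, 3, 1]))  # not generated
```

The first example is a generated scale and so returns N-1 = 6, while the second (the 12-TET harmonic minor pattern) needs a longer chain.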

2.5 Consonance and Tetrachordality

Consonance has received more definitions than there are researchers who write about it. We do not wish to give any definition of consonance, but to simply observe that the diatonic scale contains the highest number of triads and dyads generally considered to be consonant (e.g., perfect intervals, major and minor thirds and sixths, major and minor triads) out of any 7-note scale in 12-TET.

In tonal-harmonic music the tonic function belongs not only to a note but to a consonant triad (either major or minor). Major and minor triads are tertian in the diatonic scale, meaning that their notes are separated by thirds in the scale. Since the consonance of triads in 22-TET is unknown, an experiment was run to collect such data. Added to our analysis are measures of the maximum, median and minimum perceived consonance of the tertian triads of each scale.

In terms of dyads, we assume that in 22-TET, as in 12-TET, the perfect fifth remains the strongest consonance (other than the octave). Tetrachordal scales maximize similarity at intervals of a perfect fifth and fourth, combining consonance with redundancy. A mode of a scale is said to be tetrachordal if it consists of two identical non-overlapping tetrachords that span an approximation of 4/3 (along with, necessarily, a step of an approximation of 9/8 as a remainder). Erlich [15] defined a tetrachordal scale as a scale all of whose modes are tetrachordal. Such scales are now referred to more clearly as omnitetrachordal [16]. The diatonic scale is the only omnitetrachordal scale in 12-TET. We define tetrachordality as the number of modes of a scale of N notes that are tetrachordal, divided by the total number of modes, N.
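The tetrachordality measure can be sketched in Python for 7-note scales as follows. The sketch assumes that the approximations of 4/3 and 9/8 are the nearest degrees of the equal temperament, and that the leftover tone may fall before, between, or after the two tetrachords. The function name is hypothetical.

```python
from fractions import Fraction
from math import log2


def tetrachordality(steps):
    """Fraction of the modes of a 7-note ET scale that are tetrachordal."""
    n = len(steps)
    if n != 7:
        raise ValueError("this sketch only handles 7-note scales")
    edo = sum(steps)
    fourth = round(edo * log2(4 / 3))  # nearest approximation of 4/3
    tone = round(edo * log2(9 / 8))    # nearest approximation of 9/8
    count = 0
    for r in range(n):
        mode = tuple(steps[r:]) + tuple(steps[:r])
        # The leftover tone may sit before, between, or after the tetrachords.
        for t in (0, 3, 6):
            rest = mode[:t] + mode[t + 1:]
            if mode[t] == tone and rest[:3] == rest[3:] and sum(rest[:3]) == fourth:
                count += 1
                break
    return Fraction(count, n)


if __name__ == "__main__":
    print(tetrachordality([2, 2, 2, 1, 2, 2, 1]))  # 12-TET diatonic
    print(tetrachordality([4, 4, 3, 2, 4, 3, 2]))  # 22-TET scale 4432432
```

Under these assumptions every mode of the 12-TET diatonic scale is tetrachordal, so its tetrachordality is 1.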

3 Analysis

In order of mention, our features for analysis, according to their classification, are:

figure a

3.1 Reduction

We assume that, especially given the classification of these features into 6 groups, many may not be linearly independent of each other. 23 features is also a large number to consider in a cluster analysis, and so we reduce the dimensionality. A dimensionality reduction technique could be used; however, in order to later test the extent to which these features may mediate the ability of a scale to support harmonic tonality, we instead select a subset of features that are least able to be expressed as linear combinations of the others. To achieve this, the features are calculated for every 7-note scale in 22-TET. The variance inflation factor (the factor by which the variance of a predictor’s estimated coefficient is inflated compared to what would be expected if there were no multicollinearity, i.e., no correlation between predictors) is calculated for all the features, measuring the extent to which each may be predicted by a linear combination of the other features. The feature with the highest variance inflation factor is removed, and the process is iterated until the variance inflation factor for all remaining features is less than 2.
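The iterative elimination can be sketched in Python with NumPy. The sketch assumes a feature matrix `X` with one row per scale and one column per feature, and it uses the fact that the variance inflation factors are the diagonal entries of the inverse correlation matrix. The function names are hypothetical, and perfectly collinear columns must be removed beforehand because the correlation matrix is then singular.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column, i.e. 1 / (1 - R^2) when regressed on the other columns."""
    corr = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix holds the VIFs.
    return np.diag(np.linalg.inv(corr))


def prune_by_vif(X, threshold=2.0):
    """Drop the highest-VIF column until every remaining VIF is below threshold."""
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        vifs = variance_inflation_factors(X[:, kept])
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold:
            break
        del kept[worst]
    return kept  # indices of the surviving columns


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=500)
    b = rng.normal(size=500)
    c = a + 0.01 * rng.normal(size=500)  # nearly a copy of column a
    print(prune_by_vif(np.column_stack([a, b, c])))
```

In the toy example one of the two nearly identical columns is dropped and the independent column is retained.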

We found immediately that some of our features were perfectly correlated with each other. Hexachord entropy took only two values, depending on whether or not the scale was WF. It may therefore be worth investigating n-chord entropy in future work as a generalisation of well-formedness. Strong trivalence was perfectly correlated with trivalence: though Carey proves by example that not all trivalent scales are strongly trivalent, we found that all trivalent 7-note scales in 22-TET are strongly trivalent. Strong trivalence was not, however, perfectly correlated with pairwise well-formedness, which disproves Carey’s conjecture: 4334332, for example, is a strongly trivalent scale that is not pairwise well-formed. Removing hexachord entropy and strong trivalence first, our procedure leads us to the following features:

figure b

The feature of evenness, we suspect, is captured, along with coherence, in Lumma stability and impropriety, given that they involve direct measures of relative interval size.

3.2 Cluster Analysis

Considering that our features are of different types of values – binary and continuous – we use Mahalanobis distance as our distance measure for our clustering. K-medoids clustering is used (via Partitioning Around Medoids in R) rather than k-means clustering given that exemplar scales from the original set are needed.

In order to test for the appropriateness of different numbers of clusters, we measure the average silhouette width for each clustering. The silhouette width for a single object is a measure of how similar it is to the cluster to which it is assigned, compared to the other clusters. It ranges from −1 to 1, where a high value indicates that the object is well classified in its cluster and a value below 0 indicates it is closer to another cluster, and may be misclassified [20].
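The ingredients of this step can be sketched in Python with NumPy. Note that the sketch swaps in a simple alternating k-medoids with a farthest-point initialisation; it is not the Partitioning Around Medoids routine from R and may find different solutions. All function names are hypothetical, singleton clusters are given silhouette width 0 by convention, and at least two clusters are assumed.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the covariance of the whole data set."""
    inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = X[:, None, :] - X[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def k_medoids(D, k, max_iter=100):
    """Alternating k-medoids on a distance matrix (a stand-in for PAM)."""
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:  # farthest-point initialisation
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(int(members[np.argmin(within)]))
        if new == medoids:
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def silhouette_widths(D, labels):
    """Silhouette width of every object, given a distance matrix and labels."""
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: width 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(0, 1, (20, 2)) + [30, 0]])
    D = mahalanobis_matrix(X)
    medoids, labels = k_medoids(D, 2)
    print(medoids, silhouette_widths(D, labels).mean())
```

The medoids returned are indices of rows of the data, which is what makes the method suitable for choosing exemplars from the original set. The full pairwise difference array is held in memory, so this sketch is only practical for small data sets.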

The clustering algorithm leads us to a maximum at 9 clusters, with an average silhouette width of 0.26. We observed, however, that the average silhouette width, and therefore the clustering, may be substantially improved by leaving the vast majority of scales in a single cluster rather than splitting them into multiple clusters. Accordingly, from the initial clustering solutions for 3 to 40 clusters we combined clusters such that the average silhouette width most improved. Further, misclassified scales (those with negative silhouette width) were moved into the cluster they are closest to when appropriate. Via these processes, we find a maximum average silhouette width of 0.9877 at 2 clusters, where one cluster is the scale 76 (4441441) and the other cluster is every other scale. We know from this that the scale 4441441 is the most distinct scale in terms of our features. 3 clusters give the second best solution, consisting of 4441441, scale 1 (4333333) and the remainder, with an average silhouette width of 0.9857. Following this, the other 5 well-formed scales split from the remainder group as a cluster (for an average silhouette width of 0.9806), followed by scales 50 and 32 (for an average silhouette width of 0.9729), followed by scale 11 (4342432) and its inverse, scale 13 (4342342), the pairwise well-formed (PWF) JI major scale (for an average silhouette width of 0.9606). The average silhouette width decreases incrementally with each additional cluster until 12 clusters, where the decrement from 11 clusters is substantially larger (0.9018 to 0.7646).

Table 1. Exemplar scales associated with each successive cluster added.

A principal component analysis is run in order to reduce the dimensionality of the space such that we may visualize the clustering solution. The following diagrams (Figs. 1 and 2) show rotations of a plot of the clustering in the first three principal components (which account for 22%, 18% and 10% of the variance respectively). For interpretability, the representations of the 13 features in the principal components are plotted as vectors with labels at 15 standard deviations from the origin, though the PWF vector is mostly hidden (it is partly visible within the cluster that comprises PWF scales). Judging from the plots, 11 clusters seems appropriate, and 11 scales is already approaching, or possibly exceeding, the limit on how many scales we can test in an experiment. Accordingly we take 11 clusters to be a stopping point. For supplementary material, including an interactive 3D PCA plot of the clustering, data for all 7752 scales, and sound files for the exemplar scales, follow this link: https://en.xen.wiki/w/User:Gareth.hearne/Analysis22-7

Fig. 1. 3D clustering view 1

Table 2. Values of features for exemplar scales.

Fig. 2. 3D clustering view 2

Table 3. Z-scores of features for exemplar scales.

3.3 Exemplar Scales

Table 1 displays the exemplar scales in hexadecimal, along with their scale IDs, so that they can be located in the cluster diagram. They are ordered such that the first n scales are the exemplars for the best n-cluster solution. The size of each new cluster and the average silhouette width for each successive clustering are also shown.

Tables 2 and 3 display the values of the 13 features and their z-scores for these scales.

The scale 8113621 represents the vast majority of scales, and is one for which all features are valued within 1 standard deviation of the mean. The scale 4441441, the WF scale generated by the approximation of 3/2, is the most exceptional (and probably the most similar to 12-TET’s diatonic scale), and the scale 4333333, the maximally even scale, is the second most exceptional. The scale 6226222 represents the other 5 WF scales. 4432432 represents itself and its inverse 4423423, the two scales with tetrachordality of 5/7. 4342432 represents itself and its inverse 4342342, the PWF scales with tetrachordality 3/7. 4343332 represents the remaining scales that are relatively consonant, with Lumma stability above 0 and low Lumma impropriety. 6142612 represents the other scales with tetrachordality 3/7. The remaining PWF scales are split between the clusters represented by 7414141 and 8121811, the first being those with pentachord entropy 2.52, and the second those with pentachord entropy 2.81, which is very close to the mean for all scales. The final exemplar scale B122222 represents the scales with pentachord entropy 2.52 that are not PWF (Figs. 1 and 2).

The clustering seems to be dominated by WF, PWF and tetrachordality, the variables for which the few possible values other than 0 are very rare. This is probably because extreme values of these features can cause scales to “stand out” more overall than extreme values of other features.

We note that scales 1, 13, 50 and 76 can be thought of as 22-TET’s approximations of 4 different JI representations of the diatonic scale. We begin with scale 13, 4342432, which, in its mode 4324342, is 22-TET’s approximation of the JI major scale, 9/8 5/4 4/3 3/2 5/3 15/8 2/1, which is PWF. Scale 50 is very similar. In its mode 4324432 it is 22-TET’s approximation of an alternative JI major scale, 9/8 5/4 4/3 3/2 27/16 15/8 2/1. This scale is not PWF, but it has tetrachordality 5/7, rather than 3/7. If we take their steps to be of the 22 (unequal) śruti of early Indian music, rather than of degrees of 22-TET, these two scales are (modes of) the two basic scales of early Indian music, Ma grāma and Sa grāma, respectively [10, 14, 15]. A third scale, ‘Ga grāma’, though less frequently discussed, also existed. Though the tuning is quoted differently across sources, Daniélou [14] suggests that it is 3334333, which in 22-TET is scale 1, 22-TET’s approximation of the PWF JI dorian scale 10/9 6/5 4/3 3/2 5/3 9/5 2/1.

Finally, scale 76 in its mode 4414441 is, as already noted, 22-TET’s approximation of the Pythagorean diatonic scale 9/8 81/64 4/3 3/2 27/16 243/128 2/1. It can also be thought of as approximating the scale 9/8 9/7 4/3 3/2 27/16 27/14 2/1, in a similar way to how in 12-TET the scale 2212221 approximates both Pythagorean and JI major scales.

The last two scales (4414441 and 4333333), the most distinct in 22-TET, are probably the most popular among musicians who use 22-TET, referred to as ‘Superpyth [7]’ and ‘Porcupine [7]’ respectively. This analysis suggests we should not be surprised by this.