
1 Introduction

Knowledge graphs are data storage structures that rely on principles from graph theory to represent information. Specifically, facts are stored as triples which bring together two entities through a relation. In a graphical context, these entities are analogous to nodes, and the relations between them are analogous to edges. In recent years, knowledge graphs have garnered widespread attention as a medium for storing data on the web. Public knowledge bases such as DBpedia [13], YAGO [12], and WikiData [28] are all underpinned by large-scale knowledge graphs containing upwards of one billion triples each. These knowledge bases find uses in personal, academic, and commercial domains and are ubiquitous in the research fields of the Semantic Web, artificial intelligence, and computer science broadly. Furthermore, private companies are known to use proprietary knowledge graphs as a component of their data stores. Google, for instance, uses a knowledge graph derived from Freebase [6] to enhance their search engine results by providing infoboxes which summarize facts about a user’s query [24].

Ontologies are often used in conjunction with knowledge graphs to provide an axiomatic foundation on which knowledge graphs are built. In this view, an ontology may be seen as a rule book that provides semantics to a knowledge graph and governs how the information contained within it can be reasoned with. One of the core components of an ontology is the class taxonomy: a set of subsumption axioms between the type classes that may exist in the knowledge graph. When put together, the subsumption axioms form a hierarchy of classes where general concepts appear at the top and their subconcepts appear as their descendants.

One of the challenges that arise when working with large knowledge graphs is that of class taxonomy construction. Manual construction is time consuming and requires curators knowledgeable in the area. DBpedia, for instance, relies on its community to curate its class taxonomy. Similarly, YAGO relies on a combination of information from Wikipedia and WordNet, both of which are manually curated. On the other hand, automated methods are not able to induce class taxonomies of the quality necessary to reliably apply to complex knowledge bases. Furthermore, they oftentimes rely on external information which may itself be manually curated or may only be applicable to knowledge bases in a particular domain. With this in mind, the impetus for automatically inducing high-quality class taxonomies from large-scale knowledge graphs becomes apparent.

In this paper, we propose a scalable method for inducing class taxonomies from knowledge graphs without relying on information external to the knowledge graph’s triples. Our approach applies methods used to solve the problem of tag hierarchy induction, which involves inducing a hierarchy of tags from a collection of documents and the tags that annotate them. Although extensively studied in the field of natural language processing, these methods have yet to be applied to knowledge graphs to the best of our knowledge. In order to use these methods, we reshape the knowledge graph’s triple structure to a tuple structure, exploiting the graph’s single dimensionality in assigning entities to type classes. Furthermore, we propose a novel approach to inducing class taxonomies which outperforms existing tag hierarchy induction methods both in terms of scalability and quality of the induced taxonomies.

The remainder of this paper proceeds with Sect. 2 which provides an overview of the existing work done on inducing class taxonomies and tag hierarchies. We formalize the problem and introduce notation in Sect. 3. Our proposed method is described in Sect. 4 and evaluated in Sect. 5. Section 6 concludes the paper.

2 Related Work

We divide our discussion of related work into two subsections: class taxonomy induction methods and tag hierarchy induction methods. Both of these methods are used to construct a hierarchy of concepts, however they differ in the type of data they are applied to. Class taxonomy induction methods are used on knowledge graphs and thus operate on data represented as triples. Tag hierarchy induction methods operate on documents and the tags that annotate them. In practice, these documents are often blog posts, images, and videos annotated by users on social networking websites. We can view our proposed method as a combination of the aforementioned categories as it takes the input structure of documents and tags but is applied to knowledge graphs to induce a class taxonomy.

2.1 Methods for Class Taxonomy Induction

Völker and Niepert [27] introduce Statistical Schema Induction, which uses association rule mining on a knowledge graph’s transaction table to generate ontology axioms. Each row in the transaction table corresponds to a subject in the graph along with the classes it belongs to. Implication patterns consistent with the table are mined to create candidate ontology axioms. The candidate axioms are then sorted by descending certainty values and added greedily to the ontology only if they are logically coherent with the axioms added before them. Nickel et al. [18] propose a method using hierarchical clustering on a decomposed representation of the knowledge graph. Specifically, they extend RESCAL [17], a method for factorizing a three-way tensor, to better handle sparse large-scale data, and apply OPTICS [3], a density-based hierarchical clustering algorithm. Ristoski et al. [20] rely on entity and text embeddings in their proposed method, TIEmb. The intuition behind this approach is that the entities of a subclass will be embedded within their parent class’s embeddings. Thus, by calculating the centroid of each class’s embeddings, its subclasses can be inferred as those classes whose centroids fall within a certain radius. For instance, the class centroids of Mammals and Reptiles will fall inside the radius of Animals, although the converse does not hold since Mammals and Reptiles are more specific classes and are expected to have smaller radii.

2.2 Methods for Tag Hierarchy Induction

Heymann and Garcia-Molina [11] propose a frequency-based approach using cosine similarity to calculate tag generality. In their approach, tags are assigned vectors based on the number of times they annotate each document. The pairwise cosine similarity between tag vectors is used to build a tag similarity graph, and the closeness centrality of tags in this graph serves as their generality. To build the hierarchy, tags are greedily added – in order of descending generality – as children of the tag in the hierarchy with which they have the highest similarity. This approach was extended by Benz et al. [4] to better handle synonyms and homonyms in the dataset. Schmitz [23] proposed a method extending the work of Sanderson and Croft [22] which uses subsumption rules to identify the relations between parents and children in the hierarchy. The subsumption rules are calculated from tag co-occurrence and filtered to control for “idiosyncratic vocabulary”. These rules form a directed graph which is then pruned to create a tree. Solskinnsbakk and Gulla [25] use the Apriori algorithm [1] to mine a set of association rules from the tags. Each of these rules relates a premise to a consequence, a relationship the authors treat as that of class and subclass. This is used to construct a tree which is then verified based on the semantics of each tag. Tang et al. [26] use Latent Dirichlet Allocation (LDA) [5] to generate topics comprised of tags. Generality can then be calculated following the reasoning that tags with high frequencies across many topics are more general than ones with a high frequency in a single topic. Relations between tags are induced based on four divergence measures calculated on the LDA results. Agglomerative Hierarchical Clustering for Taxonomy Construction [14] avoids explicitly computing tag generality by employing agglomerative clustering and selecting cluster medoids to be promoted upwards in the hierarchy.
Cluster medoids are chosen based on a similarity metric calculated as the divergence between a tag’s topic distributions as learned by LDA. Wang et al. [29] propose a taxonomy generation method based on repeated application of k-medoids clustering. As the distance metric necessary for k-medoids clustering, they propose a similarity score based on the weighted sum of document and textual similarities. Levels in the hierarchy are created by repeated application of k-medoids clustering such that for each cluster, the cluster medoid becomes the parent of all other tags in the cluster. Dong et al. [8] propose a supervised learning approach wherein binary classifiers are trained to predict a “broader-narrower” relation between tags. LDA is used to generate topic distributions for tags which act as a basis for three sets of features used to train the classifier. This approach does not guarantee that the relations between tags will form a rooted tree.

3 Problem Description

A knowledge graph, \(\mathcal {K}\), is a repository of information structured as a collection of triples where each triple relates the subject, s, to the object, o, through a relation, r. More formally, \(\mathcal {K} = \{\langle s, r, o \rangle \in \mathcal {E} \times \mathcal {R} \times \mathcal {E} \}\) where \(\langle s, r, o \rangle \) is a triple, \(\mathcal {E}\) is the set of entities in \(\mathcal {K}\), and \(\mathcal {R}\) is the set of relations in \(\mathcal {K}\). \(\mathcal {K}\) can therefore be viewed as a directed graph with nodes representing entities and edges representing relations.

We can think of relation-object pairs, \(\langle r, o \rangle \), as tags that describe the subject. In this view, each entity that takes on the role of subject, \(s_i\), is annotated by tags, \(t_j \in \mathcal {A}_i\), where \(\mathcal {A}_i\) is the set of tags that annotate \(s_i\). We call these entities documents, \(d_i \in \mathcal {D}\), such that the set of all documents is a subset of all entities, \(\mathcal {D} \subseteq \mathcal {E}\). Tags are defined as relation-object pairs, \(t := \langle r, o \rangle \), and belong to the set of all tags, the vocabulary, denoted as \(\mathcal {V}\), such that \(t_j \in \mathcal {V}\). For a concrete example of this notation consider DBpedia, wherein the entity is annotated by the tags , , , and amongst others. In this view, the knowledge base \(\mathcal {K}\) may be represented as the set of document-tag tuples \(\mathcal {K} = \{\langle d, t \rangle \in \mathcal {D} \times \mathcal {V}\}\), where \(\langle d, t \rangle \) is the tuple that relates document d with tag t. We refer to this notation as the tuple structure for the remainder of the paper.

Information in knowledge graphs is often structured using an ontology, which provides semantics to the knowledge graph’s triples through an axiomatic foundation which defines how entities and relations associate with one another. A key component of most ontologies is the class taxonomy which organizes classes through a set of class subsumption axioms. These subsumption axioms may be thought of as is-a relations between classes. For instance, in the DBpedia class hierarchy, the subsumption axioms and imply that is a and that is a . Furthermore, since class subsumption axioms are transitive, is a . This taxonomy oftentimes takes the form of a rooted tree with a root class of which all other classes are logical descendants.

The problem of class taxonomy induction from knowledge graphs involves generating subsumption axioms from triples to build the class taxonomy. We notice that in most knowledge graphs, subjects are related to their class type by a single relation. This has the effect of reducing the knowledge graph’s class-identifying triples to a single dimension. This property can be exploited in the tuple structure: since all class-identifying relations are the same, they can be ignored without loss of information. For instance, in DBpedia the relation which relates subjects to their class is . Thus, when compiling a dataset of class-identifying tuples, we can treat the tags and as equivalent. Therefore, the tuple preserves all information required to induce a class taxonomy. This can be exploited by tag hierarchy induction methods which take documents and their tags as input.
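This reshaping can be sketched in a few lines of Python (a minimal illustration, not the paper’s released implementation; the relation name `rdf:type` and the entity labels are placeholder assumptions):

```python
from collections import defaultdict

def triples_to_tuples(triples, class_relation="rdf:type"):
    """Convert class-identifying triples <s, r, o> into document-tag tuples.

    Because every class-identifying triple uses the same relation, the
    relation component carries no information and is dropped: each subject
    becomes a document and each class object becomes a tag.
    """
    annotations = defaultdict(set)  # document -> set of tags annotating it
    for s, r, o in triples:
        if r == class_relation:
            annotations[s].add(o)
    return annotations
```

Triples using other relations are simply ignored, so the output contains exactly the class-membership information needed by tag hierarchy induction methods.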

4 Approach

Our proposed method uses class frequencies and co-occurrences to calculate similarity between tags. This approach, inspired by the method proposed by Schmitz, relies on the intuition that subclasses will co-occur in documents with their superclasses more often than with classes they are not logical descendants of. Unlike Schmitz’s method which uses this assumption to generate candidate subsumption axioms, our method uses similarity to choose a parent tag which already exists in the taxonomy. In this step, which draws inspiration from Heymann and Garcia-Molina, tags are greedily added to the taxonomy in order of descending generality. Thus, subsumption axioms induced by our method have to abide by the following rules: (1) the parent tag has a higher generality than the child tag; (2) the parent tag is the tag with the highest similarity to the child tag from the tags that exist in the taxonomy when the child tag is being added.

As previously mentioned, our approach leverages the tuple structure of a knowledge graph to induce a class taxonomy in the form of a rooted tree. As such, the first step is data preprocessing, wherein all of a knowledge graph’s class-identifying triples are converted to the tuple structure.

4.1 Class Taxonomy Induction Procedure

Before describing the taxonomy induction procedure for our method, we define measures which are calculated on the knowledge graph as required input for our algorithm.

  • The number of documents annotated by tag \(t_a\) is denoted as \(\text {D}_{t_a}\).

  • The number of documents annotated by both tags \(t_a\) and \(t_b\) is denoted as \(\text {D}_{t_a, t_b}\). We note that this measure is symmetrical, i.e. \(\text {D}_{t_a, t_b} = \text {D}_{t_b, t_a}\).

  • The generality of tag \(t_a\), denoted as \(\text {G}_{t_a}\), measures how general the concept described by the tag is and how high it belongs in the taxonomy. The generality is defined as:

    $$\begin{aligned} \text {G}_{t_a} = {\sum _{t_b \in \mathcal {V}_{-t_a}} \dfrac{\text {D}_{t_a,t_b}}{\text {D}_{t_b}}} \end{aligned}$$
    (1)

    Where \(\mathcal {V}_{-t_a}\) is the set of all tags excluding tag \(t_a\).
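These measures follow directly from the tuple-structured data; a minimal sketch of their computation (illustrative only, assuming documents are stored as a mapping to their tag sets):

```python
from collections import Counter
from itertools import combinations

def compute_measures(annotations):
    """Compute D_t, D_{ta,tb}, and generality G_t (Eq. 1).

    `annotations` maps each document to the set of tags annotating it.
    Co-occurrence counts are symmetric, so each pair is stored once in
    sorted order.
    """
    doc_count = Counter()  # D_t: number of documents annotated by t
    cooc = Counter()       # D_{ta,tb}: documents annotated by both tags
    for tags in annotations.values():
        for t in tags:
            doc_count[t] += 1
        for ta, tb in combinations(sorted(tags), 2):
            cooc[(ta, tb)] += 1
    # Eq. 1: G_ta = sum over tb != ta of D_{ta,tb} / D_tb
    generality = {
        ta: sum(cooc.get((min(ta, tb), max(ta, tb)), 0) / doc_count[tb]
                for tb in doc_count if tb != ta)
        for ta in doc_count
    }
    return doc_count, cooc, generality
```

A general tag that co-occurs with nearly every occurrence of many other tags accumulates a large sum of ratios, placing it near the top of the sorted order.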

Having calculated the aforementioned measures, we proceed by sorting tags in the order of descending generality and store them as \(\mathcal {V}_{sorted}\). The first element of this list, \(\mathcal {V}_{sorted}[0]\), is semantically the most general of all tags and becomes the root tag of the taxonomy. The taxonomy, \(\mathcal {T}\), is represented as a set of subsumption axioms between parent and child tags. Formally, each subsumption between parent tag, \(t_{parent}\), and child tag, \(t_{child}\), is represented by \(\{ t_{parent} \rightarrow t_{child} \}\) such that \(\{ t_{parent} \rightarrow t_{child} \} \in \mathcal {T}\). The taxonomy is therefore initialized with the root tag as \(\mathcal {T} = \{\{ \emptyset \rightarrow \mathcal {V}_{sorted}[0] \}\}\) where \(\emptyset \) represents a null value, i.e. no parent.

Following initialization, the remaining tags are added to the taxonomy in terms of descending generality by calculating the similarity between the tag being added, \(t_b\), and all the tags already in the taxonomy, \(\mathcal {T*}\). The tag \(t_a \in \mathcal {T*}\) that has the highest similarity with tag \(t_b\) becomes the parent of \(t_b\) and \(\{ t_a \rightarrow t_b \}\) is added to \(\mathcal {T}\). The similarity between tags \(t_a\) and \(t_b\), denoted as \(\text {S}_{t_a \rightarrow t_b}\), measures the degree to which tag \(t_b\) is the direct descendant of tag \(t_a\). It is calculated as the degree to which tag \(t_b\) is compatible with tag \(t_a\) and all the ancestors of \(t_a\):

$$\begin{aligned} \text {S}_{t_a \rightarrow t_b} = \sum _{t_c \in \mathcal {P}_{t_a}} \alpha ^{l_a - l_c}\dfrac{\text {D}_{t_b,t_c}}{\text {D}_{t_b}} \end{aligned}$$
(2)

Where \(\mathcal {P}_{t_a}\) is the path in the taxonomy from the root tag \(\mathcal {V}_{sorted}[0]\) to tag \(t_a\). \(l_a\) and \(l_c\) denote the levels in the hierarchy of tags \(t_a\) and \(t_c\), respectively. The levels are counted from the root tag starting at zero. Thus, the level of \(\mathcal {V}_{sorted}[0]\), denoted as \(l_{\mathcal {V}_{sorted}[0]}\), is equal to zero, the levels of its children are equal to one, and so on. The decay factor, \(\alpha \), is a hyperparameter that controls the effect ancestors of tag \(t_a\) have on its similarity when calculating \(\text {S}_{t_a \rightarrow t_b}\). By setting the value of \(\alpha \) such that \(0 < \alpha < 1\), we ensure that the effect is lower the more distant an ancestor tag is. The cases where \(\alpha = 0\) and \(\alpha = 1\) correspond to ancestors having no effect and equal effect on the similarity, respectively. We explore the effect various \(\alpha \) values have on the induced class taxonomy in the following section. The full details of our method’s procedure are outlined in Algorithm 1.

Algorithm 1. Class taxonomy induction procedure.
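The greedy procedure can be sketched as follows (a simplified sketch under the definitions of Eqs. 1 and 2, not the authors’ released code; ties in similarity are broken by insertion order here, a detail the text does not specify):

```python
def induce_taxonomy(doc_count, cooc, generality, alpha=0.5):
    """Greedily build a taxonomy as a child -> parent map.

    Tags are added in order of descending generality; each new tag tb
    becomes the child of the already-placed tag ta maximizing Eq. 2.
    """
    def co(ta, tb):
        return cooc.get((min(ta, tb), max(ta, tb)), 0)

    tags_sorted = sorted(doc_count, key=lambda t: -generality[t])
    root = tags_sorted[0]
    parent = {root: None}  # taxonomy as child -> parent map
    path = {root: [root]}  # root-to-tag paths, needed for Eq. 2
    for tb in tags_sorted[1:]:
        best, best_sim = None, float("-inf")
        for ta in parent:  # candidate parents already in the taxonomy
            la = len(path[ta]) - 1  # level l_a of the candidate parent
            # Eq. 2: sum over ancestors tc of alpha^(l_a - l_c) * D_{tb,tc}/D_tb
            sim = sum(alpha ** (la - lc) * co(tb, tc) / doc_count[tb]
                      for lc, tc in enumerate(path[ta]))
            if sim > best_sim:
                best, best_sim = ta, sim
        parent[tb] = best
        path[tb] = path[best] + [tb]
    return parent
```

Note that at \(\alpha = 0\) only the candidate parent itself contributes (the \(\alpha ^0\) term), reproducing the degenerate shallow-taxonomy behaviour discussed in Sect. 5.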

5 Evaluation

Evaluation of class taxonomy induction methods is difficult as there may be several equally valid taxonomies for a dataset. Previous works such as Gu et al. [10] and Wang et al. (2009) [30] have opted for human evaluation, wherein domain experts assess the correctness of relations between classes. Wang et al. (2012) [29] used domain experts to rank entire paths on a three point scale. Others, such as Liu et al. [15] and Almoqhim et al. [2], compare class relations against a gold standard taxonomy. In this approach, a confusion matrix between class subsumption axioms is calculated between the induced and gold standard taxonomies. When a gold standard taxonomy can be established, it is the preferred evaluation method as it provides an objective measurement; as such, it is the one we use in our work. We use the confusion matrix to derive the harmonic mean between precision and recall, the \(F_1\) score [7], as our evaluation metric:

$$\begin{aligned} precision \text { } =&\text { } \dfrac{TP}{TP+FP} \end{aligned}$$
(3)
$$\begin{aligned} recall \text { } =&\text { } \dfrac{TP}{TP+FN} \end{aligned}$$
(4)
$$\begin{aligned} F_1 \text { } =&\text { } 2 * \dfrac{precision * recall}{precision + recall} \end{aligned}$$
(5)

where TP, FP, and FN are the number of true positives, false positives, and false negatives, respectively.
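When both taxonomies are represented as sets of subsumption axioms, these metrics reduce to simple set operations; a minimal sketch (illustrative, with axioms as hypothetical (parent, child) pairs):

```python
def f1_against_gold(induced, gold):
    """Precision, recall, and F1 between induced and gold subsumption axioms.

    Both arguments are sets of (parent, child) pairs; a true positive is
    an axiom present in both taxonomies.
    """
    tp = len(induced & gold)   # axioms in both taxonomies
    fp = len(induced - gold)   # induced axioms absent from the gold standard
    fn = len(gold - induced)   # gold axioms the method failed to induce
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```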

For the remainder of this section, we first evaluate the effect of our method’s hyperparameter, \(\alpha \), on each of the three datasets and provide suggestions for selecting the \(\alpha \) value when applying our method to other datasets. This is followed by a comparison of our method with the aforementioned Heymann and Garcia-Molina and Schmitz methods, as well as with results from the literature. We also provide visualizations of excerpts from the class taxonomies induced by our method on the Life and DBpedia datasets. Finally, our method’s computational complexity and the effect of dataset size on induced taxonomies are evaluated. The method was implemented in Python and has been made public alongside our datasets for reproducibility on GitHub.

5.1 Datasets

We evaluate the method on three real-world datasets generated from public online knowledge bases: Life, DBpedia, and WordNet. All three datasets as well as their respective gold standard class taxonomies were generated during the month of November 2019.

The Life Dataset was generated by querying the Catalogue of Life: 2019 Annual Checklist (CoL) [21], an online database that indexes living organisms by their taxonomic classification. One hundred thousand living organisms were randomly selected from the GBIF Type Specimen Names [9], an online checklist of 1,226,904 organisms, and queried on CoL at each of their taxonomic ranks to generate the document-tag tuples. The resulting dataset takes the form such that each organism is a document and its membership at each taxonomic rank is a tag related by . For instance, the document (coyote) will have the tags and . Furthermore, to anchor the class taxonomy to a root tag, we added the tag to every document. We note that even though the number of taxonomic ranks is fixed, most organisms in the database are not defined on all of them. As such, the number of tags per document varies from two to ten. In total, there are 100,000 documents and 37,368 unique tags. Since the dataset itself is classified in the correct taxonomic order, the Life gold standard taxonomy could simply be obtained by querying for subsumption axioms from the dataset.

The DBpedia Dataset was generated by randomly querying for 50,000 unique subjects in DBpedia for which there exists a triple where the subject is related to a DBpedia class object (an object having the prefix ) via the relation . These 50,000 subjects become the documents in the tuple structure. Following this step, all the triples for each document having the tag form were queried to make the document-tag tuples. ( represents any object with the prefix \(\texttt {dbo}\).) In total, 205,793 triples were used to create the dataset with 418 unique tags. The DBpedia gold standard taxonomy was generated from the DBpedia ontology class mappings which can be found on the DBpedia websiteFootnote 4. At the time of querying, the ontology had 765 classes, 418 of which were present in the dataset. This difference made it necessary to include only those subsumption axioms for which parent and child tags exist in the dataset when computing the confusion matrix. This is similar to the dataset generated in Ristoski et al. where the number of classes present in their dataset was 415.

The WordNet Dataset was generated by querying DBpedia for subjects of types that exist in WordNet [16], an English language lexical database. Fifty thousand subjects having a WordNet class object related by were queried. In DBpedia, WordNet class objects use the prefix, giving the tag format . This process yielded a dataset comprising 50,000 documents and 1,752 unique tags generated from 392,846 triples. To generate the WordNet gold standard taxonomy, DBpedia was queried to learn the relations between WordNet classes through the relation. In this process, is set as the root class and the taxonomy is built by recursively querying for subclasses using as the relation. This process builds a taxonomy of 30,722 tags. To fit the 1,752 tags present in the dataset, it was necessary to collapse the gold standard taxonomy. This was done by removing tags in the gold standard taxonomy that are missing from the dataset and re-parenting orphaned tags to their nearest ancestors existing in the dataset.
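The collapsing step described above can be sketched as follows (a minimal sketch, assuming the taxonomy is stored as a child-to-parent map; names are illustrative):

```python
def collapse_taxonomy(parent, keep):
    """Collapse a gold-standard taxonomy onto the tags present in a dataset.

    `parent` maps each tag to its parent (None for the root); tags absent
    from `keep` are removed, and orphaned tags are re-parented to their
    nearest surviving ancestor.
    """
    def nearest_kept_ancestor(tag):
        p = parent[tag]
        while p is not None and p not in keep:
            p = parent[p]  # walk up until an ancestor in the dataset is found
        return p

    return {t: nearest_kept_ancestor(t) for t in parent if t in keep}
```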

Fig. 1. Comparison of mean test \(F_1\) scores at varying \(\alpha \) values on the Life, DBpedia, and WordNet datasets.

5.2 Hyperparameter Sensitivity

We evaluate our method’s sensitivity to the decay factor, \(\alpha \), by performing a hyperparameter sweep on each of the three datasets. In this process, our method is applied five times to each dataset for \(\alpha \) values starting at \(\alpha = 0\) and increasing in increments of 0.05 up to \(\alpha = 1\). This process is analogous to increasing the relative importance of ancestor tags when calculating tag similarity. Furthermore, since similarity is calculated as a summation, increasing \(\alpha \) will favour tags lower in the taxonomy. The \(F_1\) scores are calculated and their means at each \(\alpha \) value are displayed graphically in Fig. 1. For clarity, we omit graphing the mean \(F_1\) scores at \(\alpha = 0\) as the values are disproportionately low for all three datasets (\(F_1 < 0.1\)). This is because when \(\alpha = 0\), the similarity reduces to \(\text {S}_{t_a \rightarrow t_b} = \text {D}_{t_a,t_b} / \text {D}_{t_b}\), which has the effect of inducing shallow taxonomies with most tags as children of the root tag.

Upon cursory inspection of the \(F_1\) scores, we notice that \(\alpha \) exhibits no clear behaviour that is constant across datasets. This is also apparent when comparing the optimal \(\alpha \) values: 0.95, 0.70, and 0.35 for the Life, DBpedia, and WordNet datasets, respectively. Furthermore, we notice that as \(\alpha \) increases, the trend follows three different patterns: stable, generally increasing, and generally decreasing. A possible reason for the relative stability of \(\alpha \) on the Life dataset is the dataset’s consistency. Due to the strict requirements for source datasets to be included in CoL, all entries are well scrutinised. As such, tags will always appear with their ancestors in the same documents. For example, all 893 instances of the tag Mammalia co-occur with the tag’s ancestors Animalia, Chordata, and LivingOrganism. In this scenario, there is less information to be gained by incorporating information from higher up in the taxonomy. On the other hand, the DBpedia dataset shows improvement with increasing \(\alpha \) values until a peak is reached and \(F_1\) declines. The increase in induced taxonomy quality with increasing \(\alpha \) values is consistent with the assumption that taking into account a potential parent’s path is advantageous when selecting a parent. The decline in \(F_1\) after \(\alpha = 0.8\) can be explained by distant ancestor tags having too strong an influence in assigning parent tags to children. One possible explanation for the better \(F_1\) scores of lower \(\alpha \) values on WordNet is our method’s overall lower \(F_1\) scores on this dataset. Errors in the induced taxonomy propagate downwards and their effect increases with the value of \(\alpha \). Thus, in a taxonomy with many errors, it is advantageous to place a relatively higher value on the similarity between the direct parent tag and its child, as is done with lower \(\alpha \) values.

In general, it is difficult to predict the optimal \(\alpha \) value a priori; however, there are a few rules of thumb to guide this process when applying our method. When there is no prior information about the nature of the dataset or its expected class taxonomy, we suggest using \(\alpha \) values around 0.5, as these values perform well (although not optimally) in our experiments. Datasets which are complex, or which have low co-occurrence rates between ancestor and descendant tags, will favour lower \(\alpha \) values as these ensure errors propagate less through the taxonomy. On the other hand, well structured datasets will be less affected by varying \(\alpha \) values.

5.3 Results

In our experiments, we applied our proposed method to each of the aforementioned datasets at the \(\alpha \) values determined optimal in the previous subsection. The method was run five times on each dataset to account for the stochasticity in sorting tags of equal generality. The results of our method as well as those of the comparison methods are summarized in Table 1. We implemented the Heymann and Garcia-Molina and Schmitz methods to the best of our understanding and performed hyperparameter exploration for their respective hyperparameters on each dataset. After obtaining the optimal hyperparameters, we ran the methods five times on each dataset and collected the results. We note that the Heymann and Garcia-Molina method did not terminate quickly enough for us to obtain results on the Life dataset. In the table we also include the results reported in previous work applied to the DBpedia dataset. Although that DBpedia dataset was derived similarly to our own, conclusions from comparing these methods to our proposed method should be drawn cautiously. We indicate these entries in the table with a footnote.

In general, our method outperforms the other two tag hierarchy induction methods as shown by the mean \(F_1\) scores. We notice similarly high precision and recall values, which suggests that it is both capable of inducing subsumption axioms (recall) and of ensuring these axioms are correct (precision). Furthermore, closer inspection of the results reveals that many of the errors fall into two types, which we illustrate using results from the DBpedia dataset. In the first, the order between parent and child tags is reversed, as in the induced when the correct order is . In the second, a tag is misplaced as the child of its sibling; for instance, the gold standard classification of educational institutions is while our induced taxonomy gives the following: . Finally, our induced taxonomy includes subsumption axioms which are considered incorrect as per the gold standard but may not be to a human evaluator. An example of this is that our method induced the subsumption axiom while the gold standard considers to be the correct parent for . We provide an excerpt of our induced class taxonomies on the Life and DBpedia datasets in Fig. 2.

Table 1. Method results (mean ± standard deviation) on the Life, DBpedia, and WordNet datasets.
Fig. 2. Excerpts of the induced class taxonomies for the Life (left) and DBpedia (right) datasets. Ellipses denote additional child classes omitted for brevity.

5.4 Computational Complexity Evaluation

One of the most salient issues that arise when applying class taxonomy induction methods to real-world knowledge graphs is that of scalability. As mentioned previously, DBpedia, YAGO, and WikiData have upwards of one billion triples each; thus, for a method to operate on these datasets, it has to be computationally efficient. It is important to note, however, that in inducing a class taxonomy, it is not necessary to use all the triples available in the knowledge graph but rather only as many as are required to achieve an acceptable result. We discuss this idea in the following subsection.

The most computationally taxing procedure in our method is that of calculating the number of documents annotated by two tags, \(\text {D}_{{t_a}, {t_b}}\), which has a worst-case time complexity of \(\mathcal {O}(|\mathcal {D}||\mathcal {V}|^2)\), where \(|\mathcal {D}|\) and \(|\mathcal {V}|\) are the number of documents and tags, respectively. It is important to note, however, that the worst case only occurs when all documents are annotated by all tags. In this scenario, every subject in the knowledge graph is of every class type in the ontology. The average-case computational complexity of our algorithm is \(\mathcal {O}(|\mathcal {D}|\overline{|\mathcal {A}|}^2)\), where \(\overline{|\mathcal {A}|}\) is the average number of tags that annotate a document. In our experiments, our method was faster to terminate than both the Heymann and Garcia-Molina and Schmitz methods on all three datasets.
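The average-case bound can be seen directly in the co-occurrence counting loop: its cost is the number of tag pairs per document summed over all documents, i.e. proportional to \(|\mathcal {D}|\overline{|\mathcal {A}|}^2\) rather than \(|\mathcal {D}||\mathcal {V}|^2\). An illustrative sketch that counts the pair operations performed:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_cost(annotations):
    """Tally D_{ta,tb} while counting the pair operations performed.

    The operation count equals sum_i C(|A_i|, 2), which reaches the
    worst case |D| * C(|V|, 2) only when every document carries every tag.
    """
    cooc = Counter()
    ops = 0
    for tags in annotations.values():
        for ta, tb in combinations(sorted(tags), 2):
            cooc[(ta, tb)] += 1
            ops += 1
    return cooc, ops
```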

5.5 Effect of Dataset Size on Induced Taxonomy

As mentioned previously, although a method’s scalability to large knowledge graphs is important in the context of the Semantic Web, it is not the case that larger datasets will produce better taxonomies. To demonstrate this, we applied our method as described in the Results subsection to DBpedia datasets at differing document counts. Each dataset was derived in the same way as described in the Datasets subsection, such that all of the smaller DBpedia datasets are strict subsets of the larger ones. A summary of the results is displayed in Table 2. We note that runtime measures the execution of our method without including time for input and output. We notice that although larger datasets obtain higher \(F_1\) scores, the incremental increase in \(F_1\) diminishes, and the scores plateau after 20,000 documents. However, relying on the \(F_1\) score as the sole comparison metric may be misleading since it is calculated only on the tags which exist in the dataset. Thus, since there are 211 unique tags in the DBpedia 1,000 dataset and 428 unique tags in the DBpedia 100,000 dataset, the induced taxonomy of the latter will be over twice as large as that of the former.

Table 2. Summary of our method’s results on DBpedia datasets at various document counts, \(|\mathcal {D}|\).

6 Conclusions

In this paper, we described the problem of inducing class hierarchies from knowledge graphs and its significance to the Semantic Web community. In our contribution to this research area, we proposed an approach to the problem by marrying the fields of class taxonomy induction from knowledge graphs with tag hierarchy induction from documents and tags. To this end, we reshaped the knowledge graph to a tuple structure and applied two existing tag hierarchy induction methods to show the viability of such an approach. Furthermore, we proposed a novel method for inducing class taxonomies that relies solely on class frequencies and co-occurrences and can thus be applied on knowledge graphs irrespective of their content. We showed our method’s ability to induce class hierarchies by applying it on three real-world datasets and evaluating it against their respective gold standard taxonomies. Results demonstrate that our method induces better taxonomies than other tag hierarchy induction methods and can be reliably applied to large-scale knowledge graphs.