Effective Composition of Mappings for Matching Biomedical Ontologies

Hartung, Michael; Gross, Anika; Kirsten, Toralf; Rahm, Erhard

doi:10.1007/978-3-662-46641-4_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7540)

Included in the following conference series:

Extended Semantic Web Conference

Abstract

There is an increasing need to interconnect biomedical ontologies. We investigate a simple but promising approach to generate mappings between ontologies by reusing and composing existing mappings across intermediate ontologies. Such an approach is especially promising for highly interconnected ontologies such as those in the life science domain. Since many ontologies may be available for composition, the problem arises of finding the most suitable ones, i.e., those providing the best results. We therefore propose measures and strategies to select the most promising intermediate ontologies for composition. We further discuss advanced composition techniques that create more complete mappings than standard mapping composition. Experimental results for matching anatomy ontologies demonstrate the effectiveness of our approaches.

1 Introduction

In recent years, ontologies have become increasingly important in the life sciences [5, 22]. For instance, Bio2RDF [3], the OBO Foundry [29] and BioPortal [24, 33] distribute a growing number of biomedical ontologies from different domains such as anatomy and molecular biology. The ontologies are primarily used to annotate objects such as proteins, genes or literature to achieve better information exchange. Often, several ontologies from one domain contain overlapping or related information. For example, information about mammalian anatomy is available in the NCI Thesaurus [23], the Adult Mouse Anatomy [1] and the Unified Medical Language System [32]. In such cases, ontology mappings can express correspondences between different but related ontologies, e.g., which concepts of two ontologies are equivalent.

Mappings between related ontologies are useful in many ways, in particular for data integration and enhanced analysis [18, 24]. They are needed to merge ontologies [28], e.g., to create an integrated cross-species anatomy ontology such as the Uber ontology [31], and they can also help to transfer knowledge from different experiments between species [6]. Numerous mappings between ontologies are already available, e.g., BioPortal provides mappings between approx. 300 ontologies. However, there is still a strong need for more mappings, as most ontologies are interlinked with only one or a few other ontologies. Furthermore, new ontologies need to be connected to existing ones. The size of biomedical ontologies makes manual generation of new mappings infeasible, hence (semi-)automatic match algorithms are required.

We focus on the reuse and composition of existing mappings between ontologies to indirectly determine new ontology mappings and correspondences. Such an approach is especially promising for the life science domain, where many mappings can be reused (e.g., from BioPortal). A main advantage of composition is its simplicity and high efficiency, even for large ontologies. As shown in Fig. 1, there can be multiple alternatives (routes) to establish a new mapping between a source ($S$) and a target ($T$) ontology by composition. First, there can be multiple intermediate ontologies $IO$ ($IO_1$ ... $IO_n$), leading to questions like: “Is it better to use $IO_1$ instead of $IO_2$, or both?”. Second, for a single intermediate ontology there can be several alternatives if there are multiple mappings between two ontologies (dotted/dashed lines between $S$ and $IO_1$), e.g., determined by different match approaches. Given the large number of possible composition alternatives, we need an automatic approach to select the most suitable intermediates, i.e., those likely to result in the best composed mappings.

In this paper we study such selection methods and make the following contributions:

We propose an efficiently computable measure to determine the effectiveness of composition routes via intermediate ontologies. For the case of composing two mappings, the effectiveness measure helps to find the most promising intermediate ontology.
We describe two strategies using the proposed effectiveness measure to rank and select the top-k intermediates for mapping composition. Combining the derived mappings for the top-k routes helps to improve the overall mapping quality. We further discuss advanced composition techniques that may help to generate more complete ontology mappings compared to the standard technique.
We evaluate the proposed approach on the OAEI [25] anatomy match task by using existing mappings determined by different match approaches. The obtained mapping quality results demonstrate the effectiveness of the proposed selection strategies.

This paper is an extended version of [15]. In Sect. 2 we introduce our ontology and mapping model. Section 3 presents the composition-based match approach. We describe our effectiveness measure and outline two strategies for selecting the most promising routes. In Sect. 4 we discuss advanced composition techniques. We evaluate the approach in Sect. 5. After a discussion of related work (Sect. 6), we summarize and outline possible future work.

2 Preliminaries – Ontologies and Mappings

An ontology $O=(C,R,A)$ consists of a set of concepts $C$ which are interrelated by directed relationships $R$. Each concept has a unique identifier (e.g., accession number, URI) that is used to reference it, e.g., the concept ‘Vertebra’ in the NCI Thesaurus is unambiguously referenced by C12933. A concept typically has further attributes $a \in A$ describing it, e.g., C12933 has the name ‘Vertebra’ and the synonym ‘Vertebrae’. A relationship $r \in R$ forms a directed connection between two concepts and has a specific type, e.g., is_a or part_of. In our example, C12933 is a special kind of ‘Bone’ (C12366): [C12933, is_a, C12366].

A mapping between two ontologies $S$ and $T$, $M_{S,T}=\{(c_1,c_2,sim)|c_1\in S, c_2\in T,sim\in \left[ 0,1\right] \}$, consists of a set of correspondences between these ontologies, e.g., as determined by some ontology match method (see Related Work). Each correspondence interconnects two related concepts $c_1$ and $c_2$. Their relatedness is represented by a similarity value $sim$ between 0 and 1 determined by the match approach used. The greater the $sim$ value, the more similar the corresponding concepts. Note that we focus on equality correspondences and leave other correspondence types for future work. For already validated mappings, we assume a similarity of 1 for each correspondence.
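The ontology and mapping model can be sketched with plain Python data structures. This is a minimal illustration under our own assumptions. The class `Ontology`, the alias `Correspondence` and the helper `make_mapping` are hypothetical names. Concepts are keyed by their identifiers, and the MA identifier `MA_vertebra` is a made-up placeholder. Python 3.9+ is assumed.

```python
from dataclasses import dataclass, field

# A correspondence (c1, c2, sim); a mapping M_{S,T} is a set of such triples.
Correspondence = tuple[str, str, float]


@dataclass
class Ontology:
    """Hypothetical container for O = (C, R, A)."""
    name: str
    # concept identifier -> attribute values (name, synonyms, ...)
    concepts: dict[str, dict[str, object]] = field(default_factory=dict)
    # directed, typed relationships as (source id, type, target id)
    relationships: set[tuple[str, str, str]] = field(default_factory=set)

    def __len__(self) -> int:
        # |O| is the number of concepts
        return len(self.concepts)


def make_mapping(source: Ontology, target: Ontology,
                 corrs: list[Correspondence]) -> set[Correspondence]:
    """Build M_{S,T}, checking c1 in S, c2 in T and sim in [0, 1]."""
    for c1, c2, sim in corrs:
        if c1 not in source.concepts or c2 not in target.concepts:
            raise ValueError(f"unknown concept in ({c1}, {c2})")
        if not 0.0 <= sim <= 1.0:
            raise ValueError(f"similarity {sim} outside [0, 1]")
    return set(corrs)


# Usage: the NCIT example from the text plus a placeholder MA concept.
ncit = Ontology(
    "NCIT",
    {"C12933": {"name": "Vertebra", "synonyms": ["Vertebrae"]},
     "C12366": {"name": "Bone"}},
    {("C12933", "is_a", "C12366")},
)
ma = Ontology("MA", {"MA_vertebra": {"name": "vertebra"}})
# a validated correspondence, hence sim = 1
m_ma_ncit = make_mapping(ma, ncit, [("MA_vertebra", "C12933", 1.0)])
```

Keeping a mapping as a set of triples makes domain, range and composition simple set operations.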

3 Rating and Selection of Composition Routes

In this section we present our approach to rate composition routes and to select the most promising ones. After introducing the concept of mapping composition, we propose an effectiveness measure to rate the value of routes in Sect. 3.2. Using this measure we describe the strategies topKByEffectiveness and topKByComplement for ranking and selecting the routes (Sect. 3.3). We finally describe in Sect. 3.4 the combined use of multiple selected routes to create a new mapping.

3.1 Composition for Generating New Mappings

The general idea behind mapping composition is to derive new mappings between two ontologies by reusing existing mappings. Thus, new mappings are generated indirectly via one or more intermediate ontologies instead of by a direct match between the two input ontologies. The typical situation for one intermediate is depicted in Fig. 2. The input consists of two ontologies $S$/$T$ and two mappings $M_{S,IO}$/$M_{IO,T}$ w.r.t. an intermediate ontology $IO$. The domain and range of the mappings indicate which concepts are covered by the given mappings. For instance, all concepts of $S$ covered by the mapping to $IO$ are in its domain, domain($M_{S,IO}$). Similarly, the $IO$ concepts covered by this mapping are in its range, range($M_{S,IO}$). Mapping composition is then applied as follows. A compose operator takes as input two mappings (from $S$ to $IO$ and from $IO$ to $T$) and produces new correspondences between concepts of $S$ and $T$ whenever two correspondences share the same concept in $IO$. The result is a new mapping $M_{S,T}$:

$$\begin{aligned}&\ \ \ \ \ \ \ \ \ \ \ \ M_{S,T} = \mathtt{compose }(M_{S,IO},M_{IO,T}) = \\&\{(c_1,c_2,aggSim(sim_1,sim_2))|c_1\in S, c_2\in T, b\in IO:\\&\ \ \ \ \exists (c_1,b,sim_1)\in M_{S,IO} \wedge \exists (b,c_2,sim_2)\in M_{IO,T}\} \end{aligned}$$

The similarity values of input correspondences are aggregated (aggSim) into new similarity values, e.g., by computing their maximum or average. In Fig. 2 we would create two correspondences between $S$ and $T$ since two concepts in $IO$ overlap.
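The compose operator can be sketched as a join on the shared intermediate concept. The formula leaves open what happens when several $IO$ concepts link the same pair $(c_1,c_2)$; this sketch keeps the highest aggregated similarity, which is our own assumption. `compose`, `domain` and `range_` are hypothetical names (the trailing underscore avoids shadowing Python's built-in `range`). The concept names in the usage example are made up.

```python
from collections import defaultdict
from typing import Callable

Correspondence = tuple[str, str, float]
OntoMapping = set[Correspondence]


def domain(m: OntoMapping) -> set[str]:
    """Concepts of the first ontology covered by mapping m."""
    return {c1 for c1, _, _ in m}


def range_(m: OntoMapping) -> set[str]:
    """Concepts of the second ontology covered by mapping m."""
    return {c2 for _, c2, _ in m}


def compose(m_s_io: OntoMapping, m_io_t: OntoMapping,
            agg_sim: Callable[[float, float], float] = max) -> OntoMapping:
    """Join (c1, b, sim1) and (b, c2, sim2) on the shared IO concept b."""
    by_b: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for b, c2, sim2 in m_io_t:
        by_b[b].append((c2, sim2))
    best: dict[tuple[str, str], float] = {}
    for c1, b, sim1 in m_s_io:
        for c2, sim2 in by_b.get(b, []):
            sim = agg_sim(sim1, sim2)
            # assumption: if several IO concepts link (c1, c2), keep the best value
            if sim > best.get((c1, c2), -1.0):
                best[(c1, c2)] = sim
    return {(c1, c2, sim) for (c1, c2), sim in best.items()}


m_s_io = {("A", "X", 0.75), ("B", "Y", 1.0), ("C", "Z", 0.5)}
m_io_t = {("X", "A'", 0.25), ("Y", "B'", 1.0), ("W", "D'", 1.0)}
shared = range_(m_s_io) & domain(m_io_t)      # {"X", "Y"}
m_max = compose(m_s_io, m_io_t)               # aggSim = maximum
m_avg = compose(m_s_io, m_io_t, lambda a, b: (a + b) / 2)  # aggSim = average
```

Two $IO$ concepts ("X" and "Y") are covered by both mappings, so two correspondences are derived. "Z" and "W" contribute nothing.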

3.2 Effectiveness of Routes

The result of a mapping composition heavily depends on which intermediate ontologies are used and what the mappings to these intermediates look like. First, compose can at best create correspondences between concepts of $S$/$T$ that are covered by the input mappings to an $IO$. The more concepts an input mapping covers, the more likely it is that they can be interlinked with concepts of the other ontology. Thus, an intermediate whose mappings cover only a small portion of $S$/$T$ is less effective than one whose mappings cover larger portions. Second, there should be a high overlap of mapped concepts in $IO$, i.e., many $IO$ concepts should be in both range($M_{S,IO}$) and domain($M_{IO,T}$), because new correspondences can only be created via such shared intermediate concepts. By contrast, a small overlap only permits the creation of a few correspondences, i.e., small and likely incomplete mappings. Based on these observations, we define a measure to rate the effectiveness of a route between ontologies $S$ and $T$ via an intermediate $IO$:

$$\begin{aligned} eff(S,IO,T)=\frac{2 \cdot |range(M_{S,IO})\cap domain(M_{IO,T})|}{|S|+|T|} \end{aligned}$$

The measure is primarily based on the size of the concept overlap in the intermediate ontology, i.e., the larger the overlap, the better the effectiveness. In addition, we relate this overlap to the sizes of the ontologies to be matched, $S$ and $T$. Only mappings with many correspondences can produce a high overlap and a good coverage of concepts in $S$ and $T$. Figure 3 shows two examples of applying the measure. The left example (a) results in a good effectiveness of $\frac{2\cdot 3}{4+4}=0.75$ because the overlap in the intermediate ontology covers a large part of $S$ and $T$. By contrast, in the right example (b) there is only one overlapping concept in the intermediate ontology, resulting in a poor effectiveness of $\frac{2\cdot 1}{4+4}=0.25$. The compose operator would produce the following mappings (without similarity values): (a) $M_{S,T}=\{(A,A'), (B,B'), (C,C')\}$ and (b) $M_{S,T}=\{(B,B')\}$. This shows that the better-rated intermediate ontology produces more correspondences and thus a more complete mapping.
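The effectiveness measure needs only the range of the first mapping, the domain of the second and the two ontology sizes, so it is cheap to compute. In the sketch below, `eff` is a hypothetical function name. The two example mappings are our own reconstruction of the situations described for Fig. 3, since the figure itself is not reproduced. The last line re-checks the Uber value reported in Sect. 5.

```python
def eff(m_s_io, m_io_t, size_s: int, size_t: int) -> float:
    """eff(S, IO, T) = 2 * |range(M_S,IO) & domain(M_IO,T)| / (|S| + |T|)."""
    overlap = {b for _, b, _ in m_s_io} & {b for b, _, _ in m_io_t}
    return 2 * len(overlap) / (size_s + size_t)


# Hypothetical reconstruction of the two examples (|S| = |T| = 4)
left_s_io = {("A", "a", 1.0), ("B", "b", 1.0), ("C", "c", 1.0)}
left_io_t = {("a", "A'", 1.0), ("b", "B'", 1.0), ("c", "C'", 1.0)}
right_s_io = {("A", "a", 1.0), ("B", "b", 1.0)}
right_io_t = {("b", "B'", 1.0), ("x", "C'", 1.0)}

eff_left = eff(left_s_io, left_io_t, 4, 4)     # 2*3/8 = 0.75
eff_right = eff(right_s_io, right_io_t, 4, 4)  # 2*1/8 = 0.25

# Arithmetic check for Sect. 5: Uber overlap of 1,048 concepts,
# |MA| = 2,737 and |NCIT| = 3,298 give about 0.35
uber_eff = 2 * 1048 / (2737 + 3298)
```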

3.3 Ranking and Selection of Routes

Mapping composition using only one route may lead to insufficient (incomplete) match results. Composing mappings for several routes via different intermediates and combining their results is likely to improve the mapping to be determined. This is because other intermediate sources may provide additional correspondences between the input ontologies. The question thus arises which of the available routes should be selected for mapping composition. In the following, we describe two selection strategies that we will also evaluate later.

The first strategy topKByEffectiveness simply uses a ranking based on the effectiveness measure described in Sect. 3.2. Hence, we perform composition only on the $k$ most effective routes and combine their results.
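Ranking by effectiveness amounts to a sort. The function name `top_k_by_effectiveness` is hypothetical. Of the values below, only Uber (0.35) and TAO (0.05) are reported in Sect. 5; the FMA and CL values are made up for illustration.

```python
def top_k_by_effectiveness(route_eff: dict[str, float], k: int) -> list[str]:
    """Return the k intermediates with the highest effectiveness.

    Python's sort is stable, so ties keep the input order.
    """
    return sorted(route_eff, key=lambda io: route_eff[io], reverse=True)[:k]


effs = {"Uber": 0.35, "TAO": 0.05, "FMA": 0.30, "CL": 0.20}
selected = top_k_by_effectiveness(effs, 3)  # ['Uber', 'FMA', 'CL']
```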

The second strategy, topKByComplement, also selects the most effective route first, but selects the remaining routes based on the number of complementary correspondences they can provide, i.e., it determines how much additional gain can be achieved by considering further routes. For instance, when matching two anatomy ontologies, an ontology about the skeletal system would be complementary to one about the nervous or circulatory system. Hence, it makes sense to consider intermediate ontologies that contain knowledge the others do not provide.

Algorithm 1 shows the implementation of this strategy. It first selects the most effective intermediate based on our effectiveness measure (lines 1–3). It then iteratively (while loop) adds the intermediate with the maximum complement ($compl_{max}$) with respect to the already covered concepts ($cov_{all}$) in $S$ and $T$ (lines 5–12). In particular, we compare the concepts covered by the current intermediate with the set of concepts ($cov_{all}$) covered by the already selected intermediates, and in each round we select the intermediate that yields the maximum complement. Note that the algorithm could be adapted so that it does not rely on a fixed number ($k$) of intermediates; instead, it could stop considering further intermediates once their complement does not exceed a given threshold.
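Since Algorithm 1 is not reproduced here, the following is only a greedy sketch of the strategy as described in the text, not the authors' listing. We assume the caller supplies, per intermediate, the set of $S$ and $T$ concepts covered by that route. One plausible choice is domain($M_{S,IO}$) together with range($M_{IO,T}$). Ties on the complement are broken by effectiveness. The optional `min_gain` parameter models the threshold variant mentioned above. All names and data are hypothetical.

```python
from typing import Optional


def top_k_by_complement(route_eff: dict[str, float],
                        coverage: dict[str, set[str]],
                        k: int,
                        min_gain: Optional[int] = None) -> list[str]:
    """Greedy sketch of topKByComplement.

    route_eff: intermediate -> effectiveness value
    coverage:  intermediate -> S and T concepts covered by that route
    min_gain:  optional threshold; stop once the best complement <= min_gain
    """
    remaining = sorted(route_eff)  # sorted for deterministic tie-breaking
    # start with the most effective route
    first = max(remaining, key=lambda io: route_eff[io])
    selected = [first]
    cov_all = set(coverage[first])
    remaining.remove(first)

    def gain(io: str) -> int:
        # complement: concepts this route covers that are not yet covered
        return len(coverage[io] - cov_all)

    while remaining and len(selected) < k:
        best = max(remaining, key=lambda io: (gain(io), route_eff[io]))
        if min_gain is not None and gain(best) <= min_gain:
            break
        selected.append(best)
        cov_all |= coverage[best]
        remaining.remove(best)
    return selected


effs = {"Uber": 0.35, "FMA": 0.30, "CL": 0.20, "Galen": 0.15}
cov = {"Uber": {"s1", "s2", "s3", "t1"}, "FMA": {"s1", "s2", "t1"},
       "CL": {"s1", "s3"}, "Galen": {"s4", "t2"}}
by_complement = top_k_by_complement(effs, cov, 3)   # ['Uber', 'Galen', 'FMA']
with_threshold = top_k_by_complement(effs, cov, 3, min_gain=0)
```

On this toy data, ranking purely by effectiveness would pick Uber, FMA and CL. The complement-based selection prefers Galen because it covers concepts that none of the others reach.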

3.4 Overall Composition Algorithm

We use the algorithm topKComposeMatch (see Algorithm 2) to perform the composition for the $k$ selected intermediates and to combine the composition results to obtain the overall mapping between two input ontologies.

We first apply our effectiveness measure to each route (line 1). Based on the given selection strategy (topKByEffectiveness or topKByComplement), we select the $k$ most promising intermediates (line 2). We then compose the mappings between $S$ and $T$ along each selected intermediate (lines 4–7). The generated mappings are temporarily stored in a list $mapList$ and are finally merged according to a specified merge strategy, such as union or intersection.
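The overall procedure can be sketched end to end: rate, select, compose per route, merge. This is an illustrative sketch, not Algorithm 2 itself, and every name in it is hypothetical. We assume aggSim = max. We also assume that a pair found via several routes keeps its highest similarity when merged. The default selection is topKByEffectiveness, and a different `select` function can be passed in.

```python
from collections import defaultdict
from typing import Callable, Optional

Corr = tuple[str, str, float]          # (c1, c2, sim)
Route = tuple[set[Corr], set[Corr]]    # (M_{S,IO}, M_{IO,T})


def compose(m1: set[Corr], m2: set[Corr]) -> set[Corr]:
    """Standard compose with aggSim = max; the best value wins for duplicate pairs."""
    by_b = defaultdict(list)
    for b, c2, s2 in m2:
        by_b[b].append((c2, s2))
    best: dict[tuple[str, str], float] = {}
    for c1, b, s1 in m1:
        for c2, s2 in by_b.get(b, []):
            best[(c1, c2)] = max(best.get((c1, c2), 0.0), s1, s2)
    return {(c1, c2, s) for (c1, c2), s in best.items()}


def eff(m1: set[Corr], m2: set[Corr], size_s: int, size_t: int) -> float:
    overlap = {b for _, b, _ in m1} & {b for b, _, _ in m2}
    return 2 * len(overlap) / (size_s + size_t)


def merge(map_list: list[set[Corr]], how: str = "union") -> set[Corr]:
    """Merge composed mappings; a pair found via several routes keeps its max sim."""
    sims = [{(c1, c2): s for c1, c2, s in m} for m in map_list]
    if not sims:
        return set()
    if how == "union":
        pairs = set().union(*sims)
    elif how == "intersection":
        pairs = set(sims[0]).intersection(*sims[1:])
    else:
        raise ValueError(f"unknown merge strategy: {how}")
    return {(c1, c2, max(d[(c1, c2)] for d in sims if (c1, c2) in d))
            for c1, c2 in pairs}


def top_k_compose_match(
    routes: dict[str, Route], size_s: int, size_t: int, k: int,
    how: str = "union",
    select: Optional[Callable[[dict[str, float], int], list[str]]] = None,
) -> set[Corr]:
    # rate every route with the effectiveness measure
    effs = {io: eff(m1, m2, size_s, size_t) for io, (m1, m2) in routes.items()}
    # pick k intermediates; the default is topKByEffectiveness
    if select is None:
        chosen = sorted(effs, key=lambda io: effs[io], reverse=True)[:k]
    else:
        chosen = select(effs, k)
    # compose along each chosen route (mapList), then merge
    map_list = [compose(*routes[io]) for io in chosen]
    return merge(map_list, how)


routes = {
    "IO1": ({("A", "a", 1.0), ("B", "b", 1.0), ("C", "c", 1.0)},
            {("a", "A'", 1.0), ("b", "B'", 1.0), ("c", "C'", 1.0)}),
    "IO2": ({("B", "x", 1.0), ("D", "y", 0.8)},
            {("x", "B'", 1.0), ("y", "D'", 0.8)}),
    "IO3": ({("A", "z", 1.0)}, {("z", "C'", 1.0)}),
}
union_map = top_k_compose_match(routes, 4, 4, k=2)
inter_map = top_k_compose_match(routes, 4, 4, k=2, how="intersection")
```

With k = 2 the weakest route (IO3, effectiveness 0.25) is skipped, so its dubious correspondence (A, C') never enters the result. This mirrors, on toy data, why combining a few strong routes can beat combining all of them.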

4 Advanced Composition Techniques

The compose operator described in Sect. 3.1 is most effective when many concepts in the intermediate ontology participate in both the first and the second input mapping. To improve the applicability of composition in less favorable settings we propose two generalized composition techniques to derive correspondences by reusing existing mappings. Such techniques can be applied incrementally, i.e., we would first use the standard compose operator to generate an initial mapping and then try to find further correspondences with the advanced techniques.

The two techniques we discuss in the following are Semantic Neighborhood Composition and Multi-Step Composition. We do not evaluate these techniques in this paper but leave this for future work.

4.1 Semantic Neighborhood Composition

Standard composition joins two correspondences $(c1,c2')$ and $(c2'',c3)$ only if $c2'=c2''$, i.e., if they share a concept in the intermediate ontology. With Semantic Neighborhood composition (SNcompose), we relax this condition by also composing correspondences where $c2'$ and $c2''$ are in close semantic neighborhood, e.g., if they are in a parent, child or sibling relationship. For the example in Fig. 4(a), standard composition only derives correspondence $(B,B')$ via the shared concept $B''$ in the intermediate ontology $IO$. With SNcompose, we can additionally find that concept $C$ in $S$ and concept $C'$ in $T$ correspond to the closely related $IO$ concepts $C1''$ and $C2''$ (which are in a parent-child relationship), so that the correspondence $(C,C')$ may also hold. A similar kind of composition has already been applied by the taxonomy matcher of the COMA++ [2] matching tool, where a taxonomy is used as an intermediate ontology representing background knowledge.

In general, the SNcompose operator can be defined as follows:

$$\begin{aligned}&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \qquad M_{S,T} = \mathtt{SNcompose }(M_{S,IO},M_{IO,T}) = \\&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \{(c_1,c_2,aggSim(sim_1,sim_2,sim_3))|c_1\in S, c_2\in T, a,b\in IO:\\&\exists (c_1,a,sim_1)\in M_{S,IO} \wedge \exists (b,c_2,sim_2)\in M_{IO,T} \wedge \exists neighbor(a,b,sim_3)\in IO\} \end{aligned}$$

We assume that the neighbor relation provides an intra-ontology distance or similarity ($sim_3$) between concepts, depending on the type of relationship (parent of, child of, sibling of) and possibly further criteria such as cardinalities. This similarity is additionally used to compute the final similarity ($aggSim$) of the derived correspondence. However, one should be aware that the resulting correspondences may no longer be of type ‘equality’; instead, the relationship between the related concepts in the intermediate ontology may hold (e.g., an $is\_a$ relationship between $C$ and $C'$ in Fig. 4(a)).
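SNcompose can be sketched by letting the join run over identical or neighboring $IO$ concepts. The concrete choices here are illustrative assumptions, not values from the paper. Neighbors are derived only from is_a pairs. Parent/child pairs get $sim_3 = 0.8$ and siblings $0.6$. Identical concepts get $sim_3 = 1.0$, which reproduces standard compose. aggSim is the minimum of the three values. The names `neighbor_sims` and `sn_compose` are hypothetical. The usage example follows the situation described for Fig. 4(a), assuming $C2''$ is the child of $C1''$.

```python
from collections import defaultdict


def neighbor_sims(is_a, w_parent_child=0.8, w_sibling=0.6):
    """Index IO neighbors: concept -> [(neighbor, sim3)] from (child, parent) pairs."""
    sims = {}
    children = defaultdict(set)
    for child, parent in is_a:
        sims[(child, parent)] = sims[(parent, child)] = w_parent_child
        children[parent].add(child)
    for kids in children.values():  # siblings share a parent
        for a in kids:
            for b in kids:
                if a != b:
                    sims.setdefault((a, b), w_sibling)
    index = defaultdict(list)
    for (a, b), s in sims.items():
        index[a].append((b, s))
    return index


def sn_compose(m_s_io, m_io_t, is_a, agg_sim=min):
    """SNcompose sketch: join via identical or neighboring IO concepts a, b."""
    nbrs = neighbor_sims(is_a)
    by_b = defaultdict(list)
    for b, c2, s2 in m_io_t:
        by_b[b].append((c2, s2))
    best = {}
    for c1, a, s1 in m_s_io:
        # a itself (sim3 = 1.0, i.e. standard compose) plus its neighbors
        for b, s3 in [(a, 1.0)] + nbrs.get(a, []):
            for c2, s2 in by_b.get(b, []):
                s = agg_sim(s1, s2, s3)
                if s > best.get((c1, c2), -1.0):
                    best[(c1, c2)] = s
    return {(c1, c2, s) for (c1, c2), s in best.items()}


m_s_io = {("B", "B''", 1.0), ("C", "C1''", 1.0)}
m_io_t = {("B''", "B'", 1.0), ("C2''", "C'", 1.0)}
is_a = {("C2''", "C1''")}  # assumed: C2'' is a child of C1''
result = sn_compose(m_s_io, m_io_t, is_a)
```

The derived (C, C') carries a lower similarity than (B, B'). This reflects that it may not be a true equality correspondence.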

4.2 Multi-Step Composition

A second technique to create additional correspondences is multi-step composition, which combines multiple mappings along longer mapping paths over two or more intermediate ontologies. For the example in Fig. 4(b), we can only derive correspondence ($B,B'$) when composing just two mappings via a single intermediate ontology such as $IO1$. Considering longer mapping routes via the intermediates $IO2$ and $IO3$ can help to identify additional correspondences. In particular, composing the correspondences ($C,C1'$) and ($C1',C2'$) and then composing the result with ($C2',C'$) yields a second correspondence ($C,C'$) between $S$ and $T$. The idea thus is to apply the standard compose operator repeatedly along a complete mapping path between the ontologies to match, i.e., to determine compose(compose($M_{S,IO_2}$,$M_{IO_2,IO_3}$),$M_{IO_3,T}$) in our example.
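Because multi-step composition is just repeated application of compose, it can be sketched as a left fold with `functools.reduce`. In this sketch aggSim is the minimum, our own assumption, so a chain is only as confident as its weakest link. `compose_path` is a hypothetical name. The mappings follow the Fig. 4(b) path described above, with made-up similarity values.

```python
from collections import defaultdict
from functools import reduce


def compose(m1, m2, agg_sim=min):
    """Standard compose; aggSim = min here (an assumption suited to chains)."""
    by_b = defaultdict(list)
    for b, c2, s2 in m2:
        by_b[b].append((c2, s2))
    best = {}
    for c1, b, s1 in m1:
        for c2, s2 in by_b.get(b, []):
            s = agg_sim(s1, s2)
            if s > best.get((c1, c2), -1.0):
                best[(c1, c2)] = s
    return {(c1, c2, s) for (c1, c2), s in best.items()}


def compose_path(mappings):
    """compose(compose(M_1, M_2), M_3) ... along S -> IO_2 -> ... -> T."""
    return reduce(compose, mappings)


m_s_io2 = {("C", "C1'", 1.0)}
m_io2_io3 = {("C1'", "C2'", 0.9)}
m_io3_t = {("C2'", "C'", 0.8)}
derived = compose_path([m_s_io2, m_io2_io3, m_io3_t])
```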

There may be many applicable mapping paths of length three or more so that it becomes even more important to select the most promising one. Our effectiveness measure introduced in Sect. 3.2 can be generalized to longer routes via several intermediates. In general such routes are mapping chains across ontologies $O_1$, ..., $O_n$ with $O_1 = S$ and $O_n = T$ and we can iteratively compute the overlap in each intermediate ontology. We determine the effectiveness as follows:

$$\begin{aligned} eff(O_1 \ldots O_n)=\frac{2 \cdot \min_{i=2}^{n-1}\left|range(M_{O_{i-1},O_i})\cap domain(M_{O_i,O_{i+1}})\right|}{|O_1|+|O_n|} \end{aligned}$$

We take the minimal overlap because the intermediate with the smallest overlap restricts the overall effectiveness: a composition path must be represented in the overlaps of all intermediate ontologies. It is easy to see that the effectiveness measure for a single intermediate is the special case $n=3$ of this formula. In our evaluation, we focus on routes with a single intermediate ontology and leave the evaluation of multi-step composition for future work.
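The generalized measure can be sketched directly from the formula: compute the range/domain overlap at every intermediate $O_i$ and keep the smallest. The function name `eff_chain` and the example mappings are hypothetical. The second call checks the special case $n=3$ against the single-intermediate measure.

```python
def eff_chain(mappings, size_first: int, size_last: int) -> float:
    """Generalized effectiveness for a path O_1 ... O_n.

    mappings: [M_{O1,O2}, M_{O2,O3}, ..., M_{O(n-1),On}] as (c1, c2, sim) sets;
    at least two mappings, i.e. at least one intermediate ontology.
    """
    if len(mappings) < 2:
        raise ValueError("need at least one intermediate ontology")
    overlaps = [
        {c2 for _, c2, _ in m_in} & {c1 for c1, _, _ in m_out}  # range & domain at O_i
        for m_in, m_out in zip(mappings, mappings[1:])
    ]
    # the intermediate with the smallest overlap bounds the whole path
    return 2 * min(len(o) for o in overlaps) / (size_first + size_last)


m1 = {("A", "a", 1.0), ("B", "b", 1.0), ("C", "c", 1.0)}
m2 = {("a", "x", 1.0), ("b", "y", 1.0), ("c", "y", 1.0)}
m3 = {("x", "A'", 1.0)}
# overlap 3 at the first intermediate but only 1 at the second: 2*1/8
path_eff = eff_chain([m1, m2, m3], 4, 4)
# n = 3 reduces to eff(S, IO, T): 2*3/8
single_eff = eff_chain([m1, {("a", "A'", 1.0), ("b", "B'", 1.0), ("c", "C'", 1.0)}], 4, 4)
```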

5 Evaluation

We evaluate our approach by composing mappings between anatomy ontologies. In particular, we focus on generating mappings between the Adult Mouse Anatomy (MA) and the anatomy part of the NCI Thesaurus (NCIT), which is a challenging task of the yearly OAEI [25] match contest. This has the advantage that we can use the publicly available OAEI gold standard (perfect mapping) to assess the quality of computed mappings (using precision, recall and F-measure) and to compare the achieved results with the published results of other approaches. Furthermore, we can reuse many existing mappings, in particular mappings provided by BioPortal [33] and mappings that we previously generated with our GOMMA ontology management infrastructure [20].
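Precision, recall and F-measure against a gold standard can be computed on concept pairs, ignoring similarity values. The sketch below is a minimal illustration with hypothetical names and toy data, not the OAEI evaluation tooling. F-measure is the harmonic mean of precision and recall.

```python
def evaluate(computed, gold):
    """Precision, recall and F-measure of computed vs. gold correspondences.

    Both arguments are iterables of (c1, c2, ...) tuples; similarity values
    are ignored and only the concept pairs are compared.
    """
    found = {(t[0], t[1]) for t in computed}
    ref = {(t[0], t[1]) for t in gold}
    tp = len(found & ref)  # correctly found correspondences
    precision = tp / len(found) if found else 0.0
    recall = tp / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


computed = {("A", "A'", 1.0), ("B", "B'", 0.9), ("C", "C'", 0.8), ("D", "X'", 0.7)}
gold = {("A", "A'"), ("B", "B'"), ("C", "C'"), ("D", "D'"), ("E", "E'")}
p, r, f = evaluate(computed, gold)  # 3 of 4 correct, 3 of 5 found
```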

We first describe our experimental setup in more detail (Sect. 5.1) and analyze the effectiveness of the available routes (Sect. 5.2). We then correlate the effectiveness measure with the match results achieved by composing the mappings via different intermediate ontologies (Sect. 5.3). Finally, we apply our selection strategies and present results of composition-based matching via the most promising intermediate ontologies (Sect. 5.4).

5.1 Experimental Setup

The experiment focuses on generating mappings between the ontologies MA (2,737 concepts) and the NCIT anatomy part (3,298 concepts) as available in June 2011. We use 28 input mappings interrelating MA/NCIT via 11 different intermediate ontologies. The input mappings are divided into two sets. The first set (Mapping set 1) is taken from the community platform BioPortal [33] and comprises 20 mappings from MA or NCIT to 10 ontologies: BRENDA Tissue Ontology (BTO), Cell Line Ontology (CL), Foundational Model of Anatomy (FMA), Galen (Galen), Logical Observation Identifiers Names and Codes (LOINC), Medical Subject Headings (MeSH), RadLex, Uber Anatomy Ontology (Uber), Teleost Anatomy (TAO), and ZebraFish Anatomy (ZF). These mappings have been created with the LOOM match approach [12]. LOOM takes all names and synonyms of the ontology concepts as input and returns two concepts as matching when any of their names or synonyms differ in at most one character. We use the mappings as provided by the BioPortal web page (Note 1).
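The LOOM criterion described above can be approximated as follows. This is a simplified sketch, not the LOOM implementation of [12]. We read "differ in at most one character" as an edit distance of at most one (one insertion, deletion or substitution) after lower-casing. Any further normalization LOOM may apply is ignored, and all concept pairs are compared naively. `within_one_edit` and `loom_like_match` are hypothetical names, the MA identifiers are placeholders, and the similarity of 1.0 for matches is an assumption.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b (case-insensitive) differ by at most one insert/delete/substitute."""
    a, b = a.lower(), b.lower()
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    if len(b) - len(a) > 1:
        return False
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: skip the character in both strings
        j += 1      # otherwise skip the extra character in the longer string
    # a trailing extra character in b also counts as one edit
    return edits + (len(b) - j) <= 1


def loom_like_match(s_labels, t_labels):
    """s_labels/t_labels: concept id -> list of names and synonyms."""
    return {(c1, c2, 1.0)
            for c1, l1 in s_labels.items()
            for c2, l2 in t_labels.items()
            if any(within_one_edit(x, y) for x in l1 for y in l2)}


s_labels = {"MA_1": ["vertebra"], "MA_2": ["heart"]}
t_labels = {"C12933": ["Vertebra", "Vertebrae"], "C12366": ["Bone"]}
matches = loom_like_match(s_labels, t_labels)
```

Such a strict string criterion favors precision: transpositions like "abc" and "acb" already count as two edits and are rejected.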

The second set of mappings (Mapping set 2) consists of eight mappings interrelating MA and NCIT with four intermediate ontologies, namely the Unified Medical Language System (UMLS), Uber, FMA, and RadLex. These mappings were created automatically by a GOMMA match process, which generates correspondences between concepts whose names or synonyms have a high trigram string similarity. Moreover, post-processing steps select only the best correspondence(s) per concept (MaxDelta selection, see [8]) and remove crossing correspondences [19].
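A trigram-based match with MaxDelta selection can be sketched as follows. The concrete choices are our assumptions, not the settings used to produce Mapping set 2. Trigram similarity is the Dice coefficient of unpadded, lower-cased trigram sets. The threshold 0.8 and delta 0.02 are illustrative. MaxDelta is applied per source concept only, keeping every candidate within `delta` of that concept's best similarity. The removal of crossing correspondences [19] is omitted. All function names and the identifiers `MA_1`, `X`, `T1`, `T2` are hypothetical.

```python
from collections import defaultdict


def trigrams(s: str) -> set[str]:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_sim(a: str, b: str) -> float:
    """Dice coefficient of the trigram sets of a and b."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:  # strings shorter than 3 characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def trigram_match(s_labels, t_labels, threshold=0.8, delta=0.02):
    """s_labels/t_labels: concept id -> list of names and synonyms."""
    candidates = defaultdict(list)
    for c1, l1 in s_labels.items():
        for c2, l2 in t_labels.items():
            sim = max(trigram_sim(x, y) for x in l1 for y in l2)
            if sim >= threshold:
                candidates[c1].append((c2, sim))
    result = set()
    for c1, cands in candidates.items():
        top = max(s for _, s in cands)
        # MaxDelta: keep candidates close to this concept's best one
        result |= {(c1, c2, s) for c2, s in cands if s >= top - delta}
    return result


s_labels = {"MA_1": ["vertebra"]}
t_labels = {"C12933": ["Vertebra", "Vertebrae"], "X": ["vertebral arch"]}
m = trigram_match(s_labels, t_labels)
close = {"T1": ["vertebra"], "T2": ["vertebrae"]}
strict = trigram_match(s_labels, close, delta=0.02)  # only the best candidate
loose = trigram_match(s_labels, close, delta=0.1)    # both candidates survive
```

A larger delta keeps more near-best candidates per concept, trading precision for recall.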

Table 1. Mappings between MA and NCIT included in the evaluation according to the two used mapping sets


5.2 Route Effectiveness

We focus on routes involving a single intermediate ontology since there are many such routes; routes with chains of two or more intermediate ontologies may typically result in reduced effectiveness. Table 1 shows selected statistics for the considered routes over the different intermediates indicated in the columns. The routes are grouped by mapping set and ordered by their computed effectiveness (last row), starting with the route having the highest effectiveness. The first two rows characterize the input mappings of each route by the number of correspondences they comprise. These numbers vary widely: in Mapping set 1 they range from approx. 1,900 correspondences for the largest to about 200 for the smallest mapping, and in Mapping set 2 from approx. 4,300 to about 1,000. For the ontologies used in both mapping sets (Uber, FMA, and RadLex), the mappings in Mapping set 2 are larger than those in Mapping set 1.

The third row displays the size of the mapping overlap in the intermediate ontology, which is decisive for the effectiveness. In Mapping set 1, the route via Uber has the largest overlap (1,048 concepts) and the highest effectiveness value of 0.35. In Mapping set 2, the number of referenced concepts in the intermediates is larger, resulting in higher effectiveness values, but the relative order of Uber, FMA, and RadLex remains the same. However, the route via UMLS has the highest effectiveness (0.67) and is thus the most promising route for Mapping set 2.

5.3 Correlation of Route Effectiveness and Composition Quality

Figure 5 correlates the effectiveness of each route (dashed line, secondary axis on the right) with the match quality of the composed mapping in terms of precision, recall and F-measure (bars, y-axis). The routes are ordered by decreasing effectiveness from left to right, separately for both mapping sets. Overall, there is a strong correlation between the effectiveness values and the achieved match quality for both mapping sets. This means that the composed correspondences are indeed valuable and contribute to the match result, so that higher effectiveness values translate into higher F-measure values. For instance, for Mapping set 1 the route via Uber has the best effectiveness and the highest F-measure of 0.76, whereas the route via TAO, with the lowest effectiveness (0.05), results in the worst F-measure of only 0.16. The same holds for Mapping set 2: the route via UMLS (RadLex) with the highest (lowest) effectiveness generates the mapping with the best (worst) F-measure of 0.87 (0.6). Therefore, the effectiveness measure is a valid and reliable means to select the intermediate ontology providing the best match quality.

5.4 Top K Selection and Composition

In the next experiment, we evaluate whether the match quality (F-measure) can be increased by using the proposed selection strategies topKByEffectiveness and topKByComplement to select k routes and combine their composition results. We set k to 3 and use union as the merge operation for both selection strategies. Based on the effectiveness values of the routes (see Table 1 and Algorithm 1), topKByEffectiveness selects the routes via Uber, FMA, and CL in Mapping set 1 and via UMLS, Uber, and FMA in Mapping set 2, whereas topKByComplement selects the routes via Uber, FMA, and Galen in Mapping set 1 and via UMLS, Uber, and FMA in Mapping set 2. For comparison, we consider several additional selection strategies: the single route with the highest F-measure in the mapping set (BestSingle) and the worst (Min3), average (Avg3), and best (Max3) F-measure over all combinations of three routes. Moreover, we compute the combination of all routes per mapping set (All).

Figure 6 shows the F-measure for all selection strategies and both mapping sets. In both cases, the topKByComplement strategy, which focuses on complementary mappings, achieves the maximum possible match quality, i.e., it identifies the best and most effective composition routes. Interestingly, a composition-based match using only three of the 10 (Mapping set 1) or 4 (Mapping set 2) possible routes results in better match quality than using all available routes, apparently because it avoids wrong correspondences introduced by weaker routes. For instance, in Mapping set 1 the F-measure increases by 3.2 percentage points (74.2 % $\rightarrow $ 77.4 %) compared to the ‘All’ strategy. For Mapping set 2, the F-measure is improved by 0.2 % compared to ‘All’. Using this strategy, we participated with GOMMA in the 2011.5 OAEI contest (Note 2). The achieved F-measure of 91.5 % is comparable to the best result in the OAEI contest (91.7 % F-measure of AgreementMaker [7] in 2011). Although the OAEI contest poses certain restrictions, the participating prototypes also exploited background knowledge for the Anatomy test case. Our topKByEffectiveness strategy shows marginally worse results than topKByComplement (76.2 % vs. 77.4 % for Mapping set 1), apparently because CL complements Uber and FMA less well than Galen does.

6 Related Work

Ontology matching is the process of determining a set of semantic correspondences (ontology mapping) between concepts of two ontologies. Manual matching by domain experts is very time-consuming and almost infeasible for large ontologies. Thus, many (semi-)automatic matching algorithms have been developed (see [10, 26, 27] for surveys). Common match approaches perform a direct matching using lexical and structural methods; some approaches also consider the similarity of ontology instances. State-of-the-art match systems such as COMA++ [2], Falcon [16] or SAMBO [21] combine multiple matchers within a match strategy to achieve better match quality. Results of matching biomedical ontologies showed that linguistic matching methods based on the similarity of concept names and synonyms produce very good results [12, 35]. To improve the runtime of matching, especially for large ontologies, some systems reduce the search space [17] or perform matching in parallel on multiple compute nodes [13].

The composition of mappings has mainly been studied for schemas [9, 11] and in model management [4]. Only a few approaches consider mapping composition for deriving new mappings in ontology matching. For instance, [34] utilizes FMA as an intermediate to indirectly generate a mapping between MA and NCIT. Similarly, the SAMBO system [21] utilizes background knowledge (e.g., UMLS) to find additional correspondences in the match process. Reference [30] presents an empirical analysis of composing the mappings available in BioPortal. In our own previous work [14], we already studied mapping composition. That work focused on match quality (F-measure) with a manual selection of intermediates, not on automatic strategies to select the best intermediates according to their expected contribution to the overall match quality.

This paper differs from these approaches in the following points. First, we apply mapping composition with multiple routes, while most match approaches consider only one route or apply a direct match. Second, we focus on finding the most valuable routes for mapping composition from a pool of possible routes in two different mapping sets. Ranking the routes by their effectiveness allows us to compose mappings for a reduced number of routes, saving time and possibly improving match quality, as shown in the evaluation.

7 Conclusion and Future Work

We proposed a new approach to rank and select promising routes for composing mappings between biomedical ontologies. The introduced effectiveness measure can be easily computed and allows a reliable identification of the most promising intermediate ontologies for composition-based ontology matching. We further proposed to select the top-k routes and to combine their composition results for improved match quality. Our evaluation on an OAEI match task for large anatomy ontologies showed the effectiveness of the proposed approach. In particular, we found that the effectiveness measure of different routes correlates strongly with their achievable F-measure. Furthermore, we found that the topKByComplement strategy, which combines the route with the best effectiveness with the routes providing the most complementary correspondences, is the most effective. Our approach effectively exploited existing mappings and achieved an excellent F-measure of 91.5 % for the challenging OAEI anatomy task. This shows that mapping composition is not only an efficient method to derive new mappings but can also increase match quality, e.g., by finding additional correspondences compared to a direct match approach.

In future work, we plan to apply and extend the approach to other domains, ontologies and data sources, e.g., for matching Linked Data sources. In particular, we want to investigate the interlinking of instance objects and to consider further correspondence types. We would further like to study the discussed advanced composition techniques in more detail, e.g., longer mapping chains via multiple intermediates.

Notes

1. BioPortal: http://bioportal.bioontology.org, http://rest.bioontology.org.
2. http://oaei.ontologymatching.org/2011.5/.

References

1. Adult Mouse Anatomy: http://www.informatics.jax.org/searches/AMA_form
2. Aumueller, D., Do, H.H., Massmann, S., Rahm, E.: Schema and ontology matching with COMA++. In: Proceedings of the SIGMOD, pp. 906–908 (2005)
3. Belleau, F., et al.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. 41(5), 706–716 (2008)
4. Bernstein, P., Melnik, S.: Model management 2.0: manipulating richer mappings. In: Proceedings of the SIGMOD, pp. 1–12 (2007)
5. Bodenreider, O., Stevens, R.: Bio-ontologies: current trends and future directions. Briefings Bioinform. 7(3), 256–274 (2006)
6. Bodenreider, O., et al.: Of mice and men: aligning mouse and human anatomies. In: Proceedings of the AMIA Annual Symposium, pp. 61–65 (2005)
7. Cruz, I.F., et al.: Using AgreementMaker to align ontologies for OAEI 2011. In: Proceedings of the International Workshop on Ontology Matching, pp. 114–125 (2011)
8. Do, H., Rahm, E.: Matching large schemas: approaches and evaluation. Inform. Syst. 32(6), 857–885 (2007)
9. Dragut, E., Lawrence, R.: Composing mappings between schemas using a reference ontology. In: Meersman, R. (ed.) CoopIS/DOA/ODBASE 2004. LNCS, vol. 3290, pp. 783–800. Springer, Heidelberg (2004)
10. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer-Verlag New York, Secaucus (2007)
11. Fagin, R., Kolaitis, P., Popa, L., Tan, W.: Composing schema mappings: second-order dependencies to the rescue. ACM Trans. Database Syst. (TODS) 30(4), 994–1055 (2005)
12. Ghazvinian, A., Noy, N., Musen, M.: Creating mappings for ontologies in biomedicine: simple methods work. In: Proceedings of the AMIA Annual Symposium, pp. 198–202 (2009)
13. Gross, A., Hartung, M., Kirsten, T., Rahm, E.: On matching large life science ontologies in parallel. In: Lambrix, P., Kemp, G. (eds.) DILS 2010. LNCS, vol. 6254, pp. 35–49. Springer, Heidelberg (2010)
14. Gross, A., Hartung, M., Kirsten, T., Rahm, E.: Mapping composition for matching large life science ontologies. In: 2nd International Conference on Biomedical Ontology (ICBO), pp. 109–116 (2011)
15. Hartung, M., Gross, A., Kirsten, T., Rahm, E.: Effective mapping composition for biomedical ontologies. In: eProceedings of Semantic Interoperability in Medical Informatics @ ESWC (2012)
16. Hu, W., Qu, Y.: Falcon-AO: a practical ontology matching system. Web Semant. Sci. Serv. Agents World Wide Web 6(3), 237–239 (2008)
17. Hu, W., Qu, Y., Cheng, G.: Matching large ontologies: a divide-and-conquer approach. Data Knowl. Eng. 67(1), 140–160 (2008)
18. Jakoniene, V., Lambrix, P.: Ontology-based integration for bioinformatics. In: VLDB Workshop on Ontologies-Based Techniques for DataBases and Information Systems (ODBIS 2005), pp. 55–58 (2005)
19. Jean-Mary, Y., Shironoshita, E., Kabuka, M.: Ontology matching with semantic verification. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 235–251 (2009)
20. Kirsten, T., Gross, A., Hartung, M., Rahm, E.: GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution. J. Biomed. Semant. 2, 6 (2011)
21. Lambrix, P., Tan, H.: SAMBO: a system for aligning and merging biomedical ontologies. Web Semant. Sci. Serv. Agents World Wide Web 4(3), 196–206 (2006)
22. Lambrix, P., Tan, H., Jakoniene, V., Strömbäck, L.: Biological ontologies. In: Baker, C.J.O., Cheung, K.-H. (eds.) Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, pp. 85–99. Springer, New York (2007)
23. NCI Thesaurus: http://ncit.nci.nih.gov/
24. Noy, N.F., et al.: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 37(Suppl. 2), W170–W173 (2009)
25. Ontology Alignment Evaluation Initiative: http://oaei.ontologymatching.org/
26. Rahm, E.: Towards large scale schema and ontology matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping, Chap. 1, pp. 3–27. Springer, Heidelberg (2011)
27. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
28. Raunich, S., Rahm, E.: ATOM: automatic target-driven ontology merging. In: ICDE, pp. 1276–1279 (2011)
29. Smith, B., et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25(11), 1251–1255 (2007)
30. Tordai, A., et al.: Lost in translation? Empirical analysis of mapping compositions for large ontologies. In: Proceedings of the Ontology Matching Workshop (2010)
31. UBERON: http://obofoundry.org/wiki/index.php/UBERON:Main_Page
32. Unified Medical Language System: http://www.nlm.nih.gov/research/umls
33. Whetzel, P.L., et al.: BioPortal: enhanced functionality via new web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 39(Suppl. 2), W541 (2011)
34. Zhang, S., Bodenreider, O.: Alignment of multiple ontologies of anatomy: deriving indirect mappings from direct mappings to a reference. In: AMIA Annual Symposium Proceedings, pp. 864–868 (2005)
35. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. Int. J. Semant. Web Inform. Syst. 3(2), 1–26 (2007)


Acknowledgment

This work is supported by the German Research Foundation (DFG), grant RA 497/18-1 (“Evolution of Ontologies and Mappings”).

Author information

Authors and Affiliations

Department of Computer Science, University of Leipzig, Leipzig, Germany
Michael Hartung, Anika Gross & Erhard Rahm
Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
Michael Hartung, Anika Gross, Toralf Kirsten & Erhard Rahm


Corresponding author

Correspondence to Michael Hartung.

Editor information

Editors and Affiliations

University of Southampton, Southampton, United Kingdom
Elena Simperl
British Museum, London, United Kingdom
Barry Norton
Ljubljana, Slovenia
Dunja Mladenic
DEIB - Politecnico di Milano, Milano, Italy
Emanuele Della Valle
Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece
Irini Fundulaki
MDG Web Limited, Dublin, Ireland
Alexandre Passant
Multimedia Communications Department, EURECOM, Campus SophiaTech, Biot, France
Raphaël Troncy


About this paper

Cite this paper

Hartung, M., Gross, A., Kirsten, T., Rahm, E. (2015). Effective Composition of Mappings for Matching Biomedical Ontologies. In: Simperl, E., et al. The Semantic Web: ESWC 2012 Satellite Events. ESWC 2012. Lecture Notes in Computer Science, vol. 7540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46641-4_13


DOI: https://doi.org/10.1007/978-3-662-46641-4_13
Published: 21 April 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46640-7
Online ISBN: 978-3-662-46641-4
eBook Packages: Computer Science, Computer Science (R0)
