1 Introduction

Units of measurement like meter, kilogram or yard are essential for a precise description of data. They alone allow an unambiguous interpretation of values in datasets. Ontologies like the ones proposed in [1,2,3,4,5] provide one good option for modeling this aspect. However, different data-centric applications cater to different audiences and provide different functionalities. As a consequence, ontologies created by different projects differ in their level of support for individual use cases. This situation challenges new projects to select a suitable ontology that fits their specific needs.

In this paper, we aim to support such a decision by analyzing a set of use cases for unit of measurement ontologies. After providing the necessary background in Sect. 2, we present possible use cases in Sect. 3. In order to cover all relevant aspects, we have complemented use cases described in the literature with a number of new ones. Following this requirements analysis, a set of seven ontologies is studied to check their support for each requirement. In Sect. 4, suitable metrics are defined to rate each pairing of ontology and use case. Finally, in Sect. 5 the ontologies are evaluated with respect to their suitability for each use case. This results in a ranking of ontologies for each use case, which can be used to identify the best existing ontology for the use cases of a new project.

2 Related Work

Several use cases (UCs) for unit ontologies are described in [2, 3, 5, 7, 8, 9]. They are reviewed in detail in Sect. 3. In [8] the coverage of features in multiple unit ontologies was analyzed. This analysis found that no unit ontology contains all important concepts of this domain. In [7] five feature support levels were defined to rank unit ontologies, which provides a quick overview of the scope and level of development of ontologies. The order of requirements for each ranking level, however, seems biased by the author’s background. An example is conversions, which are necessary for the second level. Even an ontology modeling all other features mentioned cannot go beyond level two as long as it is missing conversions. Finally, the ranking was applied to multiple ontologies. Nevertheless, a metric-based suitability evaluation of unit ontologies per UC is still missing.

The application of Competency Questions (CQs) [10] is a popular method in ontology engineering to describe the concepts an ontology requires for a UC; CQs can also be used for ontology evaluation [11]. However, this approach is limited to assessing a single ontology and does not compare multiple ones. Furthermore, if the list of requirements can be gathered otherwise, it is not mandatory to formulate CQs. Therefore, a metric that directly uses a list of requirements is preferable.

OntoQA [12] is a popular set of metrics in the field of ontologies. These metrics provide different relationship-based rankings of a schema and its classes and instances. In addition, it is possible to provide a keyword list to focus the ranking on relevant terms. However, a high ranking does not ensure that an ontology can fulfill a given UC, even if an adequate keyword list was provided.

Another extensive set of metrics is provided by OntoMetric [13]. It consists of a taxonomy of 160 metrics in the five main branches content, language, methodology, tools and costs and a method to calculate the total ranking of the ontologies. This includes, for instance, the metrics essential concepts and essential relations. The metrics can be weighted by the user, but it is not possible to rank the importance of the required concepts and relations.

3 Use Cases

To evaluate the suitability of ontologies for a certain use case, the corresponding requirements have to be known. Therefore, we provide a description and a requirements analysis for each use case. Requirements are divided into necessary and optional requirements. Necessary requirements are features that make the ontology eligible for a use case: if one of them is not modeled, the ontology is not able to provide even basic support for the use case. Optional requirements are those that are not necessary but simplify the implementation of a use case or increase its usefulness. Besides covering all use cases mentioned in the literature, we also provide some new use cases (marked by *) that have, to the best of our knowledge, not yet been presented. To provide a better overview, we group use cases that are concerned with similar domains.

Figure 1 outlines the use case grouping, while Table 1 summarizes the relationship between use cases and requirements.

Fig. 1. Schematic overview of use cases and groups.

Table 1. Requirements per use case. (●...necessary requirement; ◐...optional requirement; ...not required; for entries with the same index at least one has to be present)

Group 1 (Data Annotation). The first group consists of use cases that are related to data annotation. Data annotation here is the assignment of a unit of measurement or kind of quantity to a dataset or parts thereof. Consistent and rigorous data annotation can prevent misunderstandings and ambiguities when exchanging, merging or comparing datasets.

UC 1 (Manual Annotation). [2, 7]

An ontology can assist manual data annotation by providing lists containing kinds of quantities or units of measurement for the user to choose from.

Example: Before publishing a dataset, researchers have to create metadata, which includes annotation with units of measurement.

Necessary: An ontology has to model kinds of quantities or units of measurement.

Optional: The connection between kinds of quantities and units of measurement can be modeled so that, after choosing from one list, the other one is limited to matching entries. In the same way, fields of application and their connections to units of measurement or kinds of quantities as well as systems of units and their connections to units can be used. Additionally, if values are given in the dataset, they can be used for the same purpose, provided there is a model of typical or allowed values for kinds of quantities or units of measurement. The content of the lists can also be translated into the preferred language of the user if labels in multiple languages are present in the ontology. To improve the visual representation of annotated data, symbols for units and kinds of quantities can be included.

UC 2 (Automated Annotation). [2]

When the number of datasets grows, manual annotation is no longer feasible and has to be replaced by an automatic approach. An ontology can enable a system to automatically derive kinds of quantities or units of measurement from a textual description.

Example: To populate a new, semantically enhanced data management platform with a large number of datasets, the datasets have to be annotated.

Necessary: An ontology has to model kinds of quantities or units of measurement to enable this.

Optional: To improve the efficiency of such a system the ontology can include the connection between units of measurement and kinds of quantities. It can also model fields of application and systems of units as well as the respective connections to units of measurement and kinds of quantities. Additionally, typical and allowed values per unit of measurement or per kind of quantity can be used to limit the possible options.

The textual description can contain symbols and be written in the user’s preferred language, so models for symbols and labels in multiple languages can be exploited, too. In [2] the authors also mention modeling everyday language designators to handle common mistakes like writing “weight” instead of “mass”.

UC 3 (Automated Translation). [3, 8]

Designators, e.g., for kinds of quantities and units of measurement, can automatically be translated for annotated data to cater to users of different language backgrounds. This also reduces the number of errors that result from insufficient (English) language skills.

Example: When datasets are exchanged between researchers each individual can work on them using their own language.

Necessary: An ontology needs to provide models for units of measurement or kinds of quantities and labels in at least two languages.

UC 4 (Representation of Experiments). [2]

An ontology can be used to represent observations and experiments. [2] defines an observation as a link between a phenomenon, a kind of quantity, a numerical value and a unit of measurement. Hence, this can be interpreted as the annotation of an observation with the aforementioned concepts of the ontology.

Example: A user wants to represent his measurement of the height of a certain specimen within an ontology.

Necessary: An ontology needs to provide models for units of measurement, kinds of quantities, measurements and the connections between those concepts. Additionally, there has to be the possibility to state the measured phenomenon and the measured value.

Optional: The suitability can be further improved if the ontology itself models phenomena, so that no further ontology has to be included.

Group 2 (Conversion). The second group consists of use cases that are related to conversions between units. Unit conversion is changing the unit used to represent a measurement.

UC 5 (Conversion between Units). [2, 7]

For the unit conversion a proper formula has to be provided.

Example: Differences in measured units can easily be overcome as, e.g., measurements taken using imperial units can be converted into the metric system.

Necessary: An ontology has to model units and a conversion between them. A conversion here consists of a conversion factor and an offset.
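A conversion made of a factor and an offset can be illustrated with a minimal sketch. The `Unit` record, its field names and the choice of metre and kelvin as reference units are assumptions made for this illustration and are not taken from any of the examined ontologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical unit record: reference_value = factor * value + offset."""
    name: str
    factor: float
    offset: float = 0.0


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert via the shared reference unit of source and target."""
    reference = source.factor * value + source.offset
    return (reference - target.offset) / target.factor


# Illustrative instances; lengths refer to metre, temperatures to kelvin.
METRE = Unit("metre", 1.0)
FOOT = Unit("foot", 0.3048)
KELVIN = Unit("kelvin", 1.0)
CELSIUS = Unit("degree Celsius", 1.0, 273.15)

length_in_metres = convert(1.0, FOOT, METRE)
temperature_in_kelvin = convert(0.0, CELSIUS, KELVIN)
```

The offset is only needed for units whose zero point differs from that of the reference unit, as in the temperature example.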

UC 6 (*Precision of Conversions).

Many applications depend on exact data. Due to the limited precision of floating-point arithmetic in computer systems, conversions influence the accuracy of the converted data. As a consequence, an ontology has to augment each conversion it provides with an estimate of the respective accuracy of the values.

Example: Many conversions introduce an error of some degree. For the final result of possibly multiple conversions one has to be able to estimate whether the achieved accuracy of the result still matches the given requirements.
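The effect can be made visible with a small experiment that compares a floating-point round trip against exact rational arithmetic. This is only a sketch of how such an error could be measured; the function names are hypothetical, and the sketch says nothing about how an ontology would represent the resulting precision information.

```python
from fractions import Fraction

# Offset between degree Celsius and kelvin, kept as a decimal string so that
# the exact reference computation does not inherit a rounding error.
OFFSET = "273.15"


def round_trip_float(celsius: float) -> float:
    """Celsius -> kelvin -> Celsius using ordinary floating-point numbers."""
    kelvin = celsius + float(OFFSET)
    return kelvin - float(OFFSET)


def round_trip_exact(celsius: float) -> Fraction:
    """The same round trip in exact rational arithmetic (reference result)."""
    kelvin = Fraction(celsius) + Fraction(OFFSET)
    return kelvin - Fraction(OFFSET)


def conversion_error(celsius: float) -> float:
    """Absolute difference between the floating-point and the exact result."""
    difference = Fraction(round_trip_float(celsius)) - round_trip_exact(celsius)
    return abs(float(difference))


error = conversion_error(0.1)  # small, but not zero
```

For chains of several conversions the individual errors can accumulate, which is why a per-conversion estimate is needed.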

Necessary: An ontology needs to model units of measurement, conversion and information about the precision for the latter.

Group 3 (Consistency Checking). The third group includes all use cases that check formulas or annotated terms for consistency. In [3] consistency checking is mentioned but not described in detail. Hence, it is not listed as a reference in the individual use cases.

UC 7 (Dimensional Consistency). [2, 7]

Equations and terms can be checked for dimensional consistency by comparing the dimensions or dimension vectors of all their components. Individual terms can also be checked for conformance with a given dimension vector. In [7] the necessity to check code for dimensional consistency is mentioned, too.

Example: Considering a formula like “x m + y ft = z pc” a system should state that the formula is dimensionally consistent.

Necessary: An ontology has to model dimension vectors, units of measurement and the connection between them to be suitable for this use case.

Optional: The suitability can be improved by modeling dimensions and their connection to units of measurement so equations do not have to be compared by their dimension vectors but their dimensions.

UC 8 (Unit Consistency). [2]

As an extension of UC 7, not only the dimensions of the involved components are compared, but also the units actually used. This highlights cases where, e.g., values given in meter and foot are added without the necessary conversions.

Example: Using the same formula as UC 7, “x m + y ft = z pc”, a system should this time determine that the formula is not unit consistent.

Necessary: An ontology has to model units of measurement and unit compositions.
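The difference between UC 7 and UC 8 for the example formula can be sketched as follows. The `Unit` tuple, the reduction of dimension vectors to three base dimensions and the comparison of plain unit symbols are simplifying assumptions for illustration; in particular, composed units are not handled here.

```python
from typing import NamedTuple, Sequence, Tuple


class Unit(NamedTuple):
    """Hypothetical unit with a dimension vector over (length, mass, time)."""
    symbol: str
    dimension: Tuple[int, int, int]


METRE = Unit("m", (1, 0, 0))
FOOT = Unit("ft", (1, 0, 0))
PARSEC = Unit("pc", (1, 0, 0))
SECOND = Unit("s", (0, 0, 1))


def dimensionally_consistent(units: Sequence[Unit]) -> bool:
    """UC 7: all summands and the result share one dimension vector."""
    return len({unit.dimension for unit in units}) <= 1


def unit_consistent(units: Sequence[Unit]) -> bool:
    """UC 8: all summands and the result use the very same unit."""
    return len({unit.symbol for unit in units}) <= 1


formula_units = [METRE, FOOT, PARSEC]  # "x m + y ft = z pc"
same_dimension = dimensionally_consistent(formula_units)  # True
same_unit = unit_consistent(formula_units)                # False
```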

UC 9 (*Quantity Consistency).

Similar to UC 8 the consistency with regard to kinds of quantities can also be tested. An equation or term is considered quantity consistent if all its components use kinds of quantities in a compatible manner.

Example: Adding two lengths is considered compatible, whereas adding a width and a height is not, although they might share the same unit of measurement.

Necessary: An ontology has to model kinds of quantities and the quantity composition.

UC 10 (Consistency between Kind of Quantity and Unit of Measurement). [2]

Each kind of quantity is accompanied by a set of units of measurement that can be used to express observations of it. A system can then check for cases where a unit of measurement is used in conjunction with a kind of quantity without being assigned to it.

Example: A measurement of two meters is considered compatible with height, whereas a measurement of two seconds is not.

Necessary: To check consistency between a given unit of measurement and a kind of quantity an ontology has to provide both concepts and a connection between them.
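Given such a connection, the check reduces to a lookup. The following sketch assumes a hypothetical mapping from kinds of quantities to their assigned units; the entries are illustrative and not taken from a specific ontology.

```python
# Hypothetical excerpt of the connection between kinds of quantities and units.
UNITS_PER_KIND_OF_QUANTITY = {
    "height": {"metre", "foot"},
    "duration": {"second", "hour"},
}


def is_compatible(kind_of_quantity: str, unit: str) -> bool:
    """True if the unit is assigned to the given kind of quantity."""
    return unit in UNITS_PER_KIND_OF_QUANTITY.get(kind_of_quantity, set())


height_in_metres = is_compatible("height", "metre")    # True
height_in_seconds = is_compatible("height", "second")  # False
```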

UC 11 (Value Consistency). [8]

Some units of measurement and kinds of quantities have a restricted range of allowed values. A system can assure the data quality by checking entered data.

Example: A value of minus five for degree Celsius is considered compatible, whereas for Kelvin it is not.

Necessary: To check if values that are annotated with such a kind of quantity or unit of measurement lie within those ranges, an ontology has to model units of measurement or kinds of quantities and the respective allowed values.

Optional: To further improve on this, an ontology can model not only allowed values but also typical values for units of measurement or kinds of quantities. Since typical values vary heavily depending on the field of application, they should be stated per field of application. A model for conversions between units can help further, because typical and allowed values for units of measurement that have not been specified can then be calculated from the values of other units of measurement.
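A check against allowed values can be sketched as a simple range test. The table of ranges below is a hypothetical stand-in for what an ontology would provide; only the lower bounds given by absolute zero are used.

```python
import math

# Hypothetical allowed value ranges per unit of measurement (inclusive bounds).
ALLOWED_VALUES = {
    "kelvin": (0.0, math.inf),
    "degree Celsius": (-273.15, math.inf),
}


def is_value_consistent(value: float, unit: str) -> bool:
    """True if the value lies within the allowed range of the unit."""
    lower, upper = ALLOWED_VALUES[unit]
    return lower <= value <= upper


celsius_ok = is_value_consistent(-5.0, "degree Celsius")  # True
kelvin_ok = is_value_consistent(-5.0, "kelvin")           # False
```

If a range is only given for one unit, a conversion such as the one required by UC 5 could be used to derive the range for another unit.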

Group 4 (Ontology as a Knowledge Base). The ontology can be used as a knowledge base to search for important information. Depending on the kind of information, multiple use cases can be distinguished.

UC 12 (Search for alternative Units of Measurement). [8]

An ontology can be used to search for possible alternatives given a unit of measurement. To determine the set of possible alternatives, kinds of quantities, dimensions or dimension vectors can be used.

Example: When encountering an unfamiliar unit like Gunter’s chain this allows for easy access to possible alternatives like meter.

Necessary: An ontology has to model units of measurement and kinds of quantities, dimensions or dimension vectors as well as their connections to units of measurement.

Optional: Similar to the manual annotation, the suitability for this use case can be improved by modeling fields of application and systems of units and their connections to units of measurement so the number of possible alternatives can be reduced.

UC 13 (Search for Symbols). [8, 9]

Symbols for units and kinds of quantities can, e.g., be used for informal data annotation or for a shortened representation in a user interface. The search for symbols and abbreviations for units of measurement or kinds of quantities therefore is an everyday use case.

Example: When creating natural language texts from more formal data sources measurements usually will use abbreviations of used units instead of their full name.

Necessary: An ontology has to model kinds of quantities or units of measurement and the respective symbols.

UC 14 (*Unit Resolving).

In unit resolving, one is given a formula and the unit for each contained value. The task is to determine the resulting unit of this formula. This assumes that the formula is consistent with regard to UCs 7 to 9.

Example: Given a formula like “\(x~kg \times y~\frac{m}{s^2} = z~\text{?}\)” a system has to deduce that the missing unit could be newton.

Necessary: This use case relies on units of measurement and unit composition because it has to compute possible compositions for the units of measurement used in the formula.

Optional: It can further be improved by using conversions so that mismatching units can automatically be converted.
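For the example formula, the resolving step can be sketched with exponent vectors standing in for the unit composition. The table of units and the restriction to the base dimensions length, mass and time are assumptions made for this illustration; scale factors between units are deliberately ignored.

```python
from typing import List, Tuple

Vector = Tuple[int, int, int]  # exponents of (length, mass, time)

# Hypothetical excerpt of units and their composition in terms of base dimensions.
DIMENSION_VECTORS = {
    "m": (1, 0, 0),
    "kg": (0, 1, 0),
    "s": (0, 0, 1),
    "N": (1, 1, -2),
    "J": (2, 1, -2),
}


def multiply(a: Vector, b: Vector) -> Vector:
    """Multiplying two units adds their exponents."""
    return tuple(x + y for x, y in zip(a, b))


def power(a: Vector, exponent: int) -> Vector:
    """Raising a unit to a power scales its exponents."""
    return tuple(x * exponent for x in a)


def resolve(vector: Vector) -> List[str]:
    """All known units whose composition matches the given vector."""
    return sorted(s for s, v in DIMENSION_VECTORS.items() if v == vector)


# kg * m / s^2
acceleration = multiply(DIMENSION_VECTORS["m"], power(DIMENSION_VECTORS["s"], -2))
force = multiply(DIMENSION_VECTORS["kg"], acceleration)
candidates = resolve(force)  # ["N"]
```

Because several units can share one composition, the result is a list of candidates and not a single unit.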

UC 15 (Search for Units of Measurement). [8]

The search for units of measurement is not restricted to alternatives, but can use a variety of different inputs. The input can, for example, consist of kinds of quantities, symbols, dimensions, dimension vectors, prefixes, systems of units or any combination of those.

Example: A user is looking for a metric unit of measurement for the kind of quantity length that uses the prefix kilo.

Necessary: Any ontology that models units of measurement is eligible to support the search for units because a plain list is sufficient to choose a unit of measurement.

Optional: Each concept modeled in addition can improve the suitability by enabling more input combinations and therefore narrowing down the results. These concepts are kinds of quantities, symbols for units of measurement, fields of application, dimensions, dimension vectors, prefixes, systems of units and the connections between each of those concepts and units of measurement. Labels in multiple languages and everyday language designators can also be helpful in order to enable users to state input in their preferred language.

UC 16 (Ontology as Unit Reference). [3, 5]

A unit ontology can be used as a reference by other ontologies by providing unique identifiers for units of measurement.

Example: An ontology about animals can reuse the definition of meter or kilogram in the description of specimens, without having to redefine them.

Necessary: An ontology only needs to model units of measurement.

Optional: To improve the suitability for this use case, more concepts can be modeled to provide even more unique identifiers. These concepts are systems of units, kinds of quantities, fields of application and dimensions. To enable the user to easily access further information, there should be labels in multiple languages and resolvable URIs for the ontology.

4 Methods

We will use a metric to evaluate the suitability of an ontology for a UC. This metric depends on the list of necessary and optional requirements of each UC outlined in Sect. 3. To simplify the metric we first define a set of sub-metrics. For each required concept, relation or other feature, except the language support, we define a boolean metric \(m\) in Eq. 1. Those sub-metrics remain boolean since we are only concerned with the mere existence of a feature and not the extent of its usability.

$$m = \begin{cases} 1 & \text{: concept, relation (direct or indirect) or feature contained}\\ 0 & \text{: otherwise} \end{cases} \tag{1}$$

RDF provides a dedicated mechanism for the usage of different languages by allowing developers to attach language tags to labels [14]. Hence, the ontologies do not have to model this on their own. To assess the support, we check the usage of the RDF concept. The value an ontology reaches should be the higher the more languages are supported by it. Therefore we need a metric to rate the number of different languages \(l\) in an ontology.

$$m_{lang} = 1-\frac{1}{l+1} \tag{2}$$

Finally, we define for each UC the encompassing suitability metric \(m_{suit}\) as the aggregation of its sub-metrics:

$$M_{nec} = \{m \mid m \text{ is metric of a necessary requirement}\} \tag{3}$$
$$M_{all} = \{m \mid m \text{ is metric of a necessary or optional requirement}\} \tag{4}$$
$$m_{suit} = \left( \min _{m \in M_{nec}}{\lceil m \rceil } \right) \times \left( \sum _{m\in M_{all}}\frac{m}{\vert M_{all}\vert }\right) \tag{5}$$

The first part in Eq. 5 ensures that an ontology is rated with zero if at least one necessary feature is missing. The ceiling function is necessary to accommodate the language sub-metric, which lies strictly between zero and one as soon as at least one language is present. The second part is the average over all sub-metrics and provides a gradation between ontologies that implement a different number of optional requirements. All sub-metrics are equally weighted for now, but this can easily be extended to use a vector of weights.
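Eqs. 2 and 5 can be transcribed directly into a few lines of code. The sketch below assumes that the sub-metrics of a use case are passed as two lists of numbers; the function and parameter names are chosen for illustration only.

```python
import math
from typing import List


def m_lang(language_count: int) -> float:
    """Language sub-metric of Eq. 2 for l distinct languages."""
    return 1 - 1 / (language_count + 1)


def m_suit(necessary: List[float], optional: List[float]) -> float:
    """Suitability of Eq. 5.

    Assumes at least one necessary sub-metric, as is the case for every
    use case in Sect. 3.
    """
    all_metrics = necessary + optional
    # First factor: zero as soon as one necessary sub-metric is zero.
    gate = min(math.ceil(m) for m in necessary)
    # Second factor: unweighted average over all sub-metrics.
    average = sum(all_metrics) / len(all_metrics)
    return gate * average


# Two necessary features present, one of two optional features present.
partial_support = m_suit([1, 1], [1, 0])   # 0.75
# One necessary feature missing: not eligible, regardless of optional ones.
not_eligible = m_suit([1, 0], [1, 1])      # 0
```

Weights could be introduced by replacing the plain average with a weighted one; the gate factor would remain unchanged.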

5 Results

To evaluate the current state of ontology development in the field of units of measurement we applied the requirements of the use cases identified in Sect. 3 and the metrics defined in Sect. 4. We analyzed the following seven prominent representatives of unit ontologies.

  • Measurement Units Ontology (MUO); result of a project to exploit semantics in mobile environments; the instances were automatically generated from UCUM [15],

  • Extensible Observation Ontology (OBOE); an ontology suite to represent scientific observations,

  • Ontology of units of Measure and related concepts (OM); an ontology to model concepts and relations important to scientific research, developed in the context of food research [2],

  • Library for Quantity Kinds and Units (QU); a showcase ontology based on the OMG SysML 1.2 QUDV specifications and the UN/CEFACT Recommendation 20 code list [16],

  • Quantities, Units, Dimensions and Data Types Ontologies (QUDT); developed in the context of NASA projects,

  • Semantic Web for Earth and Environmental Terminology (SWEET); also developed in the context of NASA projects and

  • Units of Measurement Ontology (UO) + Phenotypic Quality Ontology (PATO); both modules of the OBO family to model units and phenotypic qualities.

In a first step, each ontology was examined with respect to the requirements. In the process, the results of [17, 18] were used where possible. In that project we analyzed the ontologies’ instances with respect to their distribution and possible errors. Bear in mind, though, that with this work we are just analyzing ontologies with regard to their basic support for use cases and not the extent of such support. As a consequence, a feature is regarded as supported if there is any modeling of such a feature. The number of actual instances of such a feature does not matter as long as there is a matching concept. Note, furthermore, that the modeling of concepts related to UC 4 like phenomenon or measurement is not part of [18] and therefore had to be checked manually.

Table 2. The presence of features within the examined ontologies. (●...feature modeled; ...feature not modeled)
Table 3. Suitability scores for the examined ontologies.

To judge the number of languages used by an ontology we counted the number of different language tags appearing within it. This, however, is not accurate, as ontologies do not seem to use language tags consistently: even if a language tag is used in the label for one instance, one should not assume the same for all instances. Sometimes the language tag is missing entirely, provided there is a label at all, which cannot be taken for granted. To improve this sub-metric, a further analysis on the instance level has to be conducted. In this work, however, the main focus was the modeling used by the ontologies, and hence the number of different language tags seems a suitable approximation. The existence of features in the ontologies as per our analysis is given in Table 2.
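The counting step can be sketched as follows, assuming the labels have already been extracted from the ontology as pairs of text and optional language tag. The extraction itself and the sample labels are not part of the original analysis and are only illustrative.

```python
from typing import Iterable, Optional, Tuple

Label = Tuple[str, Optional[str]]  # (label text, language tag or None)


def count_languages(labels: Iterable[Label]) -> int:
    """Number of distinct language tags; untagged labels are ignored.

    Tags are compared case-insensitively, so "EN" and "en" count once.
    """
    return len({tag.lower() for _, tag in labels if tag})


sample_labels = [
    ("metre", "en"),
    ("Meter", "de"),
    ("meter", None),   # label without a language tag
    ("kilogram", "EN"),
]
language_count = count_languages(sample_labels)  # 2
```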

Using the requirements of Sect. 3, the metrics presented in Sect. 4 and the results from Table 2 a suitability score has been computed for each pair of ontology and use case. Table 3 shows an overview of the computed values. Note that the sub-metric describing the presence of language tags can never reach a value of one (cp. Eq. (2)). As a consequence all metrics using that sub-metric should only be used to compare ontologies and not to rate a single ontology.

The support for different use cases varies considerably. One prime example is data annotation (Group 1): while both manual (UC 1) and automatic (UC 2) annotation are basic features supported by all ontologies, the translation of designators (UC 3) often fails, as only OM contains multiple languages for its labels. The representation of experiments (UC 4) fails in most ontologies as well due to missing concepts in that area.

Conversion (Group 2) in its basic form (UC 5) is supported by almost all ontologies, but no ontology includes any estimation of the accuracy of the provided values (UC 6).

Consistency checks (Group 3) succeed only for connections between unit of measurement and kind of quantity (UC 10). Other checks fail for different reasons with just a few exceptions: OM (UCs 7 and 8) and QUDT (UC 7).

Finally, the use of the ontology as a knowledge base (Group 4) seems well supported. The only exception here is unit resolving (UC 14), which fails in all ontologies but OM due to the missing unit composition.

Overall, there are just three use cases that are currently not supported by any ontology. For each of those use cases, one crucial feature is missing:

  • UC 6: Precision of Conversions.

  • UC 9: Quantity Consistency.

  • UC 11: Value Consistency.

From the point of view of a new project, OM seems to be the best choice right now. No other ontology surpasses OM for any use case with respect to the suitability score; the closest overall contenders are QUDT, QU and SWEET.

6 Conclusion

We compiled an inventory of possible use cases for unit ontologies, grouped by similarity. This list consists of use cases given in the literature as well as some that have not been covered so far. We analyzed necessary as well as optional requirements. This resulted in the definition of a metric to compare the suitability of different ontologies for specific use cases. Using both the requirement list and the metric, we then evaluated a set of seven representative ontologies.

The comparison highlighted the different focus in the development of the ontologies. Each one was created with a different set of use cases in mind. Summing up, current ontologies support many use cases to a reasonable level. However, our analysis reveals that three use cases (UCs 6, 9 and 11) are not yet supported by any of the examined ontologies.