1 Introduction

Interest in cultural awareness is growing as globalisation acts as a vector of increasing cultural diversity. Since the 2000s, with the rapidly expanding web, culture has been digitized and computer systems are now the entities most exposed to its diversity. Culture shapes users’ behaviors and thus impacts the performance of many systems and applications. That is why these systems have to develop cultural awareness.

Blanchard et al. [1] define culturally-aware systems as “any system where culture-related information has had some impact on its design, runtime or internal processes, structures, and/or objectives”. They present three types of systems: enculturated systems, runtime cultural adaptation systems and cultural data management systems. Enculturated systems are systems whose design meets the cultural requirements of given cultures [1]. Runtime cultural adaptation systems aim to artificially reproduce cultural intelligence through two steps: understanding and adaptation. In other words, by identifying one’s culture, a culturally-intelligent system can provide the right enculturation, as presented by Rehm [2]. The enculturation of a system is constrained by the cultural knowledge available to the system or its designer. That is why machine-readable representations providing understanding about cultures could effectively support the development of these systems.

Two approaches can be used to produce representations of cultures. The etic approach aims to find cultural universals; it is an outsider’s view of culture. In contrast, the emic approach tries to identify the specifics of a culture, such as its concepts and behaviors; insight is gained from the inside. Currently, the cultural knowledge representations used to support the development of enculturated systems are etic-based. Their main appeal is that they are ready-to-use representations easily applicable to any culture [3, 4]. However, these representations are coarse-grained and limit the understanding of the cultures they describe [5]. Therefore, finer-grained emic-based representations are more relevant for developing enculturated systems.

While emic-based representations solve the problem associated with the lack of granularity, their creation is time-consuming. Most of the methodologies used in practice by ethnographers require intensive human intervention (from the ethnographers or the participants) in the process of eliciting knowledge. This process is therefore hardly scalable, and thus impractical for dealing with the diversity of cultures. As such, the process supporting the construction of emic-based cultural representations must be largely automatic.

In this paper we present a process applicable to any cultural domain to build time-affordable, emic, conceptually-sound and machine-readable cultural knowledge representations. To construct these representations we followed a methodology coming from Cognitive Anthropology. It is composed of three steps leading to the acquisition of culturally-relevant information: ethnographic sampling, individuals’ personal knowledge elicitation and cultural consensus analysis. The time-affordable elicitation of knowledge and its formalisation are similar to what already exists in other ontology engineering works such as SPRAT [6] or DYNAMO [7]. We follow Hearst’s [8] method to automatically extract hypernym/hyponym relations from texts. As for the formalisation of the representations, we rely on the Resource Description Framework (RDF) formal language. Therefore, this research is about the emic and automatic generation of cultural ontologies from texts.

Our plan is as follows. We begin by introducing the methodology, which starts with the creation of the cultural knowledge representations and ends with their formalisation. Then, we present our process and the associated design choices. We end by extensively experimenting with our process on the public safety domain, with police forces from Australia, the USA and England. Having obtained encouraging results, we conclude this study.

2 Emic-Based Cultural Knowledge Representations

Ethnography is the process of collecting, recording and searching for patterns to describe the culture of a people. In other words, ethnography is about discovering cultural knowledge, leading to the production of cultural knowledge representations. “New ethnography”, also known as ethnoscience or Cognitive Anthropology, is founded on the premise that culture is a “conceptual mode underlying human behavior” [9]. The cognitive theory of culture situates culture in the mind as a system of learnt and shared knowledge [10, 11].

This theory shaped a number of methodologies to produce cultural representations which are intrinsically emic. “Ethnographers must discover the organizing principles of a culture–the semantic world of the natives–while avoiding the imposition of their own semantic categories on what they perceive” [12].

To our knowledge, there is no clearly defined methodology to create cultural representations. Most of the ones developed in the literature are based on the ethnographers’ experiences. However, these methodologies share three main steps: ethnographic sampling, individuals’ personal knowledge elicitation and cultural consensus analysis [13,14,15,16].

2.1 Ethnographic Sampling

The ethnographic sampling step is based on the idea that cultural knowledge is socially-constructed. It aims to capture a representative number of individuals likely to share the same culture and thus similar knowledge. This task is generally achieved through the identification of a community, a set of individuals with long-term, strong, direct, intense, frequent and positive relations [17].

Once the ethnographic sample is determined, the knowledge of each participant needs to be elicited.

2.2 Individuals’ Personal Knowledge Elicitation

Knowledge is personal [18]. It is rooted deep in one’s subconscious in a tacit state. In order to elicit knowledge, it has to become an object of thought [19]. The goal of the knowledge elicitation step is to make tacit internal knowledge structures explicit. Jones et al. [20] distinguish two categories of knowledge elicitation: direct and indirect. In the first category, knowledge is directly elicited by the individual possessing it, whereas in the second, knowledge emerges from the analysis of data collected from the individual.

“[C]oncepts are the building blocks of knowledge [and] relations [...] the cement that links up concepts into knowledge structures” [21]. Lexico-semantic relations are universal/intercultural knowledge structures representative of basic cognitive functions [21, 22]. They constitute the core of any conceptualisation. As such, individuals’ knowledge elicitation is mainly about acquiring concepts and lexico-semantic relations.

After eliciting the personal knowledge structures of each individual in the sample, their distribution has to be analysed to determine their cultural dimension.

2.3 Cultural Consensus Analysis

The cultural consensus analysis step enables the operationalization of culture [15]. Cultural Consensus Theory (CCT) “formalizes the insight that agreement among [individuals] is a function of the extent to which each knows the culturally defined ‘truth’ ” [23]. CCT also “refers to a family of models that enable researchers to learn about [individuals’] shared cultural knowledge” [24] such as the General Condorcet Model [25]. Depending on the form of the elicited knowledge, either formal or informal CCT models are used [26, 27]. However, simple aggregations, majority or averaging responses across respondents also constitute reasonable cultural estimates [28].

The three steps of the methodology lead to the production of cultural knowledge representations. However, as such, they cannot be used for the development of enculturated systems, because computer systems are not yet able to make sense of them. To be understandable, they have to be formalised.

3 Formal Cultural Knowledge Representations

The cultural representations are composed of knowledge structures. The formalisation of such structures is studied in the field of Knowledge Engineering, more precisely in its Ontology Engineering subfield. Therefore, methodologies to build ontologies can be used to formalise the cultural representations.

3.1 Ontologies

Gruber defined an ontology as “an explicit specification of a conceptualisation” [29]. The term ‘explicit’ in Gruber’s definition means that the knowledge must be specified unambiguously, constraining its interpretation. The principal components of an ontology are labels, concepts, relations and axioms. Axioms are rules associated with the relations in order to embed the logic necessary for reasoning.

Borst [30] added to the former definition that the specification had to be formal and the conceptualisation shared. Indeed, it is necessary that the conceptualisation results from a consensual agreement to ascertain that the knowledge embedded is coherent and consistent within a specific context. This task is called an ‘ontological commitment’. This aspect is ensured by the shared dimension of the cultural representations. The formalisation of the specification is needed for interoperability, re-usability and especially for enculturated systems to read cultural representations.

There are different levels of formalism depending on the language used to express the ontology, ranging from informal, mostly written in natural languages, to formal, based on machine-readable languages. Formal languages like RDF (Resource Description Framework) or OWL (Web Ontology Language) support the semantic web. RDF is a language based on entities (resource, property, value) which constitute triples of the form (subject, predicate, object). Resources are concepts described by a Uniform Resource Identifier (URI), which is consistent with ontologies being non-ambiguous specifications. Properties can be attributes or any other kind of relations, most likely semantic ones. Values are literals pointing either to a symbol or to another resource. The common syntax used to serialise RDF is XML, giving RDF/XML. Ontologies written in RDF can be interrogated by machines through the SPARQL Protocol and RDF Query Language (SPARQL).
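To illustrate these notions, the sketch below (not part of the original pipeline) builds and queries a tiny RDF graph with the Python rdflib library; the namespace URI and the concept names are hypothetical placeholders.

# Minimal sketch: RDF triples and a SPARQL query, using the Python rdflib
# library. The namespace and concepts are illustrative only.
from rdflib import Graph, Namespace, RDF, RDFS

CULT = Namespace("http://example.org/culture#")  # hypothetical URI base

g = Graph()
g.bind("cult", CULT)

# (subject, predicate, object): 'dog' is a subclass (hyponym) of 'animal'.
g.add((CULT.animal, RDF.type, RDFS.Class))
g.add((CULT.dog, RDF.type, RDFS.Class))
g.add((CULT.dog, RDFS.subClassOf, CULT.animal))

# The graph can be serialised in RDF/XML and interrogated with SPARQL.
print(g.serialize(format="xml"))
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?hyponym ?hypernym WHERE { ?hyponym rdfs:subClassOf ?hypernym . }
"""
for row in g.query(query):
    print(row.hyponym, "is a kind of", row.hypernym)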

3.2 METHONTOLOGY

Methodologies to create ontologies are mostly based on experience [31]. METHONTOLOGY is a proven framework describing the general steps to build an ontology [32]. Its common steps are specification, conceptualisation, formalisation, implementation and evaluation.

The specification consists in planning the production and exploitation of an ontology. At a minimum, it defines its primary purpose, level, granularity and scope. These specifications mainly serve as guidelines for the conceptualisation. Typically, the conceptualisation step is carried out by a group of domain experts. The goal is to discover the significant concepts and relations related to a domain [33]. The formalisation step expresses the conceptualisation with formal languages. It is often manually supervised by knowledge engineers or supported by software like Protégé. Mapping techniques can also be used to automatically transpose informal knowledge to formal knowledge [34]. The implementation step addresses the technical and practical aspects associated with the usage of an ontology by a computer system. The evaluation step validates each step against the specifications.

Following the METHONTOLOGY, we are able to produce formal cultural ontologies by considering cultural knowledge representations as conceptualisations. Finally, these ontologies are readable by computer systems and can provide a significant amount of understanding about the cultures they represent.

4 Building Time-Affordable, Formal and Emic-Based Cultural Representations

The design of our process was driven by METHONTOLOGY, whose conceptualisation step consists of the methodology coming from Cognitive Anthropology. Among the other choices required to build the process, we decided to use lexico-semantic relation extraction to make the elicitation as automatic as possible.

4.1 Selecting Individuals Based on Shared Social Criteria

Typically, cognitive anthropologists select their sample through shared socially-related criteria such as gender, religion, job or area: workplaces [35], towns [36] or regions [16]. While the strength of this method comes from its ease of use and speed, its weakness is that it cannot fully guarantee that the selected individuals actually represent a community. Effective but costly techniques to identify communities can be found in the social sciences, such as the community detection algorithms from social network analysis.

In this study, samples are created by following the traditional technique, as a number of studies have proven its efficiency.

4.2 Automatically Eliciting Individuals’ Knowledge from Texts

Automatically extracting individuals’ knowledge structures from texts is an indirect elicitation technique [37]. It is composed of two tasks. The first one consists in collecting a sufficient amount of textual data for a given individual. The second one aims to retrieve that individual’s knowledge (i.e. significant concepts and/or relations) by analysing the data.

4.2.1 Collecting Web Data

Ethnographic data are mainly textual and most of the time collected through interviews or observations. Besides being time-consuming, recording data through these means also biases the data to some extent. The safest and fastest technique to collect data is to gather already existing raw data.

Nowadays, the web provides a large amount of freely available textual data about many individuals from which data can be collected. In our process, the data were retrieved directly from websites. Textual data collection was achieved with HTTrack, a tool that can mirror the content of a website by crawling and downloading its files.

The automation of the data collection came with an additional constraint during the sampling step. Indeed, it became necessary to verify that the individuals composing the sample had accessible online data.

4.2.2 Textual Data Analysis

The goal of the data analysis is to retrieve the conceptualisation of an individual [37]. This part of our process consists in acquiring knowledge structures by mining significant concepts and their relations. It required several preprocessing steps: we started by cleaning the data, followed with natural language processing and ended by annotating the lexico-semantic relations to be extracted.

Preprocessing

The web nature of the collected data drove the cleaning operations. Web data can come in various file formats (.doc, .odt, etc.). Text extraction from these files was achieved with Apache Tika. We handled language heterogeneity by identifying the language of each document with the LangDetect API [38] and kept only English documents. OpenNLP was used to detect sentences. We decided to work at the sentence level rather than the document level mainly to avoid data redundancy, by ensuring that the sentences were unique. For example, documents coming from websites are often distinct from each other while containing duplicated content such as menus, Twitter or Facebook feeds and so on.
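The sketch below gives a rough Python analogue of this cleaning stage (the actual pipeline relied on Apache Tika, LangDetect and OpenNLP in Java); the directory layout, the use of langdetect and NLTK, and the assumption that the text has already been extracted to plain files are ours.

# Rough analogue of the cleaning stage: keep English documents and deduplicate
# sentences across documents. Assumes the text has already been extracted to
# .txt files; langdetect and NLTK stand in for the Java LangDetect and OpenNLP.
from pathlib import Path
from langdetect import detect                 # pip install langdetect
from nltk.tokenize import sent_tokenize       # requires nltk.download("punkt")

def clean_documents(doc_dir: str) -> set:
    unique_sentences = set()
    for path in Path(doc_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        try:
            if detect(text) != "en":          # discard non-English documents
                continue
        except Exception:                     # undetectable language: skip
            continue
        # Working at the sentence level removes duplicated menus, feeds, etc.
        unique_sentences.update(s.strip() for s in sent_tokenize(text))
    return unique_sentences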

Then, we used the Stanford CoreNLP API to support common natural language processing operations: tokenization, Part of Speech (PoS) tagging and lemmatization. Finally, nominals, which constitute the main concepts of conceptualisations, were found using simple pattern matching based on the PoS tags of the tokens.
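A possible sketch of the nominal extraction is given below; the original work used the Stanford CoreNLP API, whereas NLTK is used here purely for illustration, and the PoS pattern (maximal runs of noun tags) is our simplification.

# Sketch of nominal extraction by pattern matching on PoS tags: maximal runs
# of tokens tagged NN/NNS/NNP/NNPS are kept as (possibly multi-word) nominals.
# Requires nltk downloads: "punkt", "averaged_perceptron_tagger", "wordnet".
from nltk import word_tokenize, pos_tag
from nltk.stem import WordNetLemmatizer

LEMMATIZER = WordNetLemmatizer()

def extract_nominals(sentence: str) -> list:
    tagged = pos_tag(word_tokenize(sentence))
    nominals, current = [], []
    for token, tag in tagged:
        if tag.startswith("NN"):
            current.append(LEMMATIZER.lemmatize(token.lower()))
        elif current:
            nominals.append(" ".join(current))
            current = []
    if current:
        nominals.append(" ".join(current))
    return nominals

# extract_nominals("A dog is an animal.") -> ['dog', 'animal']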

After having cleaned and preprocessed the textual data, the results were stored as annotations in a ‘serial data store’ using GATE (General Architecture for Text Engineering). This last operation was required to easily retrieve and mine the data.

Discovering Important Concepts

Finding significant concepts in content is based on the idea that the number of occurrences of a token and its importance are correlated. Thus, term frequency is often used to weight and rank terms. Other metrics, such as TF/IDF (Term Frequency/Inverse Document Frequency), can achieve similar results.

In our process, the important concepts were selected by coupling the quantification of nominals with a rough filtering on their total occurrences.
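A minimal sketch of this selection step follows; the frequency threshold is an illustrative assumption, not the value used in our experiments.

# Count nominal occurrences over an individual's sentences and keep those
# above a rough frequency threshold (the threshold value is illustrative).
from collections import Counter

def important_concepts(sentences, extract_nominals, min_occurrences=5):
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_nominals(sentence))
    return {concept: n for concept, n in counts.items() if n >= min_occurrences}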

Finding Significant Relations

In this study, we use the most popular method to find lexico-semantic relations. Introduced by Hearst [8], it relies on handwritten syntactic patterns indicative of semantic relations. For example, in the sentence ‘A dog is an animal’, the syntactic pattern ‘is a’ indicates that there is a hypernym/hyponym relation between ‘animal’ and ‘dog’. Therefore, hypernym/hyponym relations can be discovered through a simple mapping, using the expression Y is a X, with Y and X being two nominals. Since then, many researchers have confirmed the relevance of Hearst’s methodology by applying it to other lexico-semantic relations [39,40,41,42,43,44].
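As an illustration, the sketch below implements two Hearst-style patterns with regular expressions; the actual extraction was performed with JAPE grammars over PoS-annotated text (see below), and the crude noun-phrase approximation used here is our simplification.

# Simplified regex version of two Hearst-style patterns. A noun phrase is
# crudely approximated by one or two words; the real patterns operate on PoS tags.
import re

NP = r"[A-Za-z][A-Za-z-]*(?: [A-Za-z][A-Za-z-]*)?"

PATTERNS = [
    re.compile(rf"(?P<hypo>{NP}) is an? (?P<hyper>{NP})", re.I),    # 'Y is a X'
    re.compile(rf"(?P<hyper>{NP}) such as (?P<hypo>{NP})", re.I),   # 'X such as Y'
]

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) candidate pairs found in one sentence."""
    pairs = []
    for regex in PATTERNS:
        for m in regex.finditer(sentence):
            pairs.append((m.group("hypo").lower(), m.group("hyper").lower()))
    return pairs

# extract_hypernym_pairs("A dog is an animal.") -> [('a dog', 'animal')]  (noisy, hence the later filtering)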

Like Wang et al. [45], we implemented the lexico-semantic relation extraction with the Java Annotation Patterns Engine (JAPE), which is specific to GATE. The syntactic patterns we used are summarized in Table 1.

Table 1. Syntactic patterns indicative of hypernym/hyponym relations.

The final set of extracted lexico-semantic relations is obtained by filtering the candidates according to the significance of their pairs of concepts.

At the level of the individuals, we are able to elicit their personal knowledge. However, we cannot yet determine which part is cultural. To this end, we have to analyze the ‘sharedness’ of these distributed knowledge structures.

4.3 Aggregating Concepts and Lexico-Semantic Relations

To analyze the cultural consensus of the sample, the elicited personal knowledge (concepts and lexico-semantic relations) of each individual was aggregated. This led to a mixed representation composed of knowledge ranging from personal to cultural (similarly to Vuillot et al. [16]). To obtain a valid cultural representation, it is necessary to evaluate the knowledge and filter it based on its distribution.
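A minimal sketch of this simple agreement-based aggregation is given below; the minimum number of agreements is left as a parameter, which is precisely the trade-off studied in Sect. 5.

# Count how many individuals share each elicited relation and keep those
# reaching a minimum number of agreements (simple aggregation, cf. [28]).
from collections import Counter

def cultural_relations(individual_relations, min_agreements):
    """individual_relations: one set of (hyponym, hypernym) pairs per individual."""
    agreement = Counter()
    for relations in individual_relations:
        agreement.update(set(relations))   # an individual counts at most once per relation
    return {rel for rel, n in agreement.items() if n >= min_agreements}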

At this stage we are able to create a cultural representation from an ethnographic sample. However, these representations cannot be implemented into enculturated systems and thus are still unusable. They have to be formalised.

4.4 Ontologizing Concepts and Lexico-Semantic Relations

In our process, we used the “ontologizing” technique [34]. After the consensus analysis, we mapped the concepts to RDFS classes and the hypernym/hyponym relations to RDFS subclass relations.
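The sketch below shows how this mapping can be realised with the Python rdflib library (the library choice, the namespace and the URI naming scheme are illustrative assumptions).

# Ontologizing sketch: concepts become RDFS classes and each (hyponym,
# hypernym) pair becomes an rdfs:subClassOf statement.
from rdflib import Graph, Namespace, RDF, RDFS

def ontologize(relations, base="http://example.org/culture#"):
    ns, g = Namespace(base), Graph()
    g.bind("cult", ns)
    for hyponym, hypernym in relations:
        hypo, hyper = ns[hyponym.replace(" ", "_")], ns[hypernym.replace(" ", "_")]
        g.add((hypo, RDF.type, RDFS.Class))
        g.add((hyper, RDF.type, RDFS.Class))
        g.add((hypo, RDFS.subClassOf, hyper))
    return g

# ontologize({("hate crime", "issue")}).serialize(destination="epf.rdf", format="xml")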

Fig. 1. Overview of the whole process to produce a formal cultural ontology.

The formalisation constitutes the last step of our process, which is summarized in Fig. 1. It starts by selecting individuals based on shared social criteria. Then, web data about each individual of the sample are collected. These data are analysed through text-mining techniques to automatically elicit their respective personal knowledge (embodied in conceptual structures). By quantifying the sharedness of individuals’ personal knowledge, we are able to determine the cultural consensus. This analysis enables the production of a cultural representation. Finally, by ontologizing the conceptual structures, a formal cultural ontology is created. Having described the whole process to produce formal, time-affordable cultural representations, the next section presents experiments to assess its performance.

5 Experiments

The public safety domain was chosen for our experiments for two main reasons: the amount of available data and the current social context. After a description of the settings, we present and discuss the results associated with the three formal cultural representations we tried to produce.

5.1 Settings

We constituted three samples with culturally different police forces from Australia, the United States and England respectively (see Table 2). Considering agencies as individuals may not be the best choice to carry out our experiments. However, this decision was driven by the necessity of being able to collect large amounts of textual data about a single domain for a substantial number of ‘individuals’.

Table 2. Samples with their respective number of individuals.

While collecting data from the web, the content of some websites could not be retrieved due to robot protection or other factors. Therefore, we excluded these police forces from our samples.

After having retrieved the data, we preprocessed it. We cleaned the textual data and kept well-formed sentences with a length between 40 and 500 characters. We then removed police forces with fewer than 10,000 sentences left; this threshold was used to discard individuals possessing too little data. Table 3 provides updated information about our samples.

Table 3. Samples with the final number of individuals as well as the minimum, average and maximum number of sentences.

Then, we quantified the nominals and extracted the lexico-semantic relations for each individual. For each sample, the nominals were ranked according to their average rank position across individuals. We arbitrarily kept the top 1000 nominals and filtered the hypernym/hyponym relation candidates accordingly in order to create the various domain conceptualizations. At this point, we were able to produce cultural representations for the Australian, American and English police forces.
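The sketch below outlines this sample-level filtering; averaging rank positions only over the individuals in which a nominal appears is our simplification.

# Rank nominals by their average rank position across individuals, keep the
# top k, and retain only relation candidates whose two concepts survive.
from collections import defaultdict

def top_nominals(per_individual_counts, k=1000):
    positions = defaultdict(list)
    for counts in per_individual_counts:               # one Counter per individual
        for pos, (term, _) in enumerate(counts.most_common()):
            positions[term].append(pos)
    avg = {term: sum(p) / len(p) for term, p in positions.items()}
    return set(sorted(avg, key=avg.get)[:k])

def filter_relations(candidates, kept_nominals):
    return {(hypo, hyper) for hypo, hyper in candidates
            if hypo in kept_nominals and hyper in kept_nominals}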

5.2 Evaluation

The evaluation of our experimental results relied on a semi-automatically constituted gold standard. Three gold standards were constituted with labeled lexico-semantic relations, one for each sample. Because all the police forces belong to Western culture, we were able to use WordNet [46], which possesses a similar cultural bias, to automatically obtain assessments of the elicited lexico-semantic relations. Then, we reviewed these relations to ensure their correctness as well as to validate contextual relations. Contextual relations are often considered as wrong [8], but for us they are relevant manifestations of cultural features, so they were kept. For instance, we validated the relations (issue, hypernym, crime) and (partner, hypernym, school) because crime is an issue for police forces and English forces often have partnerships with schools. The raw results for each sample are given in Table 4. It has to be understood that they are not based on consensus, and thus not representative of cultural representations: they are produced with the mixed elicited knowledge of all individuals.

Table 4. Raw results for each sample.

The precision of the extraction of lexico-semantic relation candidates is known to be relatively low. For hypernym/hyponym relations, Cederberg and Widdows reported 40% [43], Maynard et al. 48.5% [6] and Hearst 52% [47]. In comparison, our raw results show an average precision of 30%. According to Cederberg and Widdows, the discrepancy in precision is mainly due to the difference in quality between the datasets: Hearst used Grolier’s Encyclopedia, Maynard et al. used Wikipedia, and Cederberg and Widdows themselves used the British National Corpus. In contrast, we are using sources of poorer quality, as our data came directly from web pages. We believe this explains our lower initial precision.

We observed the potential cultural representations by progressively increasing the required number of agreements. We expected highly consensual representations to have higher precision but lower relation coverage compared to mixed ones. Our hypothesis was that to obtain the best cultural representations, it is necessary to properly manage this trade-off between precision and loss. We computed the loss as follows: \(loss(n) = (v_1 - v_n)/v_1\), with \(n\) the minimum number of agreements (\(n > 1\)) and \(v_n\) the number of valid relations remaining with at least \(n\) agreements. The new results are provided in Table 5.

Table 5. Loss and precision for each sample – Australian Police Forces (A.P.F.), American State Police Forces (A.S.P.F.) and English Police Forces (E.P.F.) – according to the number of agreements (N.A.).
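The sketch below shows how the loss and precision figures of Table 5 can be computed from the agreement counts and the gold standard (the data structures are assumptions made for illustration, not those of our implementation).

# For each minimum number of agreements n, compute precision against the gold
# standard and loss(n) = (v_1 - v_n) / v_1, where v_n is the number of valid
# relations retained with at least n agreements.
def loss_and_precision(agreement_counts, gold_standard, max_agreements):
    """agreement_counts: dict mapping each relation to its number of agreements."""
    v1 = sum(1 for rel in agreement_counts if rel in gold_standard)
    results = {}
    for n in range(2, max_agreements + 1):
        kept = [rel for rel, count in agreement_counts.items() if count >= n]
        valid = sum(1 for rel in kept if rel in gold_standard)
        precision = valid / len(kept) if kept else 0.0
        loss = (v1 - valid) / v1 if v1 else 0.0
        results[n] = (loss, precision)
    return results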

Our first observation concerns the cultural dimension of our study. To produce cultural representations based on consensually shared knowledge, a weak agreement of at least half the sample is expected. Obtaining such a number in our experiment leads to cultural representations with a loss of 98% to 99% for a 100% precision. Such representations would have too few relations to be directly usable.

The second observation is that to obtain a satisfying precision (greater than or equal to 90%), the loss is again too high: 98% for the Australian Police Forces, 97% for the American State Police Forces and 98% for the English Police Forces. The best trade-off is around 77% loss for 63% precision.

Our third observation relates to the practical aspect of the time required to produce cultural representations. Carrying out the whole experiment took one full day on a normal laptop (multi-threaded on a quad-core machine with 16 GB of RAM). Using industrial means for production would shorten the necessary time to minutes, thus leading to highly time-affordable representations. The problem is that, given the trade-off, reviewing the cultural representations for corrections would take hours or days.

Based on these observations, we conclude that the main problem is the high loss. The loss can be explained by three factors. The first one concerns the high number of relations specific to individuals, such as (partner, hypernym, northumbria police); it does not constitute a problem as we are not interested in those. The second factor corresponds to the cultural domain: many extracted relations are related to but do not strictly belong to the public safety domain, like (resource, hypernym, goods). Similarly to the first factor, this loss does not matter. The third factor concerns the scarcity of the syntactic patterns enabling the extraction of the lexico-semantic relations. Their low recall means that the discovery of a relation in a corpus largely depends on luck. This last factor is truly problematic.

Fig. 2. Piece of the cultural ontology produced for the English Police Forces.

This issue is directly linked to the knowledge elicitation technique used in our study. Indeed, lexico-semantic relation extraction relying on syntactic patterns provides neither the quantity nor the quality required to properly support our process to produce cultural representations. In fact, no existing hypernym/hyponym relation mining technique using large corpora may be able to achieve this task, so we were expecting these results.

Nevertheless, with little effort we were able to produce a relevant partial cultural ontology for the English Police Forces, composed of 131 hypernym/hyponym relations. We used Gephi to visualize the end result.

In Fig. 2 we focus on the concept ‘crime’. We observe common hypernym/hyponym relations as well as an interesting contextual relation between ‘hate crime’ and ‘issue’. Such relations are really meaningful in a cultural context: the focus on hate crimes by English police forces comes from the enforcement of the Equality Act 2010. It also becomes obvious that many relations are missing, but we believe that this representation provides a coherent foundation to support further improvements.

6 Conclusion

We recall that our goal was to build time-affordable, emic, conceptually-sound and machine-readable cultural representations. We introduced a methodology coming from Cognitive Anthropology to build emic-based cultural conceptualisations. In addition, we explained their formalisation through Ontology Engineering. Then, we presented a process to produce these representations in a mostly automatic way. Using lexico-semantic relation extraction, the best we can obtain are representations that are consensually limited, incomplete and contain some errors. In the future, however, these problems could be solved by using higher-quality elicitation techniques.

To date, culturally-intelligent systems are developed using etic-based cultural representations. While facilitating cross-cultural mediation, these coarse-grained representations are not suited to the development of systems requiring a deep understanding of cultural aspects [5]. We believe that the production of fine-grained cultural ontologies, obtained through an emic approach, is a first step towards the development of a new generation of artificial cultural awareness supporting these systems.