1 Introduction

Situation awareness (SAW) is an important cognitive process of decision-makers in several critical areas. It concerns the perception of the presence and nature of the entities of interest in the environment, the understanding of their meaning and the importance of their individual and collective actions, and the projection of their status in the near future.

In the field of emergency management, SAW is a crucial factor for the success of operations involving humans. Limited SAW can compromise operators’ understanding of what is happening and lead to poor decision-making, which can result in disastrous consequences for people, property, or the environment.

Human operators who are aware of an emergency can not only characterize entities, events, and their relationships but also reveal trends, detect the existence of threats, and infer the increase or decrease of imminent risks.

Although SAW cannot guarantee better decision quality, its improvement can help operators maintain superior knowledge of current events and situations. Operators of emergency services can be routinely subjected to information overload, especially because of the inherent need to perform multiple tasks. Supporting operators’ situation awareness is thus a fundamental challenge for the effectiveness of their activities.

Supporting SAW is even more challenging when data are provided by humans, as is the case when a report of a crime is offered by a victim or a witness. Typically, such data can be incomplete, outdated, inconsistent, and sometimes even irrelevant to the associated event. In addition, these data can also be influenced by human factors such as stress, fear, and cultural particularities. Low-quality data and information degrade the computational methods that use such human reports as input to infer information useful for supporting operators in developing SAW.

To overcome this problem, information fusion (IF) processes have been designed to guide the development of systems. They comprise the acquisition, inference, evaluation, and representation of high-level situational information. These systems typically use multiple heterogeneous data sources and computational intelligence to track changes in the environment and help operators develop SAW [11, 13, 14, 17, 18].

However, determining how a semiautomated situation analysis activity can be structured to better amplify operators’ SAW is still a challenging issue for the fusion community, especially in the field of emergency management. In addition, there is still difficulty in dealing with quality problems inherent to human-produced data [5, 6, 23, 25, 28].

Lack of knowledge about the quality of data and information propagated by fusion processes (such as information assessment, inference, and information retrieval) can also lead operators to uncertainties and errors in SAW, thereby degrading decision-making. The visual representation of situational information should therefore consider data and information quality indexes to better support human observation and interpretation of a situation.

In this context, data fusion systems dedicated to supporting SAW can be augmented by information quality management processes, benefiting both the automated data assessment routines and the human understanding of crime situations. These assessment processes can be oriented and parameterized by quality-indexed information. Quality indexes can also help operators by increasing their confidence in situational information and by stimulating proactive interaction with the system.

Furthermore, automated systems are very good at processing large amounts of data but may fail to determine the connections and meaning of those data. Hence, quality assessment can help automated and human-machine processes better complement each other, sharing objectives and contributing to the construction of situational knowledge. Thus, SAW can be acquired, maintained, and even reacquired better and more quickly [4, 6, 19]. Consequently, quality-aware information fusion systems (IFS) must provide capabilities and processes to reveal, process, represent, and mitigate information limitations.

The literature presents data and information fusion models that explicitly describe the role of the human operator in semiautomatic approaches, typically originating from the JDL (Joint Directors of Laboratories), DFIG (Data Fusion Information Group), and User-Centered Information Fusion models. In these models, humans are either solely consumers of information or active participants in managing and transforming information [3, 6]. These models are limited in presenting solutions for the problems associated with the human’s deeper involvement in the process of transforming information to enrich SAW in IFS. More recent approaches present opportunities for human interaction at each level of fusion [1, 15, 19].

However, there are few records of human-system collaboration supported by information quality in scenarios where time is a critical factor. In addition, known approaches are limited to providing refinements to the final product of the process in a reactive fashion [6].

The goal of this chapter is to present the Quality-aware Human-driven Information Fusion Model (Quantify), which aims to contribute to improving the SAW of human operators. This model can be used by an emergency situation assessment system dealing with scenarios that are complex, changeable, and dynamic, in which information is constantly evaluated and parameterized by several variables.

In addition, another objective of this chapter is to present details on how Quantify employs a combination of syntactic and semantic methods for a hierarchical and multicriteria integration of information. This chapter also shows how this model deals with fusion of semantic information using information quality criteria. These advances aim to increase the inference capacity of complex information and to contribute to the SAW of analysts.

The pillars of the Quantify model are the processes for continuous assessment of data and information quality for the orientation of the IF process and the means for the semantic analysis of qualified situational information.

Specifically, through the Quantify model, this chapter aims to demonstrate:

  • How the information quality management (inference, representation, and mitigation) can be beneficial in global and local contexts, at either low or high fusion levels, and contribute to SAW;

  • How to ensure the propagation and usefulness of qualified information produced by humans, up to the highest levels of abstraction, useful for situation assessment (SA);

  • How integrated syntactical and semantic analysis can contribute to empowering the inference capabilities of the fusion process;

  • How to treat information as linked knowledge about crime situations that is enriched over time, produced by humans and machines, and how it connects with the other steps of the fusion process.

To demonstrate the use of Quantify in practice, this chapter ends with a case study of real-time crime assessment.

2 The Incorporation of Data and Information Quality in the Fusion Process

Mapping complex entities, such as humans and their interactions with the real world, is a challenging process. The dynamic and complex nature of interactions between people, objects, and places demands the use of comprehensive computational techniques to reveal their states over time.

The process that fusion systems use to understand human interactions starts with the search and determination of which entities are present in a real scenario. Next, the states of the entities are determined, formed by their physical characteristics, position, orientation, and other data relevant to the domain. Finally, fusion systems establish the possible relationships among entities, relating each entity context and state to one another. These relationships may help humans and systems to understand situations.
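As a purely illustrative sketch, the entity-state-relationship structure described above could be modeled as follows; the class and field names are assumptions, not part of any fusion standard:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """An entity and its state, as determined in the first two steps."""
    name: str
    state: dict  # physical characteristics, position, orientation, ...

@dataclass
class Relationship:
    """A relation established between two entities in the final step."""
    source: str
    target: str
    kind: str

# Hypothetical crime-domain example.
suspect = Entity("suspect", {"position": "street corner", "clothing": "black shirt"})
victim = Entity("victim", {"position": "street corner"})
approach = Relationship(suspect.name, victim.name, "approached")
```

A fusion system would accumulate many such relationships over time; it is the resulting graph of entities and relations that helps humans and systems understand situations.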

Achieving SAW is a challenging process that SA systems seek to support. With the advent of SAW models, especially the Endsley model [12, 13], new IF models, architectures, and processes emerged aiming to support the development of SA systems.

In this context, the knowledge of the quality of information by the IFS emerges as a complementary resource to the process of inference of situations. Also, IFS act as technology to support the existing models and processes for the acquisition and maintenance of SAW.

The benefits of knowing the quality of information in an IF process, at each of the possible levels of inference of the JDL model, are: the reliability of data sources and the effectiveness of data preparation algorithms (Level 0), the completeness and accuracy of the identification of objects (Level 1), the integrity of a relation between objects (Level 2), the assertiveness of a projection (Level 3), and the graphical representation of information (Level 5). Knowledge of information quality can not only support decisions at each level but also influence and parameterize internal inference routines.

In addition to contributing to the operationalization of the internal mechanisms of data fusion levels, knowledge about the quality of data and information can also contribute to the relationships between the levels of fusion, i.e., to help determine and direct information by virtue of desired outputs and inputs for each level. Quality indexes operate as a lever to determine the usefulness of information throughout the process from one level to another. This routine further contributes to fill the gap between low-level and high-level IF inferences [2, 4].

Among the challenges of incorporating data and information quality into a fusion process, we can highlight the role of information quality and the management of information dynamics for defining information quality. Regarding the first challenge, it is known that computerized processes of an IFS (e.g., mining, integration, and correlation) infer new information in a distributed, asynchronous, and dynamic fashion. To collaborate with each other, these processes must have a mechanism that qualifies each new datum or piece of information produced with a quality indicator (quality metadata). Thus, the parameterization of the process gains a new variable (different from attributes or objects) that must be considered every time a new fusion process is performed. This routine contributes to quality information reaching the upper levels of the process.
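One minimal way to realize such quality metadata, sketched here under the assumption that quality indexes are scores normalized to [0, 1], is to attach a dictionary of dimension scores to every piece of information and to propagate them with a conservative rule when two items are fused:

```python
from dataclasses import dataclass

@dataclass
class QualifiedInfo:
    """A piece of situational information plus its quality metadata.

    Illustrative only: dimension names follow Sect. 3.2, and the
    normalized scores are an assumption, not part of Quantify itself.
    """
    content: str
    quality: dict  # e.g. {"completeness": 0.8, "timeliness": 0.6}

def fuse(a: QualifiedInfo, b: QualifiedInfo) -> QualifiedInfo:
    # Conservative propagation rule (an assumption): for each dimension both
    # inputs share, the fused item inherits the weaker (minimum) score.
    dims = a.quality.keys() & b.quality.keys()
    merged = {d: min(a.quality[d], b.quality[d]) for d in dims}
    return QualifiedInfo(f"{a.content} + {b.content}", merged)
```

Under this scheme every fusion step emits information that already carries its own quality indicator, so quality metadata can reach the upper levels of the process without a separate bookkeeping channel.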

Beyond helping to parameterize automation, the second challenge is to properly represent information quality and stimulate operator interaction through specialized user interfaces and visualizations dedicated to SAW. To visually stimulate the perception of operators in the search for patterns and relationships, it is necessary to use cues or suggestions that qualify the information. These signals help to justify human behavior and explain why information is accepted or not and, consequently, can help guide operators to improve the quality of information through a continuous refinement process.

3 Quality-Aware Human-Driven Information Fusion Model

To overcome some of the information quality assessment challenges, this chapter proposes a fusion model called Quantify (Quality-aware Human-driven Information Fusion Model) (Fig. 23.1) [9]. The major differentials of this model are the combination of syntactic and semantic approaches to assess data and the use of information quality management throughout the fusion process.

Fig. 23.1

Quantify model with syntactic and semantic information fusion processes in details

Quantify consists of six internal processes: data acquisition, data and information quality assessment, object assessment, situation assessment, information representation, and user interfaces (UI).

This model is intended to guide the development of IF systems dedicated to supporting the assessment of situations that occur in complex real-time scenarios, especially when it is hard to acquire reliable information. These complex scenarios comprise highly complex entities that interact and relate to each other to form situations, which evolve in time and space [3, 6].

Among the main features of the model are mechanisms designed to:

  • Manage data and information quality (infer, represent, and mitigate) in local and global contexts of the IF process at low and high levels of abstraction;

  • Support operators in improving their perception and understanding of the situation and in orienting and refining information;

  • Parameterize automated processes of the IF routine using qualified situational information in syntactic and semantic fusion processes.

This model also comprises:

  • A complete process of SA over complex real-time scenarios, with internal processes of acquisition, processing, representation, and refinement of situations that have human-provided data produced by heterogeneous sources;

  • A cyclic, iterative, and interactive operation, which allows operators to accompany changes in the situation;

  • A set of mechanisms to manage data and information quality: to assess each new piece of information inferred, to enrich the situational representation, to parameterize processes, and to orient operators in the refinement task;

  • A semantic fusion approach that uses semantic models (ontologies) with associated data quality to improve the discovery of synergistic information in human-provided data.

In the next sections, we will describe the Quantify model, as well as its internal processes in detail.

3.1 Data Acquisition

In complex scenarios, there are multiple types of data available, such as audios, text messages from social networks, records from historical databases, camera images, and information from diverse subsystems. Each application has sources and input data that may be used to perform the assessment of a situation.

Therefore, the internal process of human-provided data acquisition is responsible for collecting information generated by humans and making it available to the other internal mechanisms of Quantify. The result of this process is the identification and classification of objects, attributes, and preliminary situations, according to an application domain. To achieve this objective, the process is structured in three stages, namely, (1) obtain sentences, (2) grammatical analysis of sentences, and (3) search for and identification of relevant objects. In the obtain sentences stage, natural language processing (NLP) techniques are used to transcribe the audio and to format it into a string structure. This step can be accomplished with a tool like the one provided by Google [10, 26].

In the emergency management domain, data from social networks, like Twitter, can also be used through its public API. Posts that report a situation are searched based on the objects previously identified by NLP. Once data have been captured, transcribed, and stored in a structured way, they can be sent for sentence analysis, which is performed to identify patterns and logical sequences of characters and words [21, 27].

In the grammatical analysis of sentences step, the input text must be analyzed in real time by a grammar-checking tool, such as CoGrOO [16, 26]. This makes it possible to add labels such as noun, number, object, or any other classification, and to connect the sentences obtained in the input text.

Each object found is classified, along with its attributes, by using keywords. These keywords have already been defined through the analysis of several sentences and also as a product of systems’ requirements [20, 23].

The search and identification of relevant objects stage processes elements defined as important for fulfilling the requirements. During the process of defining these requirements, meaningful words in an account (report) are identified, generating lists of words classified in different categories, such as tagCor (tag for color) and tagTypePhysical (tag for physical type) [16, 24].

In this way, whenever a word from any of these categories is found, the following words of the input text are analyzed further, seeking additional meanings such as status, situation, and even quality of objects, people, or situations. By analyzing the classification of a word, it is possible to infer what type of information it represents, such as addresses, names, etc. To determine the likely role of the next word, blocks of words are analyzed and compared against a glossary.
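The keyword-based classification described above can be sketched as follows; tagCor and tagTypePhysical are the category names cited in the text, but the word lists themselves are illustrative assumptions:

```python
# Illustrative word lists; tagCor and tagTypePhysical are the category
# names from the text, but these particular words are assumptions.
TAG_CATEGORIES = {
    "tagCor": {"black", "white", "red", "blue"},
    "tagTypePhysical": {"tall", "short", "thin", "strong"},
}

def classify_tokens(tokens):
    """Map each token to the tag category it belongs to, if any."""
    tags = {}
    for token in tokens:
        for category, words in TAG_CATEGORIES.items():
            if token.lower() in words:
                tags[token] = category
    return tags
```

For a report such as "a tall man in a black shirt", this sketch would tag "tall" as tagTypePhysical and "black" as tagCor, while the surrounding words remain unclassified until further analysis against the glossary.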

At the end, the identified objects and the first situations are encapsulated in an object model (e.g., JavaScript Object Notation – JSON) and submitted to the next internal process, which assesses the information according to quality dimensions and metrics.
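A hypothetical example of such a JSON object model follows; the field names are illustrative assumptions rather than part of the Quantify specification:

```python
import json

# Hypothetical object model for one identified entity; the field names are
# illustrative and not prescribed by the Quantify model.
report_objects = {
    "objects": [
        {
            "type": "person",
            "role": "suspect",
            "attributes": {"tagCor": "black", "tagTypePhysical": "tall"},
        }
    ],
    "preliminary_situation": "robbery",
}

payload = json.dumps(report_objects)  # serialized for the next internal process
restored = json.loads(payload)        # as received by quality assessment
```

Serializing the objects this way lets the acquisition and quality assessment processes exchange information asynchronously without sharing in-memory state.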

3.2 Data and Information Quality Assessment

The data and information quality assessment internal process aims at qualifying the situational information by quantifying quality dimensions for the guidance and parameterization of the fusion process as a whole, so that other processes can use the qualified information [8].

The quality assessment is applied to raw data and also to the situational information after it has been formed and represented as linked relations (situations formed by pieces of information).

The dimensions assumed to perform the assessment of human-provided crime data are:

  • Timeliness (considering how fresh the data is);

  • Completeness (the percentage of attributes and objects a situation has);

  • Temporal completeness (how complete is time-referring data);

  • Consistency (the alignment of new processed data with situational information);

  • Relevance (the new data is useful to the current situational data);

  • Syntactic precision (the data are within an acceptable threshold of syntactic variation);

  • (Un)certainty (the trust of the system in the information).
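Two of these dimensions, timeliness and completeness, admit simple metric sketches. The linear-decay and set-ratio forms below are assumptions, since the chapter defers the actual metrics to the IQESA methodology:

```python
from datetime import datetime, timedelta

def timeliness(reported_at: datetime, now: datetime, max_age: timedelta) -> float:
    """1.0 for brand-new data, decaying linearly to 0.0 at max_age (assumed form)."""
    return max(0.0, 1.0 - (now - reported_at) / max_age)

def completeness(situation: dict, expected_attributes: set) -> float:
    """Fraction of the expected attributes actually present in the situation."""
    return len(expected_attributes & situation.keys()) / len(expected_attributes)
```

For example, a report that is thirty minutes old with a one-hour freshness window scores 0.5 on timeliness, and a situation holding two of four expected attributes scores 0.5 on completeness.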

This process also relies on the “Methodology for Data and Information Quality Assessment in the Context of Emergency Situational Awareness” (IQESA), described in Botega et al., to assess data and information quality [8].

The IQESA methodology is performed in three steps: elicitation of information quality requirements, in which quality requirements are determined for a specific domain by interviews and a goal-driven task analysis (GDTA) [9, 14, 24]; definition and use of functions and metrics to assess quality dimensions, in which metrics are defined and applied to infer quality indexes; and finally the representation of situational information, in which data and the associated quality are represented in the form of semantic models (ontologies).

3.3 Object and Situation Assessment: Information Fusion Using Information Quality

3.3.1 Syntactic Information Fusion

The process of syntactic information fusion can be abstracted into two main stages, each with internal mechanisms that play specific roles in improving the representativeness of information: the search for synergistic information and the multicriteria association.

After acquisition and quality assessment, a new object is produced. This resulting object corresponds to what is known as an L1 data fusion result, comprising the objects and attributes found, with quality indexes attributed to each object [22, 23].

The decoded object and its attributes feed the creation of a preliminary ontology (Fig. 23.2), which represents an initial situation with its current classes and object instances (semantic data associated with ontology classes). This situation has people, objects, and places, each with their respective attributes, with indications of common activities between them (relationship properties), defined at acquisition time.

Fig. 23.2

Ontology of a crime emergency situation

The internal process of “information representation” is responsible for that task and is directly connected to all other processes, which use its output and provide information to it. They provide and consume information in an asynchronous, distributed, and dynamic fashion. The resulting data are also frequently transported to and from the situational knowledge.

This instantiated ontology composes the input for the IF phase. Among the input parameters are the objects identified in the previous phases; the type of data source; which properties must be present; and even a quality threshold for the information.

Once the fusion process is started, the search for synergistic information is performed among classes that are already present in the current ontology and that can hold information on objects, attributes, properties, and quality indexes that have some kind of correspondence.

After a search in information already represented in the ontology, a new search is made for information that has not yet been considered in the process, from the same source or from other data sources, and which has already been submitted to “data acquisition”. This routine is designed to obtain new information about the associated objects at any given time, validating and giving greater consistency to the already defined information. The input to this process is either isolated information (JDL Level 1) or a relationship between objects (JDL Level 2), and the result of this step may be a new object or a new situation.

This process can be implemented by data mining techniques, for example, the Apriori algorithm [7, 9, 16], which infers how frequently certain information is present when analyzed in relation to the rest of the available input data (from the current source or others). This inference is made by computing a support measure (the relative frequency of co-occurrence).
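The Apriori-style support filtering mentioned here can be sketched as follows, where each report is treated as a set of items and a pair of items is kept only if its support (the fraction of reports in which both co-occur) meets a minimum threshold:

```python
from itertools import combinations

def frequent_pairs(reports, min_support):
    """Apriori-style filtering step: return item pairs whose support (the
    fraction of reports in which both items co-occur) meets min_support."""
    counts = {}
    for report in reports:
        for pair in combinations(sorted(set(report)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(reports)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Pairs that survive the threshold are the candidate synergistic information; in a full Apriori run the surviving pairs would seed the search for larger frequent itemsets.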

The results of this process are new objects, attributes, quality indexes, and properties that may complement the current information held by the ontology.

The next stage, the multicriteria association, computes the synergistic information using predefined criteria for quality indexes and semantic properties. The results are insertions of new information into the ontology, satisfying the similarity found in the context of the original information and satisfying the multicriteria process [9, 24].

Two sources of criteria are currently suggested: one is input from the operator at execution time, and the other is based on the information and knowledge obtained through requirements analysis by developers before the operation of the system, which automatically affects the algorithms of this process. As a result of the automated part, all the initial information submitted to the fusion process is analyzed for synergy (its syntactic or semantic similarities are checked). The results are the discovery of new attributes, properties, and even new objects, in a combined and hierarchical way, yielding new situational information. This result can be resubmitted to the previous synergistic data search process, increasing the process’s ability to find new information and further consolidating the information already found.

Hence, information is increasingly specialized, enriching the current situation with more details and qualified data. The syntactic fusion is cyclical and is performed until the result of the multicriteria association reaches the previously defined requirements, i.e., until the quality levels are sufficient for the decision-maker.
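A minimal sketch of the multicriteria gate that closes this cycle, assuming each candidate association carries quality scores and the requirements are per-dimension thresholds (both structures are assumptions):

```python
def meets_criteria(quality: dict, thresholds: dict) -> bool:
    """A candidate is accepted only if every thresholded dimension is met."""
    return all(quality.get(dim, 0.0) >= t for dim, t in thresholds.items())

def fusion_cycle(candidates, thresholds):
    """One pass of the cycle: accepted associations enrich the situation;
    rejected ones would be resubmitted to the synergy search."""
    accepted = [c for c in candidates if meets_criteria(c["quality"], thresholds)]
    rejected = [c for c in candidates if not meets_criteria(c["quality"], thresholds)]
    return accepted, rejected
```

In an operator-driven run, the thresholds dictionary is exactly what the human can change at execution time, replacing or relaxing the criteria predefined during requirements analysis.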

The resulting information is submitted to the information quality evaluation process, which now scores the newly found information while also updating the indexes of the current situation. After this process, the reevaluated information is reinstantiated in the ontology.

In the automated fusion process performed after the acquisition and assessment of information quality, the greatest possible number of associations is made between the objects, their attributes, properties, and quality indexes. For this, the existence of two or more data sets, available from the same source or from different sources, is considered.

This process, called the primary fusion, employs primarily criteria for automated fusion. These criteria can include a minimum level of quality or a priority among object properties, which is useful to define what should be processed and shown to the operator first. These priorities are defined by the information requirements gathered through questionnaires filled in by specialists with various roles and lengths of experience.

In the case of on-demand fusion by a human operator, the algorithm is activated once again, but the integration options are selected entirely by the operator through the user interface, rather than considering all possible combinations of objects, attributes, and properties identified in the acquisition step. This association process, now manual though still based on objects and attributes, is strongly supported by quality indexes and by hypotheses employing information related to previously classified objects obtained in past cycles or from different data sources.

Since this process is performed by the operator through a user interface, the criterion for the data fusion process (e.g., quality indexes, an object characteristic, or even a physical property) can be chosen and changed by that operator, who can also remove criteria predefined by the requirements analysis. This capability gives the structure the flexibility to receive and process different criteria for a given situation and allows operators to interact with the system based on their experience and knowledge.

3.3.2 Semantic Information Fusion

Based on the identified objects and attributes inferred by the implementation of the model described in the previous section, a preliminary ontology is instantiated. For example, in the crime domain, the ontology’s classes represent victims, criminals, stolen things, information quality, and location, each with their respective attributes and relations. The ontology also reveals the existence of semantic properties of the information (meanings), as shown in the example of Fig. 23.2 [24].
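An in-memory stand-in for such a preliminary ontology might look like the following; this is an illustrative structure only, and the class and relation names merely echo those mentioned in the text rather than reproduce the OWL model of Fig. 23.2:

```python
# Minimal in-memory stand-in for the preliminary crime ontology; class and
# relation names echo the text, but the structure itself is an assumption.
ontology = {
    "classes": {"Victim": [], "Criminal": [], "StolenThing": [], "Location": []},
    "properties": [],  # (subject, relation, object) triples
}

def add_instance(onto, cls, instance_id, attributes=None):
    """Instantiate an individual under one of the ontology classes."""
    onto["classes"][cls].append({"id": instance_id, "attributes": attributes or {}})

def relate(onto, subject, relation, obj):
    """Record a semantic property (relationship) between two instances."""
    onto["properties"].append((subject, relation, obj))

add_instance(ontology, "Victim", "v1", {"reported_by": "emergency call"})
add_instance(ontology, "Location", "l1", {"reference": "restaurant"})
relate(ontology, "v1", "locatedAt", "l1")
```

A production system would use a proper semantic store and reasoner, but this triple-based shape is enough to show how instances and their semantic properties accumulate as fusion proceeds.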

This part of Quantify is also responsible for generating input to the information fusion process by considering semantic aspects, that is, the meaning of certain information according to a context. This technique greatly enhances the power of SA because, instead of analyzing the structure of a word, it analyzes the word’s semantic connections (whether there are common contexts that offer some possibility of building a situation based on such linked information).

Consider two situations transcribed in different ways. Even if the meaning of their objects is the same, they may not have been considered synergistic in the syntactic fusion process: for example, one report states that a “man flew” and another that “a guy ran.” These are completely different pieces of information from the syntactic point of view, but in a semantic context that considers meaning, the sentences have points of similarity.

The semantic fusion process is performed by an algorithm based on data mining techniques, using the same Apriori technique employed in the syntactic analysis. The result of this semantic search procedure is the same as that of the syntactic process but grouped into collections with a degree of similarity based on meaning. In this process, it is possible to find terms that have no associated meaning; hence, they cannot be assigned to an existing ontology class. In the next process, these collections will be integrated into a new situation in order to help improve quality scores and establish the real meaning of this information.
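The gain of semantic over syntactic matching can be illustrated with a toy normalization step; the synonym table below stands in for a real semantic resource (an ontology or lexical database) and is entirely an assumption:

```python
# Toy synonym table standing in for a real semantic resource; both the
# groupings and the word choices are illustrative assumptions.
SYNONYMS = {
    "flew": "flee", "fled": "flee", "ran": "flee", "escaped": "flee",
    "man": "person", "guy": "person", "individual": "person",
}

def normalize(tokens):
    """Replace each token by its semantic class when one is known."""
    return [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]

def semantic_similarity(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity computed over semantic classes, not raw words."""
    a = set(normalize(sentence_a.split()))
    b = set(normalize(sentence_b.split()))
    return len(a & b) / len(a | b)
```

On the chapter’s own example, “man flew” and “a guy ran” share no raw words, yet both normalize toward the classes person and flee, so their semantic similarity is high while their syntactic overlap is zero.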

Each new piece of information is submitted to the quality assessment layer, where quality scores are assigned to it. Then, using expected threshold values, it is decided whether the routine continues or whether the new information should be disregarded.

Finally, information generated in both semantic and syntactic processes is compared based on the quality indexes. The better-ranked information is then sent one last time to the quality assessment layer, and if the updated values do not satisfy the criteria chosen by the expert, the complete process of SA can be redone. After that, the results are displayed to the experts to ensure that all processing possibilities have been exhausted.
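This final quality-based comparison between the two fusion paths can be sketched as a simple ranking; the unweighted mean used as the aggregate score is an assumption, since the chapter does not fix an aggregation rule:

```python
def overall_quality(info: dict) -> float:
    """Aggregate quality score; an unweighted mean of the dimension
    scores is assumed here."""
    scores = info["quality"]
    return sum(scores.values()) / len(scores)

def select_best(syntactic_results, semantic_results):
    """Compare the outputs of both fusion paths and keep the better ranked."""
    return max(syntactic_results + semantic_results, key=overall_quality)
```

A weighted mean, with weights drawn from the GDTA-elicited requirements, would be a natural refinement of this rule.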

This option of forwarding the information to the syntactic and semantic assessment layers can be automated or triggered by the human operator, who can start a fusion event via the user interface and demand that each situation be processed two or more times.

The result selected after several evaluation cycles is sent to the information representation layer, which will be instantiated and considered as the most current situation, as explained in the next section.

3.4 SAW-Driven User Interfaces

The user interface (UI) process aims at specifying a sequential activity routine to fully manage the situational information generated, propagated, and maintained by the Quantify model [9, 20].

To this end, the model includes a user interface in the Quantify modeling dynamics, not only to represent situational knowledge but also to promote a two-way relationship with the other stages of the process. In addition, it also seeks to specify the management routines of the process as a whole to promote the refinements necessary to the SAW process.

As previously discussed, situational knowledge is constructed in the “data acquisition” phase, qualified in the “data and information quality assessment” phase under the requirements pertinent to the domain of complex environments, enriched by “object and situation assessment,” and now visually represented in this phase.

Additionally, the UI also receives from the previous process the metadata that qualifies the situations, enables the use of information visualization techniques, and graphically represents this accumulated and qualified knowledge.

The development of the SAW-oriented interface must follow Endsley’s design principles [12]: organize information according to objectives, present level 2 SAW directly, support global SAW and information filtering, support local and global trust verification, represent historical events to follow the evolution of information, and support uncertainty and quality management.

4 Case Study

4.1 Situational Awareness and the Problem of Crime Analysis in Brazil

Critical SAW-oriented systems, such as risk management and risk assessment systems, require specialized intelligence to provide operators with a dynamic understanding of what is going on or what has happened in an environment. In Brazil, criminal record databases, based on unstructured human-provided data, have problems related to the quality of information, mainly the reliability of registered addresses as crime locations. These problems are due to the imprecision of the information obtained from victims and the lack of prioritization of these data by collectors (civilian or military police), who focus more on the description of the event than on location data.

In addition, most electronic event-recording systems allow a record to be completed even without the address of the incident, or with only a reference point (e.g., a restaurant, a store, or a public place). This aspect is particularly important for the decision-making process, since criminal mapping as a data analysis tool for defining public policies has become popular in Brazil. However, by ignoring the poor quality of location information and lacking data processing routines, criminal information systems are often based on georeferenced maps that do not reflect the actual information about crime incidents.

In a complex decision-making environment, commanders need a clear, concise, and accurate assessment of the situation and of whether there is any risk to people’s lives, property, or the environment [28]. To support the production of information useful for developing and maintaining SAW when solving problems involving criminal behavior and its diverse environmental contexts, the filtering, mining, and data integration techniques present in IF processes are critical.

An appropriate SAW-oriented fusion synergistically integrates information into the current situational picture, performs an analysis of the input information, and commits to providing information according to the needs and expectations of the expert.

Supported by semantic fusion and the quality of information, opportunities for improving the parameters (or criteria) for data fusion become evident, enhancing the possibilities for effective contributions to the process of building SAW [2, 5, 6, 17, 28].

4.2 Risk Analysis Through Syntactic and Semantic Perspectives

The case study is based on the assessment, in support of situation awareness, of a real crime situation, more specifically a robbery reported to the police emergency response service. The reported information was submitted to each step of Quantify and its algorithms. Results are analyzed based on the acquisition, mining, fusion, and representation of relevant information useful for decision-making. The IF consists of two data assessment processes: syntactic and semantic. Complementing them, there are other processes crucial for implementing IF, such as object identification, information representation, and quality assessment.

The situation assessment starts with the identification of entities and objects present in the reports, which is based on NLP techniques using a rich vocabulary of the language, focused on the emergency management domain, more specifically the analysis of real-time crime data. After the identification of entities and objects, the information, communicated through JSON objects, starts being instantiated in small ontologies (Figs. 23.4, 23.5, and 23.6) that will later compose an entire situation. The information representation module is used here to support the IF processes during execution, by maintaining the most current version of situational information.
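
As a minimal illustration of this identification step, entities can be extracted from a report and serialized as JSON objects. This sketch substitutes a simplified keyword vocabulary for the full NLP pipeline and domain vocabulary used by Quantify; the terms and class names below are illustrative assumptions.

```python
import json
import re

# Hypothetical fragment of a domain vocabulary mapping surface terms to
# entity classes; the actual Quantify vocabulary is far richer.
VOCABULARY = {
    "victim": "Victim",
    "individuals": "Criminal",
    "motorcycle": "Vehicle",
    "vehicle": "Vehicle",
    "gun": "Weapon",
}

def identify_entities(report: str) -> str:
    """Scan a free-text report for vocabulary terms and emit the
    identified entities as a JSON object, one list per class."""
    found: dict = {}
    for token in re.findall(r"[a-z]+", report.lower()):
        if token in VOCABULARY:
            found.setdefault(VOCABULARY[token], []).append(token)
    return json.dumps(found, sort_keys=True)
```

A real pipeline would, of course, handle multiword expressions ("black motorcycle"), inflection, and context; the point here is only the report-to-JSON shape of the output that feeds the ontology instantiation.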

Each result is also submitted to the module of quality assessment, which at this time only evaluates the completeness of the identified entities and objects. Data will return to this module each time new inferences or any kind of information transformation is made.
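
A minimal sketch of such a completeness check, assuming identified entities are held as dictionaries and the set of expected fields comes from the domain requirements (the field names below are hypothetical):

```python
def completeness(instance: dict, expected_fields: set) -> float:
    """Completeness as the fraction of expected fields carrying a
    non-empty value. A simple sketch: the chapter's actual metric set
    and weighting are defined by its DQV-based quality module."""
    if not expected_fields:
        return 0.0
    filled = sum(1 for f in expected_fields
                 if instance.get(f) not in (None, "", []))
    return filled / len(expected_fields)
```

For example, an instance with a victim and a weapon but no location and no escape route would score 0.5 against four expected fields.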

Hence, the execution flow follows the cycle: object identification, representation, quality assessment, and back to the representation module. The following reports were submitted to Quantify, and the results are presented and discussed below, highlighting objects of interest.

Crime Report 1

The victim stopped the vehicle at a semaphore. Then, he was surprised by two individuals in a black motorcycle. The motorcycle passenger hit the glass of the car with a gun. Threatened the victim and subtracted his vehicle. The criminals escaped with the destination location ignored. Victim does not have conditions to describe the authors in detail.

Crime Report 2

Two guys on a dark motorcycle stole a car at the semaphore at the intersection between Mooca and Taquari Streets, pointed a revolver, and took the woman’s car. One of them was in a red coat.

Crime Report 3

Two men robbed a gray car in the street from Mooca to the side of Santander Bank. The men were on a black motorcycle, one of them was in a blue jacket and jeans. They left in the direction of Hospital Villa-Lobos.

4.3 Syntactic Analysis

The syntactic fusion process starts by searching for synergistic information in data sources and previously processed situations, looking for similarity in their syntax. For instance, the contents of the reports are analyzed by making Boolean comparisons to determine whether one word is equal to another, considering some word variations such as tense, gender, and stem.

After the synergistic search among the reports, those that present a sufficient level of synergy are grouped into data sets. Analyzing the three reports used in this case, we can note only a few terms that satisfy this syntactic fusion condition, such as "two," "black," "Mooca," and "motorcycle." However, these terms, even when they co-occur, do not express meaning rich or explicit enough to carry out a fusion of information between the reports.
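
The search for syntactically shared terms can be sketched as follows. This is a toy version: it compares raw lowercase tokens, whereas real matching would also normalize tense, gender, and stem before comparing.

```python
import re
from collections import Counter

def tokens(text: str) -> set:
    # lowercase word tokens; a real system would also normalize
    # tense, gender, and stem before comparing
    return set(re.findall(r"[a-z]+", text.lower()))

def shared_terms(reports: list, min_reports: int = 2) -> set:
    """Terms appearing in at least `min_reports` of the given reports."""
    counts = Counter()
    for report in reports:
        counts.update(tokens(report))
    return {term for term, count in counts.items() if count >= min_reports}
```

Run over paraphrases of the three reports, such a search surfaces exactly the kind of low-meaning overlap discussed above ("two," "black," "Mooca," "motorcycle") without establishing any relation between the matched terms.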

Finally, these terms go through the multicriteria association, which considers not only synergy but also the quality indexes assigned to data and inferred information. This activity associates newly collected information with the current situation, so that the final product is a single situation, as complete and detailed as possible.

The result is sent again to the quality assessment layer to be updated. The inferred situations that did not meet a predefined quality level are temporarily stored to be compared with the results of the semantic fusion, which occurs in parallel. Higher-quality information is permanently aggregated to the current situation.

The result of the syntactic fusion process for the three reports is shown in Fig. 23.3. The main shortcomings of the syntactic fusion are the lack of properties among the identified terms (activities that imply a relation between objects, e.g., "wallet belongs to victim" or "criminal ran to subway") and the inability to recognize multiple similar objects of the same class with the same meaning (e.g., the victim's car and the criminal's car).

Fig. 23.3

Situation ontology instantiated with the result of the syntactic analysis of the reports

Using only this process could lead to major failures in the final result, directly affecting quality indexes such as completeness and consistency. These failures in turn may affect the process of building SAW.

4.4 Semantic Analysis

At the beginning of the semantic process, information is not analyzed directly, i.e., by the way the terms are written, as in the syntactic analysis. At first, each report is structured into ontological instances, based on an ontology developed for this domain. Each of the reports described above becomes a different instance, depending on the situation it represents.

This process of semantic instantiation starts from the identification of elements that are stored in the information representation layer. Then, for each report, SPARQL queries are made in the domain ontology for each previously identified element. These queries seek to identify which class or set of classes of the ontology best represents the elements and the probable properties that characterize them, even if they are not explicit in the report. The result of this process is an instance of a situation, which does not necessarily represent a crime.

At this point, we have several instances belonging to the same events, without relationships between them. The next step is to perform SPARQL queries, now considering a local context, i.e., inside each class, to identify common properties of element instances, making it possible to infer new objects, properties, and attributes and to define a situation for each report. These instances are shown in Figs. 23.4, 23.5, and 23.6.

Fig. 23.4

Situation ontology instantiated with the information from crime report 1

Fig. 23.5

Situation ontology instantiated with the information from crime report 2

Fig. 23.6

Situation ontology instantiated with the information from crime report 3

In these figures, the rectangles represent the instances, and the circles represent the attributes. The arrows represent the connections between classes and attributes and their properties, defined according to a vocabulary that was also developed for the domain and is shown in the following figures.

The instances are colored according to the classes they represent to facilitate the understanding of the situation. Pink represents objects related to the victim; red represents objects related to the criminals; green represents site-related objects; purple represents the victim; blue represents the criminals; and white represents the instance of the situation.

The semantic processing starts with the analysis of the elements found in Crime Report 1, namely "victim," "vehicle," "two individuals," "black motorcycle," "motorcycle passenger," "with a," "gun," "threatened," "subtracted," and "his vehicle."

Based on these terms, queries are made to the situational ontology, which returns which class they fit into, or whether they are just properties. In this case, a set of classes is returned with victim, criminal, and object, along with some properties, such as threatened and subtracted, which may represent a theft.

With SPARQL queries to a rich ontology, it is also possible to make associations between distinct terms. For instance, Report 1 does not have the term “criminal,” and yet a criminal class was identified, because in the ontology, one of the instantiated terms that characterizes a criminal and is present in Report 1 is “individuals.”
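
This class lookup can be pictured with a toy stand-in for the SPARQL queries, in which the ontology's characterizing terms are reduced to a plain mapping. The terms and class labels below are illustrative assumptions, not the chapter's actual ontology.

```python
# Toy stand-in for the domain ontology: characterizing terms mapped to
# the classes they indicate. In Quantify this knowledge lives in the
# ontology and is retrieved via SPARQL, not hard-coded.
TERM_TO_CLASS = {
    "victim": "Victim",
    "individuals": "Criminal",   # "individuals" characterizes a criminal
    "guys": "Criminal",
    "men": "Criminal",
    "gun": "Weapon",
    "revolver": "Weapon",
}

def classify(term: str):
    """Return the ontology class that best represents a term, or None
    when the term is only a property (e.g., 'threatened')."""
    return TERM_TO_CLASS.get(term.lower())
```

This reproduces, in miniature, the association described above: Report 1 never contains the word "criminal," yet "individuals" resolves to the Criminal class, while pure property terms resolve to no class at all.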

In another example, we perceived a "hold" property relating a criminal to a firearm, indicating that the criminals have a gun, even though the term "held" is not in the report. This is because the term "with" is in the report, and through the ontology it is possible to establish a correspondence between them. At the end of the semantic analysis, the entities are correlated according to properties, dependencies, or relationships found in the reports and predicted in the vocabulary. The result for Crime Report 1 (Fig. 23.4) is also persisted in the information representation layer.

This routine of instantiating the ontology with inferred objects, attributes, and properties regarding a single situation occurs for all reports submitted to the semantic process. Figures 23.5 and 23.6 show the results of this process for Reports 2 and 3. As can be observed in Figs. 23.4, 23.5, and 23.6, the three reports seem to refer to the same situation, fragments of which were described by different people present in the situation.

This relationship between the situations found in the reports is easily understood by a human, who deduces the situations and interprets them into situational knowledge (what is known about a situation). However, in more complex scenarios, humans become error-prone and cannot absorb all the characteristics of a situation.

At this point, semantic fusion combines new reports with RDF instances (previously processed information stored in the information representation layer). The goal is to build a computational model very close to a human mental model, through ontologies and vocabularies. This process uses the information in the information representation layer after the process of semantic identification.

Semantic fusion is very similar to semantic identification, but at a higher level of significance, since its inferences are made using instances from all reports. Also, semantic fusion considers all the properties present in each internal instance of the elements.

At the end of the semantic fusion, we have a new set of possible situations, varying in the organization and presence of elements and properties. Each of these possible situations is saved in the representation layer and has its quality assessed.
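
A much-simplified sketch of this fusion step, assuming each candidate instance is a flat dictionary of attributes; resolving conflicting values by quality index is left to the subsequent multicriteria assessment:

```python
def fuse_instances(instances: list) -> dict:
    """Union-merge the attributes of instances assumed to describe the
    same situation, keeping every distinct value. A sketch only: the
    chapter's semantic fusion operates on ontology instances and their
    properties, not on flat dictionaries."""
    fused: dict = {}
    for instance in instances:
        for attribute, value in instance.items():
            fused.setdefault(attribute, set()).add(value)
    # collapse single-valued attributes back to plain values; keep
    # multi-valued ones as sorted lists (candidate conflicts)
    return {a: (vs.pop() if len(vs) == 1 else sorted(vs))
            for a, vs in fused.items()}
```

Fusing, say, a report describing a "black" motorcycle with one describing a "dark" motorcycle yields a single attribute carrying both candidate values, which the quality assessment can then arbitrate.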

Then the multicriteria assessment is performed, mainly evaluating the improvements in the quality indexes. The situation that presents the best quality indexes and satisfies the criteria elicited in the requirements, such as the presence of some specific element, is elected as the final situation resulting from the fusion. Again, this situation is persisted by the ontology in the representation layer and later presented in the interface.
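
The selection among candidate situations could be sketched as a weighted score over quality dimensions combined with a required-element filter. The weights, dimension names, and required elements below are illustrative assumptions; in Quantify they are elicited from domain experts.

```python
def select_situation(candidates: list, weights: dict, required=()):
    """Pick the candidate situation with the best weighted quality
    score among those containing all required elements; return None
    when no candidate qualifies."""
    def score(candidate):
        return sum(weights[dim] * candidate["quality"].get(dim, 0.0)
                   for dim in weights)
    eligible = [c for c in candidates
                if all(e in c["elements"] for e in required)]
    return max(eligible, key=score) if eligible else None
```

Note how the two criteria interact: a candidate with slightly lower completeness may still win on weighted score, unless a required element (e.g., a weapon) forces the choice toward a more complete candidate.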

The semantic fusion of the analyzed reports makes it possible to obtain information with greater added value through the junction of the terms found. In this case study with the three reports, the fused information allows the identification of the clothes of both criminals, the characteristics of the stolen object, the weapon used by the criminals, the escape vehicle, and the place where the crime occurred. The examples show that the crime reports alone do not present these situations explicitly.

The semantic fusion result is shown in Fig. 23.7, which uses the same colors as before, with the difference that the yellow color represents the information that was fused from the three reports. In addition to the presence of more instances and attributes in the fused situation, it is also possible to note new properties, which were not explicit when considering each report separately.

Fig. 23.7

Situation ontology instantiated after the semantic fusion, with the information from all reports associated with a quality index

Another point to be noted is that the quality of the information can be assessed in the ontology, allowing verification of which information has better quality and thus a decision on whether it should be used in the fusion. For this assessment, we used the Data Quality Vocabulary (DQV) ontology, which is built on top of another quality ontology, the Dataset Quality Vocabulary (daQ). DQV allows the creation of instances of categories, dimensions, and metrics to map quality measurements.

For each instance of the ontology, an instance of quality graph is created, making the connection between all quality measures applied to that instantiated information. Figure 23.7 shows the result of the semantic fusion with quality assessment, showing the dimensions of consistency and currentness for the theft situation assessed.

To graphically represent the results of this study, a system called Emergency Situation Assessment System (ESAS) (Fig. 23.8) was developed, guided by the Quantify model. This system is capable of dealing with human-generated input and inferring what is or was going on by processing natural language, which is useful for real-time (emergency) analysis or for risk analysis over historical data.

Fig. 23.8

Emergency Situation Assessment System (ESAS)

Figure 23.8 also shows the UI of ESAS, which has in the top right corner the "Event Table," where human-generated input is reproduced, highlighting the transformations over the raw data, e.g., the discovery of a new relevant entity such as a criminal and its characteristics. In the bottom right corner of ESAS, there is the "Map of Reports." This display shows the data sources placed at their origin locations (where the reports came from). The raw data can be extracted from the sources by user interaction with the placed pins. This map can also be populated with data from social networks (e.g., Twitter posts).

On the left side, there is the "Situation Graph." This display contains a hierarchical structure that represents the current situational picture, i.e., what is going on, with the central node being the situation itself, the next level the classes that compose the situation, and the leaves the instances of each class that specify the event. The color of a node represents the quality of its information, ranging from solid red when the quality is low to solid green when the quality is high.
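
The red-to-green node coloring can be reproduced with a simple linear interpolation over the quality index. This is a sketch of one plausible mapping; the chapter specifies only the endpoints of the ESAS color scale, so the interpolation itself is an assumption.

```python
def quality_color(quality: float) -> tuple:
    """Map a quality index in [0, 1] to an RGB triple, interpolating
    from solid red (low quality) to solid green (high quality), as in
    the Situation Graph node coloring."""
    quality = min(max(quality, 0.0), 1.0)   # clamp out-of-range indexes
    return (round(255 * (1 - quality)), round(255 * quality), 0)
```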

5 Conclusions

This chapter presented a new IF model named Quantify and highlighted how it deals with situational information through syntactic and semantic processes. In general, Quantify aims to improve the SAW of human operators who assess situations to make decisions in complex scenarios, such as risk and emergency management.

This work also demonstrated how to incorporate new objects and a situation assessment cycle by applying the semantic fusion routines in the IF process. This approach can be inserted into situation evaluation routines, so that decision-makers may reason about information quality dimensions and seek better quality. Moreover, this work also presented methods for fusing information using multiple hierarchical and knowledge representation criteria.

The use of quality indexes may contribute in the future to the fusion of data from physical sensors with human-generated data. Quantify and its IF process, with the associated methods, were validated by the results of the acquisition of useful information for supporting SAW, according to the requirements set by domain experts. This work also showed that data and information quality can act as a method for integrating heterogeneous data.

The information required to develop SAW was successfully built incrementally using syntactic and semantic inputs. The use of multicriteria information fusion enabled the assessment of situations by generating various possibilities for integration of synergistic information for the analysis of a specialist.

The continuous assessment of data and its quality showed that, at each evolution of the situation, improved and updated information was available for fusion and graphic representation, even if recently acquired and inferred. These assessment routines also demonstrated the capabilities of Quantify in processing human feedback and supporting operators' interactions with the automation. Hence, with the establishment of new connections on situational information, through semantics and quality assessments, the authors state that the awareness of decision-makers in critical situations can be improved and their uncertainty mitigated.