Incomplete Information in the Relational Data Model
Correct treatment of incomplete information in databases is of fundamental importance, since in practice it is very rare for all the information stored in a database to be complete. There are several different types of incompleteness that need to be taken into account. In the first case some information in the database may be missing. Missing information generally falls into two categories: applicable information, for example when the name of the course that Iris is taking is applicable but unknown, and inapplicable information, for example when Iris does not have a spouse. In both cases the missing information can be modelled by special values, called null values, which act as placeholders for the information that is missing. Varied interpretations of null values within these two categories were listed in [ANS75]. In the second case information in the database may be inconsistent, for example, if two different ages were recorded for Iris when Iris is only allowed to have one age. Inconsistency can normally be detected during updates to the database and in such cases it can be avoided. In the third case incompleteness involves the modelling of disjunctive information, which is a special case of applicable but unknown information. For example, we may know that Iris belongs either to the Computer Science department or to the Maths department, but we do not know for certain to which department she belongs. Disjunctive information can be modelled by a finite set of values, called an or-set, one of these values being the true value. That is, Iris's department is a member of the or-set {Computer_Science, Maths}. In the fourth case incompleteness relates to fuzzy information. In this case the membership of an attribute value may be fuzzy; namely, it may be a number in the interval [0, 1] or a linguistic value such as short, medium or tall. For example, Iris's age may be recorded as young and her performance in last year's exam may be recorded as 0.7.
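The two kinds of null value and the or-set described above can be illustrated with a minimal Python sketch. The attribute names, the `Null` type, and the helper functions `possibly_equals` and `certainly_equals` are hypothetical and introduced only for illustration; the distinction between "possible" and "certain" matches is an assumption of this sketch, not a definition taken from the chapter.

```python
from enum import Enum


class Null(Enum):
    """Two interpretations of a missing value (names are illustrative)."""
    UNKNOWN = "applicable but unknown"
    INAPPLICABLE = "inapplicable"


# One tuple about Iris; an or-set is modelled here as a frozenset of candidates.
iris = {
    "name": "Iris",
    "course": Null.UNKNOWN,        # she takes a course, but we do not know which
    "spouse": Null.INAPPLICABLE,   # the attribute does not apply to her
    "department": frozenset({"Computer_Science", "Maths"}),
}


def possibly_equals(value, target):
    """True if the stored value could turn out to be `target`."""
    if value is Null.UNKNOWN:
        return True                # any domain value is still possible
    if value is Null.INAPPLICABLE:
        return False               # there is no value at all
    if isinstance(value, frozenset):
        return target in value     # target must be one of the alternatives
    return value == target


def certainly_equals(value, target):
    """True only if the stored value is known to be `target`."""
    if isinstance(value, Null):
        return False
    if isinstance(value, frozenset):
        return value == frozenset({target})  # a singleton or-set is definite
    return value == target
```

Under these assumptions, `possibly_equals(iris["department"], "Maths")` holds while `certainly_equals(iris["department"], "Maths")` does not, which mirrors the statement that exactly one member of the or-set is the true value but we do not know which.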
Fuzzy sets are also able to model the situation where there is uncertainty about the membership of a tuple in a relation. For example, we may only know with a degree of certainty of 0.8 that the tuple recording information about Iris is actually true, i.e. the membership degree of that tuple is 0.8. Finally, we could use a probabilistic approach to incomplete information by attaching to each attribute value a probability between 0 and 1 according to a known distribution for that attribute domain. This approach allows the use of statistical inference during query processing in order to obtain approximate answers. We will further discuss the use of probability theory in modelling incomplete information at the end of the chapter.
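As a rough illustration of tuple-level membership and of a probabilistic attribute value, the following Python sketch pairs each tuple with a degree in [0, 1] and represents an attribute by a distribution over its domain. The second student, the 0.6/0.4 distribution, and the functions `select` and `probability` are hypothetical; the sketch assumes a simple threshold-based selection rather than any particular fuzzy or probabilistic query semantics.

```python
# A fuzzy relation: each tuple is paired with its membership degree in [0, 1].
students = [
    ({"name": "Iris", "age": "young", "exam": 0.7}, 0.8),
    ({"name": "Hanna", "age": "young", "exam": 0.9}, 1.0),  # hypothetical tuple
]


def select(relation, predicate, threshold=0.0):
    """Keep tuples that satisfy `predicate` and whose membership
    degree is at least `threshold`."""
    return [(t, mu) for t, mu in relation if mu >= threshold and predicate(t)]


# A probabilistic attribute value: a distribution over the attribute domain.
# The probabilities are made up for illustration and sum to 1.
department_dist = {"Computer_Science": 0.6, "Maths": 0.4}


def probability(dist, event):
    """Probability that the attribute takes a value satisfying `event`."""
    return sum(p for value, p in dist.items() if event(value))
```

For instance, `select(students, lambda t: t["age"] == "young", threshold=0.9)` drops the tuple about Iris, whose membership degree is only 0.8, and `probability(department_dist, lambda v: v == "Maths")` gives the chance that the department is Maths under the assumed distribution.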
Keywords: Functional Dependency, Axiom System, Integrity Constraint, Relation Schema, Relational Algebra