Reliability Models for Data Integration Systems

  • Adriana Marotta
  • Héctor Cancela
  • Verónika Peralta
  • Raul Ruggia
Chapter
Part of the Springer Series in Reliability Engineering book series (RELIABILITY)

Abstract

Data integration systems (DIS) provide information by integrating and transforming data extracted from external sources. Examples of DIS include mediators, data warehouses, database federations, and web portals. Data quality is an essential issue in DIS because it affects the confidence users can place in the supplied information. One of the main challenges in this field is to offer rigorous and practical means of evaluating the quality of a DIS. In this sense, DIS reliability represents the capability of a DIS to provide data with a certain level of quality, taking into account not only current quality values but also the changes that may occur in data quality at the external sources. Simulation is a non-traditional approach to data quality evaluation, and to DIS reliability evaluation in particular. This chapter presents techniques for evaluating DIS reliability that apply simulation in addition to exact computation models. Simulation addresses two important drawbacks of exact techniques: the poor scalability of the reliability computation as the set of data sources grows, and the difficulty of modeling data sources with inter-related (non-independent) quality properties.

Keywords

Data Integration Systems (DIS); Quality Values; Quality-Oriented Design; Quality Evaluation Algorithm; Restriction Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • Adriana Marotta (1)
  • Héctor Cancela (1)
  • Verónika Peralta (1, 2)
  • Raul Ruggia (1)
  1. Computer Science Institute at the Engineering School, Universidad de la República, Montevideo, Uruguay
  2. Laboratoire d’Informatique, Université François Rabelais Tours, Tours, France
