Repair Diversification for Functional Dependency Violations

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8422)

Abstract

In practice, data often violate functional dependencies and are hence inconsistent. To resolve such violations, the data must be restored to a consistent state, known as a “repair”, but the number of possible repairs may be exponential. Previous work either considers optimal repair computation, which finds a single repair that is (nearly) optimal w.r.t. some cost model, or discusses repair sampling, which randomly generates a repair from the space of all possible repairs.
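To make the notion of a violation concrete, here is a minimal sketch of how a functional dependency such as zip → city can be checked. The function name `fd_violations`, the row layout, and the sample data are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return the LHS value groups that map to more than one RHS value.

    rows: list of dicts (hypothetical row layout, one dict per tuple)
    lhs, rhs: lists of attribute names forming the dependency lhs -> rhs
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].add(tuple(row[a] for a in rhs))
    # A dependency lhs -> rhs holds only if every group has exactly one RHS value.
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

# Hypothetical sample data: the first two rows disagree on the city for one zip.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
violations = fd_violations(rows, ["zip"], ["city"])
```

Each violating group can be resolved in several ways (change either city, or change one zip), which is why the number of repairs can grow exponentially with the number of violations.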

This paper makes a first effort to investigate the repair diversification problem, which aims at generating a set of repairs by minimizing their costs and maximizing their diversity. There are several motivating scenarios where diversifying repairs is desirable. For example, in the recently proposed interactive repairing approach, repair diversification techniques can be employed to generate representative repairs that are likely to occur (small cost) and, at the same time, are dissimilar to each other (high diversity).

Repair diversification differs significantly from optimal repair computation and repair sampling in its framework and techniques. (1) Based on two natural diversification objectives, we formulate two versions of the repair diversification problem, both modeled as bi-criteria optimization problems, and prove the complexity of their related decision problems. (2) We develop algorithms for the diversification problems. These algorithms embed repair computation into the framework of diversification, and hence find desirable repairs without searching the whole repair space. (3) We conduct extensive performance studies to verify the effectiveness and efficiency of our algorithms.
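The trade-off between small cost and high diversity can be illustrated with a generic greedy selection over a pool of candidate repairs. This is only a sketch under stated assumptions, not the paper's algorithms: the abstract does not give the objectives or the cost model, and the paper's methods avoid enumerating the repair space, whereas this sketch starts from an explicit candidate list. The cost (number of changed cells), the distance (size of the symmetric difference of changed cells), the weight `lam`, and all names below are hypothetical.

```python
from itertools import combinations

def cost(repair):
    # Assumed cost model: number of cells the repair modifies.
    return len(repair)

def distance(r1, r2):
    # Assumed dissimilarity: cell changes present in exactly one of the repairs.
    return len(set(r1.items()) ^ set(r2.items()))

def objective(selected, lam):
    # Bi-criteria score: reward pairwise diversity, penalize total cost.
    diversity = sum(distance(a, b) for a, b in combinations(selected, 2))
    return lam * diversity - (1 - lam) * sum(cost(r) for r in selected)

def greedy_diverse_repairs(candidates, k, lam=0.5):
    """Pick up to k repairs, each time adding the one that maximizes the score."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda r: objective(selected + [r], lam))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidates; each maps (row index, attribute) to a new value.
r_a = {(1, "city"): "New York"}
r_b = {(0, "city"): "NYC"}
r_c = {(1, "city"): "New York", (2, "city"): "Chicago, IL"}
r_d = {(1, "zip"): "10002"}

picked = greedy_diverse_repairs([r_a, r_b, r_c, r_d], k=2)
```

With these inputs the greedy pass keeps two single-cell repairs that touch different cells and skips `r_c`, which costs more and overlaps with `r_a`. Raising `lam` shifts the balance toward diversity, lowering it toward cost.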

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

He, C., Tan, Z., Chen, Q., Sha, C., Wang, Z., Wang, W. (2014). Repair Diversification for Functional Dependency Violations. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_31

  • DOI: https://doi.org/10.1007/978-3-319-05813-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05812-2

  • Online ISBN: 978-3-319-05813-9

  • eBook Packages: Computer Science, Computer Science (R0)
