Soft Computing, Volume 22, Issue 9, pp 3061–3075

Implementation of scalable fuzzy relational operations in MapReduce

  • Elham S. Khorasani
  • Matthew Cremeens
  • Zhenge Zhao
Methodologies and Application

Abstract

One of the main restrictions of relational database models is their lack of support for flexible, imprecise and vague information in data representation and querying. Imprecision is pervasive in human language; hence, modeling it is crucial for any system that stores and processes linguistic data. Fuzzy set theory provides an effective way to model the imprecision inherent in the meaning of words and propositions drawn from natural language (Zadeh, Inf Control 8(3):338–353, doi: 10.1016/S0019-9958(65)90241-X, 1965; IGI Global, https://books.google.com/books?id=nt-WBQAAQBAJ, 2013). Several works in the last 20 years have used fuzzy set theory to extend relational database models to permit representation and retrieval of imprecise data. However, to our knowledge, such approaches have not been designed to scale up to very large datasets. In this paper, the MapReduce framework is used to implement flexible fuzzy queries on a large-scale dataset. We develop MapReduce algorithms to enhance the standard relational operations with fuzzy conditional predicates expressed in natural language.
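To make the idea of a fuzzy conditional predicate inside a map/reduce pipeline concrete, the following is a minimal in-memory sketch. It is not the algorithm developed in the paper. The predicate "long delay", its breakpoints (30 and 60 minutes), the record layout, and all function names are assumptions chosen for illustration. The map step performs a fuzzy selection with a threshold, and the reduce step performs a fuzzy projection, merging duplicates with the maximum membership degree as is standard in fuzzy relational algebra.

```python
from collections import defaultdict


def long_delay(minutes):
    """Membership degree in a hypothetical fuzzy set "long delay".

    Assumed shape: 0 up to 30 minutes, rising linearly to 1 at 60 minutes.
    """
    if minutes <= 30:
        return 0.0
    if minutes >= 60:
        return 1.0
    return (minutes - 30) / 30


def map_phase(records, threshold):
    """Fuzzy selection: emit (key, degree) for every record that satisfies
    the predicate with a nonzero degree of at least `threshold`."""
    for carrier, delay in records:
        degree = long_delay(delay)
        if degree > 0 and degree >= threshold:
            yield carrier, degree


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fuzzy projection: duplicate keys merge by taking the maximum degree."""
    return {key: max(degrees) for key, degrees in groups.items()}


def run_fuzzy_query(records, threshold):
    return reduce_phase(shuffle(map_phase(records, threshold)))


# Hypothetical (carrier, delay in minutes) records.
records = [("AA", 45), ("AA", 90), ("UA", 20), ("UA", 40)]
result = run_fuzzy_query(records, threshold=0.3)
```

In a real MapReduce deployment the shuffle step is done by the framework, and the map and reduce functions run in parallel over partitions of the data; the sketch only mirrors that data flow on a single machine.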

Keywords

Relational operations, Fuzzy set theory, MapReduce, Fuzzy queries

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This work does not contain any studies with human participants or animals performed by any of the authors.

References

  1. Afrati FN, Sarma AD, Menestrina D, Parameswaran A, Ullman JD (2012) Fuzzy joins using mapreduce. In: 2012 IEEE 28th international conference on data engineering (ICDE). IEEE, pp 498–509
  2. Atta F, Viglas SD, Niazi S (2011) Sand join: a skew handling join algorithm for google’s mapreduce framework. In: 2011 IEEE 14th international multitopic conference (INMIC), pp 170–175. doi: 10.1109/INMIC.2011.6151466
  3. Bosc P, Prade H (1997) An introduction to the fuzzy set and possibility theory-based treatment of flexible queries and uncertain or imprecise databases. In: Motro A, Smets P (eds) Uncertainty management in information systems. Springer, New York, pp 285–324
  4. Buckles BP, Petry FE (1982) A fuzzy representation of data for relational databases. Fuzzy Sets Syst 7(3):213–226. doi: 10.1016/0165-0114(82)90052-5
  5. Buckley JJ, Eslami E (2002) An introduction to fuzzy logic and fuzzy sets, vol 13. Springer, New York
  6. Chen G (1998) Fuzzy logic in data modeling: semantics, constraints, and database design. Kluwer Academic Publishers, Norwell
  7. Das Sarma A, He Y, Chaudhuri S (2014) Clusterjoin: a similarity joins framework using map-reduce. Proc VLDB Endow 7(12):1059–1070. doi: 10.14778/2732977.2732981
  8. Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
  9. Dubois D, Prade H (1986) Weighted minimum and maximum operations in fuzzy set theory. Inf Sci 39(2):205–210. doi: 10.1016/0020-0255(86)90035-6
  10. Elmeleegy K, Olston C, Reed B (2014) Spongefiles: mitigating data skew in mapreduce using distributed memory. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. SIGMOD ’14, pp 551–562. ACM, New York. doi: 10.1145/2588555.2595634
  11. Galindo J (2005) Fuzzy databases: modeling, design and implementation. IGI Global
  12. Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. In: 2012 IEEE 28th international conference on data engineering (ICDE), pp 522–533. doi: 10.1109/ICDE.2012.58
  13. Hassan MAH, Bamha M, Loulergue F (2014) Handling data-skew effects in join operations using mapreduce. Proc Comput Sci 29:145–158. doi: 10.1016/j.procs.2014.05.014. 2014 International conference on computational science
  14. Klir GJ, Clair UHS, Yuan B (1997) Fuzzy set theory: foundations and applications. Prentice Hall. https://books.google.com/books?id=DNxQAAAAMAAJ
  15. Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. SIGMOD ’12, pp 25–36. ACM, New York. doi: 10.1145/2213836.2213840
  16. Kyritsis V, Lekeas P, Souliou D, Afrati F (2012) A new framework for join product skew. In: Lacroix Z, Vidal M (eds) Resource discovery, vol 6799. Lecture notes in computer science. Springer, New York, pp 1–10
  17. Ma ZM, Yan L (2010) A literature overview of fuzzy conceptual data modeling. J Inf Sci Eng 26(2):427–441
  18. Ma ZM, Zhang WJ, Ma WY (2000) Semantic measure of fuzzy data in extended possibility-based fuzzy relational databases. Int J Intell Syst 15(8):705–716. doi: 10.1002/1098-111X(200008)15:8<705::AID-INT2>3.0.CO;2-4
  19. Ma ZM, Mili F (2002) Handling fuzzy information in extended possibility-based fuzzy relational databases. Int J Intell Syst 17(10):925–942. doi: 10.1002/int.10057
  20. Medina JM, Vila MA, Cubero JC, Pons O (1995) Towards the implementation of a generalized fuzzy relational database model. Fuzzy Sets Syst 75(3):273–289. doi: 10.1016/0165-0114(94)00380-P
  21. Metwally A, Faloutsos C (2012) V-smart-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors. Proc VLDB Endow 5(8):704–715
  22. Petry FE (ed) (1997) Fuzzy databases: principles and applications. Kluwer Academic Publishers, Norwell
  23. Prade H, Testemale C (1984) Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Inf Sci 34(2):115–143. doi: 10.1016/0020-0255(84)90020-3
  24. Ramakrishnan SR, Swart G, Urmanov A (2012) Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the 3rd ACM symposium on cloud computing. SoCC ’12, pp 16–11614. ACM, New York. doi: 10.1145/2391229.2391245
  25. Shenoi S, Melton A (1989) Proximity relations in the fuzzy relational database model. Fuzzy Sets Syst 31(3):285–296. doi: 10.1016/0165-0114(89)90201-7
  26. Shenoi S, Melton A (1990) An extended version of the fuzzy relational database model. Inf Sci 52(1):35–52. doi: 10.1016/0020-0255(90)90034-8
  27. US (2016) Department of transportation. Online; accessed 23 Feb 2016. https://www.transportation.gov/
  28. Vasant P (2013) Handbook of research on novel soft computing intelligent algorithms: theory and practical applications. Advances in computational intelligence and robotics (ACIR) book series. IGI Global. https://books.google.com/books?id=nt-WBQAAQBAJ
  29. Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 495–506. ACM
  30. Wang Y, Metwally A, Parthasarathy S (2013) Scalable all-pairs similarity search in metric spaces. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’13, pp 829–837. ACM, New York. doi: 10.1145/2487575.2487625
  31. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353. doi: 10.1016/S0019-9958(65)90241-X
  32. Zadeh LA (1999) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 100 Suppl 1(0):9–34. doi: 10.1016/S0165-0114(99)80004-9
  33. Zhang C, Li J, Wu L, Lin M, Liu W (2012) Sej: an even approach to multiway theta-joins using mapreduce. In: 2012 Second international conference on cloud and green computing (CGC), pp 73–80. doi: 10.1109/CGC.2012.9

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Computer Science, University of Illinois at Springfield, Springfield, USA