A Novel Integrated Approach for Companion Vehicle Discovery Based on Frequent Itemset Mining on Spark

  • Abdulrahman Al-badwi
  • Zhe Long
  • Zuping Zhang
  • Mohammed Al-habib
  • Kamal Al-Sabahi
Research Article - Computer Engineering and Computer Science


Companion vehicle discovery has received much attention from the research community and has been widely adopted by traffic management departments for tasks such as tracking involved vehicles. Because traffic data are massive and contain complex, inaccurate accompanying-vehicle relationships, companion vehicle discovery remains a challenging and active research topic. Several algorithms have been proposed to solve this problem on transactional datasets. Some of them are based on frequent itemset mining algorithms, which are used to extract knowledge from data in real-world applications such as market basket analysis, crime detection and prevention, and crowd mining. However, most of these algorithms fail on large-scale datasets because they need to scan the dataset iteratively several times, which makes them time-consuming and infeasible for big data. To this end, we propose a novel algorithm, HD-FIM, to extract companion vehicles from massive traffic data efficiently on the Spark platform. It uses a hybrid of depth-first and breadth-first search to handle big data on distributed clusters. Experimental results on practical vehicle set extraction show that HD-FIM outperforms typical existing frequent itemset mining algorithms and that it can be applied to traffic big data in general.
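
To make the framing concrete, the following is a minimal sketch of how companion vehicle discovery can be cast as frequent itemset mining. It is not HD-FIM: it is a plain, single-machine, level-wise (breadth-first, Apriori-style) miner in Python rather than a hybrid depth-first/breadth-first algorithm on Spark. The record layout `(plate, checkpoint, timestamp)`, the time-window grouping, and all function and parameter names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_transactions(records, window_seconds):
    """Group plate sightings into transactions keyed by (checkpoint, time window).

    Assumed record layout: (plate, checkpoint, timestamp_in_seconds).
    """
    buckets = defaultdict(set)
    for plate, checkpoint, timestamp in records:
        buckets[(checkpoint, timestamp // window_seconds)].add(plate)
    return list(buckets.values())


def frequent_itemsets(transactions, min_support):
    """Level-wise mining: frequent k-itemsets seed the (k+1)-item candidates."""
    counts = defaultdict(int)
    for transaction in transactions:
        for plate in transaction:
            counts[frozenset([plate])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        items = sorted(set().union(*current))
        # Keep a candidate only if every (k-1)-subset is itself frequent.
        candidates = [
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(combo, k - 1))
        ]
        counts = defaultdict(int)
        for transaction in transactions:  # one full scan per level
            for candidate in candidates:
                if candidate <= transaction:
                    counts[candidate] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def companion_groups(records, window_seconds=60, min_support=2):
    """Return plate groups of size >= 2 seen together at least min_support times."""
    transactions = build_transactions(records, window_seconds)
    found = frequent_itemsets(transactions, min_support)
    return {tuple(sorted(s)): c for s, c in found.items() if len(s) >= 2}
```

Calling `companion_groups(records)` on a list of sightings returns a dictionary that maps each group of plates to the number of checkpoint windows in which they appeared together. The loop performs one full pass over the transactions per itemset size, which is the repeated-scan cost that the abstract identifies as the bottleneck on large datasets.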


Companion vehicle discovery; Frequent itemset mining; Data mining; Big data analysis; Traffic big data





This work is supported by the National Natural Science Foundation of China (Grant Nos. 61379109, M1321007) and Science and Technology Plan of Hunan Province (Grant Nos. 2014GK2018, 2016JC2011).



Copyright information

© King Fahd University of Petroleum & Minerals 2019

Authors and Affiliations

  1. School of Information Science and Engineering, Central South University, Changsha, People’s Republic of China
  2. Faculty of Computer and Information Technology, Sana’a University, Sana’a, Yemen
