ProLoD: An Efficient Framework for Processing Logistics Data

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2017 Conferences (OTM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10573)


Abstract

Logistics is a data-intensive industry. The information systems used by logistics companies generate massive volumes of data, which the companies store to perform different types of analysis. In addition, the advent of Big Data technologies and the Internet of Things paradigm has given logistics companies an opportunity to use external data stemming from a wide variety of sources, including sensors (e.g., GPS), social media, and traffic control systems. Logistics companies aim to leverage these external data and perform rigorous analysis in real time to discover intelligence such as unpredictable delays. However, several challenges are involved. One of the core challenges is integrating and processing a wide variety of data coming from heterogeneous sources. To the best of our knowledge, there is no off-the-shelf solution that addresses this challenge. In this paper, we present a framework called ProLoD, which performs pre-processing and processing tasks on different types of data. Our framework relies on machine learning algorithms for processing data; however, we found that the ready-to-use algorithms are not adequate to guarantee processing efficiency. Therefore, we extended the Hierarchical Clustering (HCL) algorithm. We evaluated ProLoD by comparing its performance with the HCL algorithm found in the widely adopted machine learning tool WEKA. We found that ProLoD performs reasonably better than WEKA in terms of producing an optimal number of clusters.
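
For readers unfamiliar with hierarchical clustering, the following is a minimal sketch of the generic textbook agglomerative procedure with single linkage. It is not the extended algorithm that ProLoD uses, nor WEKA's implementation; the function name `agglomerative`, the `threshold` parameter, and the sample readings are all assumptions made for illustration. It stops merging at a distance threshold, so the number of clusters is a result of the run rather than an input.

```python
from itertools import combinations
from math import dist


def agglomerative(points, threshold):
    """Generic single-linkage agglomerative clustering (illustrative only).

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        (i, j), gap = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if gap > threshold:
            break  # remaining clusters are all farther apart than the cut-off
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i stays valid
    return clusters


# Hypothetical two-dimensional readings, e.g. projected sensor positions.
readings = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.5)]
groups = agglomerative(readings, threshold=1.0)
print(len(groups), "clusters:", groups)
```

This naive version recomputes every pairwise linkage on each pass, which is acceptable for a handful of points but scales poorly; that cost is one reason practical tools use more elaborate variants.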


Notes

  1. DHL: http://www.dhl.com/en.html.

  2. UPS: http://www.uplonline.com.

  3. FedEx: http://www.fedex.com/us/.

  4. Facebook: https://www.facebook.com.

  5. Twitter: https://twitter.com/?lang=en.

  6. http://www.datumbox.com.

  7. http://www.philippe-fournier-viger.com/spmf/.

  8. http://spark.apache.org/mllib/.

  9. We contacted Dr. Albert Bifet, the author of the library, for assistance because it ran only for a specific number of data points and then started throwing errors, but the problem was not solved.

  10. http://spark.apache.org.

  11. http://storm.apache.org.

Author information

Corresponding author

Correspondence to Rafiqul Haque.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

AlShaer, M., Taher, Y., Haque, R., Hacid, MS., Dbouk, M. (2017). ProLoD: An Efficient Framework for Processing Logistics Data. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10573. Springer, Cham. https://doi.org/10.1007/978-3-319-69462-7_44

  • DOI: https://doi.org/10.1007/978-3-319-69462-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69461-0

  • Online ISBN: 978-3-319-69462-7

  • eBook Packages: Computer Science (R0)
