ProLoD: An Efficient Framework for Processing Logistics Data

AlShaer, Mohammad; Taher, Yehia; Haque, Rafiqul; Hacid, Mohand-Saïd; Dbouk, Mohamed

doi:10.1007/978-3-319-69462-7_44


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10573)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"


Abstract

Logistics is a data-intensive industry. The information systems used by logistics companies generate massive volumes of data, which the companies store to perform different types of analysis. In addition, the advent of Big Data technologies and the Internet of Things paradigm has given logistics companies an opportunity to use external data stemming from a wide variety of sources, including sensors (e.g., GPS), social media, and traffic control systems. Logistics companies aim to leverage these external data and perform rigorous analysis in real time to discover intelligence such as unpredictable delays. However, this involves several challenges. One of the core challenges is integrating and processing a wide variety of data coming from heterogeneous sources. To the best of our knowledge, there is no off-the-shelf solution that addresses this challenge. In this paper, we present a framework called ProLoD, which performs pre-processing and processing tasks on different types of data. Our framework relies on machine learning algorithms for processing data; however, we found that ready-to-use algorithms are not adequate to guarantee processing efficiency. Therefore, we extended the Hierarchical Clustering (HCL) algorithm. We evaluated ProLoD by comparing its performance with the HCL algorithm found in WEKA, a widely adopted machine learning tool. We found that ProLoD performs reasonably better than WEKA in terms of producing the optimal number of clusters.
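The abstract names hierarchical clustering and the problem of finding the optimal number of clusters, but this page does not describe how ProLoD extends HCL. For orientation only, the following is a minimal sketch of standard agglomerative hierarchical clustering with single linkage, combined with a common "largest gap" heuristic for cutting the dendrogram. The linkage choice, the heuristic, the toy data, and the function names `agglomerate` and `choose_k` are all assumptions for illustration. This is neither ProLoD's extended algorithm nor WEKA's implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage(c1: List[int], c2: List[int], pts: List[Point]) -> float:
    """Single linkage: distance between the closest pair of members."""
    return min(math.dist(pts[i], pts[j]) for i in c1 for j in c2)


def agglomerate(pts: List[Point]) -> Tuple[List[float], List[List[List[int]]]]:
    """Naive agglomerative clustering over point indices (small inputs only).

    Returns the merge heights and the partition recorded after each merge;
    partitions[m] holds len(pts) - m clusters."""
    clusters = [[i] for i in range(len(pts))]
    heights: List[float] = []
    partitions = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        d, a, b = min(
            ((single_linkage(clusters[a], clusters[b], pts), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        heights.append(d)
        partitions.append([list(c) for c in clusters])
    return heights, partitions


def choose_k(heights: List[float]) -> int:
    """Largest-gap heuristic: cut where the jump between consecutive
    merge heights is largest. n points produce n - 1 heights."""
    if len(heights) < 2:
        raise ValueError("need at least 3 points")
    n = len(heights) + 1
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Stopping after merges 0..i (i + 1 merges) leaves n - (i + 1) clusters.
    return n - (i + 1)


if __name__ == "__main__":
    # Two well-separated groups of toy 2-D points (illustrative data only).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    heights, partitions = agglomerate(pts)
    k = choose_k(heights)
    print(k, partitions[len(pts) - k])
```

Single linkage produces non-decreasing merge heights, which the largest-gap rule relies on. With other linkages, or with noisy real-world logistics data, a cluster validity index such as the silhouette score may be a more robust criterion for choosing the number of clusters.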


Notes

1. DHL: http://www.dhl.com/en.html.
2. UPS: http://www.uplonline.com.
3. FedEx: http://www.fedex.com/us/.
4. Facebook: https://www.facebook.com.
5. Twitter: https://twitter.com/?lang=en.
6. http://www.datumbox.com.
7. http://www.philippe-fournier-viger.com/spmf/.
8. http://spark.apache.org/mllib/.
9. We contacted Dr. Albert Bifet, the author of the library, for assistance because it ran only for a specific number of data points and then started throwing errors; however, the problem was not solved.
10. http://spark.apache.org.
11. http://storm.apache.org.


Author information

Authors and Affiliations

Université Claude Bernard Lyon 1, 43 Boulevard du 11 Novembre 1918, 69100, Villeurbanne, France
Mohammad AlShaer & Mohand-Saïd Hacid
Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), 55 Avenue de Paris, 78000, Versailles, France
Yehia Taher
Cognitus SAS, 5 Rue Lacharrière, 75011, Paris, France
Rafiqul Haque
Lebanese University, Beirut, Lebanon
Mohammad AlShaer & Mohamed Dbouk


Corresponding author

Correspondence to Rafiqul Haque.

Editor information

Editors and Affiliations

University of Lorraine, Nancy, France
Hervé Panetto
Odisee University College, Brussels, Belgium
Christophe Debruyne
Télécom SudParis, Évry, France
Walid Gaaloul
Tilburg University, Tilburg, The Netherlands
Mike Papazoglou
Freie Universität Berlin and Fraunhofer FOKUS, Berlin, Germany
Adrian Paschke
Università degli Studi di Milano, Crema, Italy
Claudio Agostino Ardagna
TU Graz, Graz, Austria
Robert Meersman


About this paper

Cite this paper

AlShaer, M., Taher, Y., Haque, R., Hacid, MS., Dbouk, M. (2017). ProLoD: An Efficient Framework for Processing Logistics Data. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10573. Springer, Cham. https://doi.org/10.1007/978-3-319-69462-7_44


DOI: https://doi.org/10.1007/978-3-319-69462-7_44
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69461-0
Online ISBN: 978-3-319-69462-7
eBook Packages: Computer Science, Computer Science (R0)
