Abstract
Logistics is a data-intensive industry. The information systems used by logistics companies generate massive volumes of data, which the companies store to perform different types of analysis. In addition, the advent of Big Data technologies and the Internet of Things paradigm has given logistics companies an opportunity to use external data stemming from a wide variety of sources, including sensors (e.g., GPS), social media and traffic control systems. Logistics companies aim to leverage these external data and perform rigorous analysis in real time to discover intelligence such as unpredictable delays. However, several challenges are involved. One of the core challenges is integrating and processing a wide variety of data coming from heterogeneous sources. To the best of our knowledge, there is no off-the-shelf solution that addresses this challenge. In this paper, we present a framework called ProLoD, which performs pre-processing and processing tasks on different types of data. Our framework relies on machine learning algorithms for processing data; however, we found that ready-to-use algorithms are not adequate to guarantee processing efficiency. Therefore, we extended the Hierarchical Clustering (HCL) algorithm. We evaluated ProLoD by comparing its performance with the HCL algorithm provided by the widely adopted machine learning tool WEKA. We found that ProLoD performs reasonably better than WEKA in producing the optimal number of clusters.
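The abstract does not describe how ProLoD extends hierarchical clustering, so the sketch below is only a generic point of reference, not the authors' method and not the clusterer shipped with WEKA. It shows plain agglomerative (bottom-up) hierarchical clustering with single linkage, plus one common heuristic for picking the number of clusters: stop merging just before the largest jump in merge distance. The linkage choice, the gap heuristic, and the helper names (`single_linkage`, `agglomerate`, `choose_k`) are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def single_linkage(a: Cluster, b: Cluster) -> float:
    """Single-linkage distance: smallest Euclidean distance between any two members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points: List[Point]) -> List[Tuple[float, List[Cluster]]]:
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Returns one (merge_distance, clusters_after_merge) snapshot per merge,
    i.e. a flat record of the dendrogram (n - 1 entries for n points).
    """
    clusters: List[Cluster] = [[p] for p in points]
    history: List[Tuple[float, List[Cluster]]] = []
    while len(clusters) > 1:
        # Exhaustive search over all pairs: O(n^2) per merge, acceptable for a sketch.
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        assert best is not None
        dist, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
        history.append((dist, [list(c) for c in clusters]))
    return history


def choose_k(points: List[Point]) -> int:
    """Hypothetical heuristic: stop merging just before the largest jump in merge distance."""
    heights = [dist for dist, _ in agglomerate(points)]
    if len(heights) < 2:
        # Fewer than three points: there is no gap to compare.
        return len(points)
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=lambda g: gaps[g])
    # After the merge at index t, n - (t + 1) clusters remain.
    return len(points) - (t + 1)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(choose_k(pts))  # prints 2
```

Single linkage is used here because its merge distances never decrease, so every gap is non-negative and "largest jump" is well defined. The naive pair search makes the whole procedure roughly cubic in the number of points, which is exactly the kind of efficiency limit that motivates specialised variants for large or streaming logistics data.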
Notes
- 1.
- 2. UPS: http://www.uplonline.com.
- 3. FedEx: http://www.fedex.com/us/.
- 4. Facebook: https://www.facebook.com.
- 5. Twitter: https://twitter.com/?lang=en.
- 6.
- 7.
- 8.
- 9. We contacted Dr. Albert Bifet, the author of the library, for assistance because it ran only for a specific number of data points and then started throwing errors; however, the problem was not solved.
- 10.
- 11.
References
Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB) (2003)
Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: On High Dimensional Projected Clustering of Data Streams. Data Min. Knowl. Disc. 10, 251–273 (2005)
AL-Zoubi, M.B., Hudaib, A., Al-Shboul, B.: A fast fuzzy clustering algorithm. In: Proceedings of the 6th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, pp. 28–32 (2007)
Babcock, B., Datar, M., Motwani, R., O’Callaghan, L.: Maintaining variance and k-medians over data stream windows. In: Proceedings of the 22nd ACM Symposium on Principles of Database Systems (2003)
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Chen, J.G., Wiener, J.L., Iyer, S., Jaiswal, A., Lei, R., Simha, N., Wang, W., Wilfong, K., Williamson, T., Yilmaz, S.: Realtime data processing at Facebook. In: Proceedings of the 2016 International Conference on Management of Data (SIGMOD 2016), pp. 1087–1098. ACM, New York (2016). doi:10.1145/2882903.2904441
Chen, H., Chiang, R.H., Storey, V.C.: Business intelligence and analytics: from big data to big impact. MIS Q. 36(4), 1165–1188 (2012)
Cios, K.J., Pedrycz, W., Swiniarski, R.W.: Data mining and knowledge discovery. Data Mining Methods for Knowledge Discovery. The Springer International Series in Engineering and Computer Science, vol. 458, pp. 1–26. Springer, Boston (1998). doi:10.1007/978-1-4615-5589-6_1
Chennamangalam, J., Karastergiou, A., Armour, W., Williams, C., Giles, M.: ARTEMIS: a real-time data processing pipeline for the detection of fast transients. In: 2015 1st URSI Atlantic Radio Science Conference (URSI AT-RASC), Gran Canaria, Spain, p. 1 (2015). doi:10.1109/URSI-AT-RASC.2015.7303171
Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: Proceedings of the Sixth SIAM International Conference on Data Mining (2006)
Chen, Y., Tu, L.: Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)
Cheng, T.W., Goldgof, D.B., Hall, L.O.: Fast fuzzy clustering. Fuzzy Sets and Systems, pp. 49–56 (1998)
Cannon, R.L., Dave, J.V., Bezdek, J.C.: Efficient implementation of the fuzzy c-means clustering algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 8(2), 248–255 (1986)
Dhamankar, R., Gade, K.: Realtime analytics @ Twitter. In: Proceedings of the Fifth International Workshop on Cloud Data Management (CloudDB 2013), pp. 1–2. ACM, New York (2013). doi:10.1145/2516588.2516593
Dong, X.L., Srivastava, D.: Big data integration. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 1245–1248. IEEE (2013)
van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., van der Aalst, W.M.P.: The ProM framework: a new era in process mining tool support. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp. 444–454. Springer, Heidelberg (2005). doi:10.1007/11494744_25
Estivill-Castro, V.: Why so many clustering algorithms: a position paper. SIGKDD Explor. Newsl. 4(1), 65–75 (2002). doi:10.1145/568574.568575
Everitt, B.S., Landau, S., Leese, M.: Cluster Analysis. Arnold (a member of the Hodder Headline Group), London (2002)
Galili, T.: dendextend: an R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics, btv428 (2015)
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)
Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science (2000)
Graham, R.L., Hell, P.: On the history of the minimum spanning tree problem. Ann. Hist. Comput. 7(1), 43–57 (1985)
Hastie, T., Tibshirani, R., Friedman, J.: Unsupervised learning. In: The Elements of Statistical Learning, pp. 485–585. Springer, New York (2009). doi:10.1007/978-0-387-84858-7_14
Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. Wiley, Hoboken (2013)
Kochhar, A.: Distributed real time data processing for manufacturing organizations. IEEE Trans. Eng. Manage. 24(4), 119–124 (1977). doi:10.1109/TEM.1977.6447256
Jeske, M., Grüner, M., Weiß, F.: Big Data in Logistics: A DHL Perspective on How to Move Beyond the Hype. DHL Customer Solutions & Innovation (2015)
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.): Machine Learning: An Artificial Intelligence Approach. Springer, Heidelberg (2013)
Khalilian, M., Mustapha, N., Sulaiman, N.M., Boroujeni, Z.F.: K-means divide and conquer clustering. In: International Conference on Computer and Automation Engineering (ICCAE), Bangkok, Thailand (2009)
Udommanetanakit, K., Rakthanmanon, T., Waiyamai, K.: E-stream: evolution-based technique for stream clustering. In: Proceedings of the 3rd International Conference on Advanced Data Mining and Applications (ADMA 2007)
Hathaway, R.J., Bezdek, J.C.: Extending fuzzy and probabilistic clustering to very large data sets. Comput. Stat. Data Anal. 51(1), 215–234 (2006)
Pferd, W.J.: The Challenges of Integrating Structured and Unstructured Data. Technical report. PNEC Conference (2010)
Vryniotis, V.: DatumBox machine learning framework. http://www.datumbox.com/
Meng, X., Bradley, J., Yavuz, B., et al.: MLlib: machine learning in Apache Spark. J. Mach. Learn. Res. 17, 1–7 (2016)
Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C., Tseng, V.S.: SPMF: a Java open-source pattern mining library. J. Mach. Learn. Res. (JMLR) 15, 3389–3393 (2014)
Top Logistics Challenges Facing Shippers Today. http://www.logisticsplus.net/top-logistics-challenges-facing-shippers-today/. Accessed 30 Mar 2016
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Stelzner, M.A.: Social Media Marketing Industry Report: How Marketers Are Using Social Media to Grow Their Business. Social Media Examiner (2016)
Hastie, T., Tibshirani, R., Friedman, J.: Hierarchical clustering. In: The Elements of Statistical Learning, 2nd edn., pp. 520–528. Springer, New York (2009). ISBN 0-387-84857-6
Nwaubani, J.: Business intelligence and logistics. In: Proceedings of the 1st Olympus International Conference on Supply Chain, Katerini, Greece
Mahobiya, C., Kumar, M.: Performance comparison of two streaming data clustering algorithms. Int. J. Comput. Trends Technol. (IJCTT) 12(2) (2014)
Perera, S., Suhothayan, S.: Solution patterns for realtime streaming analytics. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS 2015), pp. 247–255. ACM, New York (2015). doi:10.1145/2675743.2774214
Taxidou, I., Fischer, P.M.: Realtime analysis of information diffusion in social media. Proc. VLDB Endow. 6, 1416–1421 (2013). doi:10.14778/2536274.2536328
Vadrevu, S., Teo, C.H., Rajan, S., Punera, K., Dom, B., Smola, A.J., Chang, Y., Zheng, Z.: Scalable clustering of news search results. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM 2011), pp. 675–684. ACM, New York (2011). doi:10.1145/1935826.1935918
Wu, C.H., Horng, S.J., Chen, Y.W., Lee, W.Y.: Designing scalable and efficient parallel clustering algorithms on arrays with reconfigurable optical buses. Image Vis. Comput. 18(13), 1033–1043 (2000)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (1996)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
AlShaer, M., Taher, Y., Haque, R., Hacid, MS., Dbouk, M. (2017). ProLoD: An Efficient Framework for Processing Logistics Data. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10573. Springer, Cham. https://doi.org/10.1007/978-3-319-69462-7_44
DOI: https://doi.org/10.1007/978-3-319-69462-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69461-0
Online ISBN: 978-3-319-69462-7
eBook Packages: Computer Science, Computer Science (R0)