Abstract
The scale and growth rate of today’s text collections bring new challenges for index construction. To address this problem, the Pipeline and Data Parallel Hybrid (PDPH) algorithm is proposed to improve indexing performance on multi-core platforms. In contrast to existing sequential indexing algorithms, PDPH combines pipeline parallelism and data parallelism to make the algorithm more flexible and to let performance scale with the number of cores. Evaluations show that the algorithm improves index construction speed on multi-core platforms.
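The abstract does not give the details of PDPH, so the following is only a minimal sketch of the general idea of mixing pipeline and data parallelism in index construction, not the authors' algorithm. The three-stage split (read, tokenize, merge), the function name `build_index`, the parameter `num_workers`, and the whitespace tokenizer are all assumptions made for illustration.

```python
import queue
import threading
from collections import defaultdict

_DONE = object()  # sentinel marking the end of a stream


def build_index(docs, num_workers=4):
    """Build an inverted index {term: sorted list of doc ids} (illustrative only)."""
    doc_q = queue.Queue(maxsize=64)      # reader -> tokenizer workers
    partial_q = queue.Queue(maxsize=64)  # tokenizer workers -> merger
    index = defaultdict(set)

    def reader():
        # Pipeline stage 1: feed (doc_id, text) pairs downstream.
        for doc_id, text in enumerate(docs):
            doc_q.put((doc_id, text))
        for _ in range(num_workers):
            doc_q.put(_DONE)  # one sentinel per tokenizer worker

    def tokenizer():
        # Pipeline stage 2, replicated num_workers times (data parallelism).
        while True:
            item = doc_q.get()
            if item is _DONE:
                partial_q.put(_DONE)  # tell the merger this worker is finished
                return
            doc_id, text = item
            # Assumed tokenizer: lowercase, split on whitespace.
            partial_q.put((doc_id, set(text.lower().split())))

    def merger():
        # Pipeline stage 3: a single thread owns the index, so no lock is needed.
        finished = 0
        while finished < num_workers:
            item = partial_q.get()
            if item is _DONE:
                finished += 1
                continue
            doc_id, terms = item
            for term in terms:
                index[term].add(doc_id)

    threads = [threading.Thread(target=reader), threading.Thread(target=merger)]
    threads += [threading.Thread(target=tokenizer) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {term: sorted(ids) for term, ids in index.items()}
```

A call such as `build_index(["the quick fox", "the lazy dog"], num_workers=2)` returns a dictionary mapping each term to the ids of the documents that contain it. In standard CPython builds with the global interpreter lock, threads illustrate the structure but do not speed up CPU-bound tokenization; a process-based variant would be needed to actually use multiple cores.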
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Zhang, S., Li, J. (2014). Pipeline and Data Parallel Hybrid Indexing Algorithm for Multi-core Platform. In: Liu, K., Gulliver, S.R., Li, W., Yu, C. (eds) Service Science and Knowledge Innovation. ICISO 2014. IFIP Advances in Information and Communication Technology, vol 426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55355-4_28
DOI: https://doi.org/10.1007/978-3-642-55355-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55354-7
Online ISBN: 978-3-642-55355-4
eBook Packages: Computer Science, Computer Science (R0)